Publications
2023
2022
- [ISCA] EDAM: Edit Distance Tolerant Approximate Matching Content Addressable Memory. Robert Hanhan, Esteban Garzón, Zuher Jahshan, Adam Teman, Marco Lanuzza, and Leonid Yavits. In Proceedings of the 49th Annual International Symposium on Computer Architecture, 2022.
We propose a novel edit distance-tolerant content addressable memory (EDAM) for energy-efficient approximate search applications. Unlike state-of-the-art approximate search solutions that tolerate a certain Hamming distance between the query pattern and the stored data, EDAM tolerates edit distance, which makes it especially efficient in applications such as text processing and genome analysis. EDAM was designed using a commercial 65 nm 1.2 V CMOS technology and evaluated through extensive Monte Carlo simulations, while considering different process corners. Simulation results show that EDAM can achieve robust approximate search operation with a wide range of edit distance threshold levels. EDAM is functionally evaluated as a pathogen DNA detection and classification accelerator. EDAM achieves up to 1.7x higher F1 score for high-quality DNA reads and up to 19.55x higher F1 score for DNA reads with 15% error rate, compared to state-of-the-art DNA classification tool Kraken2. Simulated at 667 MHz, EDAM provides 1,214x average speedup over Kraken2. This makes EDAM suitable for hardware acceleration of genomic surveillance of outbreaks, such as the ongoing Covid-19 pandemic.
@inproceedings{hanhan_edam_2022, address = {New York, NY, USA}, series = {{ISCA} '22}, title = {{EDAM}: {Edit} {Distance} {Tolerant} {Approximate} {Matching} {Content} {Addressable} {Memory}}, isbn = {978-1-4503-8610-4}, url = {https://doi.org/10.1145/3470496.3527424}, doi = {10.1145/3470496.3527424}, booktitle = {Proceedings of the 49th {Annual} {International} {Symposium} on {Computer} {Architecture}}, publisher = {Association for Computing Machinery}, author = {Hanhan, Robert and Garzón, Esteban and Jahshan, Zuher and Teman, Adam and Lanuzza, Marco and Yavits, Leonid}, year = {2022}, pages = {495--507}, }
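A minimal software illustration of the distinction the EDAM abstract draws, assuming plain Python strings over the DNA alphabet: a Hamming-distance match compares aligned positions only, so a single indel shifts every later character into mismatch, whereas an edit-distance (Levenshtein) match absorbs it. This models only the matching criterion, not EDAM's CAM circuit; the function names and the threshold of 2 are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) via row-by-row DP."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def tolerant_match(query: str, stored: str, threshold: int, metric) -> bool:
    """Hypothetical threshold match: accept if metric(query, stored) <= threshold."""
    return metric(query, stored) <= threshold


if __name__ == "__main__":
    stored = "ACGTACGT"
    query = "ACGACGTA"  # stored with one T deleted and one A appended
    print(hamming(query, stored))                           # 5
    print(edit_distance(query, stored))                     # 2
    print(tolerant_match(query, stored, 2, hamming))        # False
    print(tolerant_match(query, stored, 2, edit_distance))  # True
```

One indel costs five Hamming mismatches here but only two edits, which is why a Hamming-tolerant search would reject this query at a threshold of 2 while an edit-distance-tolerant one accepts it.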
2021
- [VLSI Tech.] HERMES Core – A 14nm CMOS and PCM-based In-Memory Compute Core using an array of 300ps/LSB Linearized CCO-based ADCs and local digital processing. R. Khaddam-Aljameh, M. Stanisavljevic, J. Fornt Mas, G. Karunaratne, M. Braendli, F. Liu, A. Singh, S. M. Müller, U. Egger, A. Petropoulos, T. Antonakopoulos, K. Brew, S. Choi, I. Ok, F. L. Lie, N. Saulnier, V. Chan, I. Ahsan, V. Narayanan, S. R. Nandakumar, M. Le Gallo, P. A. Francese, A. Sebastian, and E. Eleftheriou. In 2021 Symposium on VLSI Technology, 2021.
We present a 256x256 in-memory compute (IMC) core designed and fabricated in 14nm CMOS with backend-integrated multi-level phase-change memory (PCM). It comprises 256 linearized current-controlled oscillator (CCO)-based ADCs at a compact 4µm pitch and a local digital processing unit performing affine scaling and ReLU operations. A novel frequency-linearization technique for CCOs is introduced, leading to accurate on-chip matrix-vector-multiply (MVM) when operating over 1 GHz. Measured classification accuracies on the MNIST and CIFAR-10 datasets are presented for deep learning (DL) inference using two cores. The measured energy efficiency is 10.5 TOPS/W at a performance density of 1.59 TOPS/mm².
@inproceedings{khaddam-aljameh_hermes_2021, title = {{HERMES} {Core} – {A} 14nm {CMOS} and {PCM}-based {In}-{Memory} {Compute} {Core} using an array of 300ps/{LSB} {Linearized} {CCO}-based {ADCs} and local digital processing}, booktitle = {2021 {Symposium} on {VLSI} {Technology}}, author = {Khaddam-Aljameh, R. and Stanisavljevic, M. and Fornt Mas, J. and Karunaratne, G. and Braendli, M. and Liu, F. and Singh, A. and Müller, S. M. and Egger, U. and Petropoulos, A. and Antonakopoulos, T. and Brew, K. and Choi, S. and Ok, I. and Lie, F. L. and Saulnier, N. and Chan, V. and Ahsan, I. and Narayanan, V. and Nandakumar, S. R. and Le Gallo, M. and Francese, P. A. and Sebastian, A. and Eleftheriou, E.}, year = {2021}, pages = {1--2}, }
2020
- [BIBM] Variant Calling Parallelization on Processor-in-Memory Architecture. D. Lavenier, R. Cimadomo, and R. Jodin. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Dec 2020.
This paper introduces a new combination of software and a hardware PIM (Processing-in-Memory) architecture to accelerate the variant calling genomic process. PIM brings data-intensive calculations directly to where the data is: within the DRAM, enhanced with thousands of processing units. Energy consumption, which is largely due to data movement, is significantly lowered at a marginal additional hardware cost. Such a design allows an unprecedented level of parallelism for processing billions of short reads. Experiments on real PIM devices developed by the UPMEM company show a significant speed-up compared to a pure software implementation. The PIM solution also compares favorably with FPGA- or GPU-based acceleration, delivering similar to twice the processing speed while, most importantly, being 5 to 8 times cheaper to deploy and consuming up to 6 times less power.
@inproceedings{lavenier_variant_2020, address = {Los Alamitos, CA, USA}, title = {Variant {Calling} {Parallelization} on {Processor}-in-{Memory} {Architecture}}, url = {https://doi.ieeecomputersociety.org/10.1109/BIBM49941.2020.9313351}, doi = {10.1109/BIBM49941.2020.9313351}, booktitle = {2020 {IEEE} {International} {Conference} on {Bioinformatics} and {Biomedicine} ({BIBM})}, publisher = {IEEE Computer Society}, author = {Lavenier, D. and Cimadomo, R. and Jodin, R.}, month = dec, year = {2020}, pages = {204--207}, }
- [MICRO] GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020.
Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. To perform genome sequencing, devices extract small random fragments of an organism’s DNA sequence (known as reads). The first step of genome sequence analysis is a computational process known as read mapping. In read mapping, each fragment is matched to its potential location in the reference genome with the goal of identifying the original location of each read in the genome. Unfortunately, rapid genome sequencing is currently bottlenecked by the computational power and memory bandwidth limitations of existing systems, as many of the steps in genome sequence analysis must process a large amount of data. A major contributor to this bottleneck is approximate string matching (ASM), which is used at multiple points during the mapping process. ASM enables read mapping to account for sequencing errors and genetic variations in the reads. We propose GenASM, the first ASM acceleration framework for genome sequence analysis. GenASM performs bitvector-based ASM, which can efficiently accelerate multiple steps of genome sequence analysis. We modify the underlying ASM algorithm (Bitap) to significantly increase its parallelism and reduce its memory footprint. Using this modified algorithm, we design the first hardware accelerator for Bitap. Our hardware accelerator consists of specialized systolic-array-based compute units and on-chip SRAMs that are designed to match the rate of computation with memory capacity and bandwidth, resulting in an efficient design whose performance scales linearly as we increase the number of compute units working in parallel. We demonstrate that GenASM provides significant performance and power benefits for three different use cases in genome sequence analysis. First, GenASM accelerates read alignment for both long reads and short reads. 
For long reads, GenASM outperforms state-of-the-art software and hardware accelerators by 116x and 3.9x, respectively, while reducing power consumption by 37x and 2.7x. For short reads, GenASM outperforms state-of-the-art software and hardware accelerators by 111x and 1.9x. Second, GenASM accelerates pre-alignment filtering for short reads, with 3.7x the performance of a state-of-the-art pre-alignment filter, while reducing power consumption by 1.7x and significantly improving the filtering accuracy. Third, GenASM accelerates edit distance calculation, with 22-12501x and 9.3-400x speedups over the state-of-the-art software library and FPGA-based accelerator, respectively, while reducing power consumption by 548-582x and 67x. We conclude that GenASM is a flexible, high-performance, and low-power framework, and we briefly discuss four other use cases that can benefit from GenASM.
@inproceedings{cali_genasm_2020, title = {{GenASM}: {A} {High}-{Performance}, {Low}-{Power} {Approximate} {String} {Matching} {Acceleration} {Framework} for {Genome} {Sequence} {Analysis}}, doi = {10.1109/MICRO50266.2020.00081}, booktitle = {2020 53rd {Annual} {IEEE}/{ACM} {International} {Symposium} on {Microarchitecture} ({MICRO})}, author = {Cali, Damla Senol and Kalsi, Gurpreet S. and Bingöl, Zülal and Firtina, Can and Subramanian, Lavanya and Kim, Jeremie S. and Ausavarungnirun, Rachata and Alser, Mohammed and Gomez-Luna, Juan and Boroumand, Amirali and Nori, Anant and Scibisz, Allison and Subramoney, Sreenivas and Alkan, Can and Ghose, Saugata and Mutlu, Onur}, year = {2020}, pages = {951--966}, }
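The GenASM abstract builds on Bitap, a bitvector-based approximate string matching algorithm. Below is a sketch of the classic Wu-Manber (shift-and) form of Bitap with up to k edits, using Python integers as bitvectors; it is not GenASM's modified, parallelized variant, and the function name `bitap_search` is hypothetical. Bit i of `R[d]` is set when the first i + 1 pattern characters match a text substring ending at the current position with at most d edits.

```python
from typing import List


def bitap_search(text: str, pattern: str, k: int) -> List[int]:
    """Return end indices in text where pattern matches with at most k edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1    # keep bitvectors m bits wide
    accept = 1 << (m - 1)  # this bit set => whole pattern matched
    masks = {}             # masks[c]: bit i set iff pattern[i] == c
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d] starts with the first d pattern characters "matched" by deletions.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for j, c in enumerate(text):
        b = masks.get(c, 0)
        old_prev = R[0]                      # R[d - 1] before reading c
        R[0] = ((R[0] << 1) | 1) & b & full  # exact shift-and step
        for d in range(1, k + 1):
            old_cur = R[d]
            R[d] = (
                (((old_cur << 1) | 1) & b)   # match: next pattern char equals c
                | ((old_prev << 1) | 1)      # substitution: consume c as a mismatch
                | old_prev                   # insertion: c is an extra text char
                | ((R[d - 1] << 1) | 1)      # deletion: skip a pattern char
            ) & full
            old_prev = old_cur
        if R[k] & accept:
            hits.append(j)
    return hits


if __name__ == "__main__":
    print(bitap_search("ACGTTACGAT", "ACGT", 0))  # [3]
    print(bitap_search("ACGTTACGAT", "ACGT", 1))  # [2, 3, 4, 7, 8, 9]
```

Each text character updates all k + 1 bitvectors with a handful of shifts, ANDs and ORs, which is the property that makes Bitap attractive for hardware; the sketch processes one character at a time and makes no attempt at GenASM's parallelization or memory-footprint reductions.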
- [SYSTOR] BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data. Roman Kaplan, Leonid Yavits, and Ran Ginosar. In Proceedings of the 13th ACM International Systems and Storage Conference, 2020.
Genome sequences contain hundreds of millions of DNA base pairs. Finding the degree of similarity between two genomes requires executing a compute-intensive dynamic programming algorithm, such as Smith-Waterman. Traditional von Neumann architectures have limited parallelism and cannot provide an efficient solution for large-scale genomic data. Approximate heuristic methods (e.g., BLAST) are commonly used; however, they are suboptimal and still compute-intensive. In this work, we present BioSEAL, a biological sequence alignment accelerator. BioSEAL is a massively parallel non-von Neumann processing-in-memory architecture for large-scale DNA and protein sequence alignment. BioSEAL is based on resistive content addressable memory, capable of energy-efficient and high-performance associative processing. We present an associative processing algorithm for entire database sequence alignment on BioSEAL and compare its performance and power consumption with state-of-the-art solutions. We show that BioSEAL can achieve up to 57x speedup and 156x better energy efficiency compared with existing solutions for genome sequence alignment and protein sequence database search.
@inproceedings{kaplan_bioseal_2020, address = {New York, NY, USA}, series = {{SYSTOR} '20}, title = {{BioSEAL}: {In}-{Memory} {Biological} {Sequence} {Alignment} {Accelerator} for {Large}-{Scale} {Genomic} {Data}}, isbn = {978-1-4503-7588-7}, url = {https://doi.org/10.1145/3383669.3398279}, doi = {10.1145/3383669.3398279}, booktitle = {Proceedings of the 13th {ACM} {International} {Systems} and {Storage} {Conference}}, publisher = {Association for Computing Machinery}, author = {Kaplan, Roman and Yavits, Leonid and Ginosar, Ran}, year = {2020}, pages = {36--48}, }
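For reference, the Smith-Waterman recurrence that the BioSEAL abstract names can be sketched in software as below. This is the plain sequential dynamic program with a linear gap penalty, not BioSEAL's associative in-memory algorithm; the scoring values (match +2, mismatch -1, gap -1) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row; row 0 is all zeros
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local: a bad prefix is dropped.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGTACGT", "ACGACGT"))  # 13: seven matches, one gap
```

The inner loop fills an len(a) x len(b) table, which is the quadratic cost that makes whole-genome or whole-database alignment compute-intensive on a von Neumann machine.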
2019
- [IEEE Micro] RASSA: Resistive Prealignment Accelerator for Approximate DNA Long Read Mapping. Roman Kaplan, Leonid Yavits, and Ran Ginosar. IEEE Micro, Jul 2019.
DNA read mapping is a computationally expensive bioinformatics task, required for genome assembly and consensus polishing. It requires finding the best-fitting location for each DNA read on a long reference sequence. A novel resistive approximate similarity search accelerator (RASSA) exploits charge distribution and parallel in-memory processing to reflect a mismatch count between DNA sequences. The RASSA implementation of DNA long-read prealignment outperforms the state-of-the-art solution, minimap2, by 16–77x with comparable accuracy and provides two orders of magnitude higher throughput than GateKeeper, a short-read prealignment hardware architecture implemented in FPGA.
@article{kaplan_rassa_2019, title = {{RASSA}: {Resistive} {Prealignment} {Accelerator} for {Approximate} {DNA} {Long} {Read} {Mapping}}, volume = {39}, issn = {0272-1732}, url = {https://doi.org/10.1109/MM.2018.2890253}, doi = {10.1109/MM.2018.2890253}, number = {4}, journal = {IEEE Micro}, author = {Kaplan, Roman and Yavits, Leonid and Ginosar, Ran}, month = jul, year = {2019}, pages = {44--54}, }
2018
- [BMC Genomics] GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies. Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu. BMC Genomics, May 2018.
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments.
@article{kim_grim-filter_2018, title = {{GRIM}-{Filter}: {Fast} seed location filtering in {DNA} read mapping using processing-in-memory technologies}, volume = {19}, issn = {1471-2164}, url = {https://doi.org/10.1186/s12864-018-4460-0}, doi = {10.1186/s12864-018-4460-0}, number = {2}, journal = {BMC Genomics}, author = {Kim, Jeremie S. and Senol Cali, Damla and Xin, Hongyi and Lee, Donghyuk and Ghose, Saugata and Alser, Mohammed and Hassan, Hasan and Ergin, Oguz and Alkan, Can and Mutlu, Onur}, month = may, year = {2018}, pages = {89}, }
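The GRIM-Filter abstract describes what a seed location filter must do but not how. As a generic illustration only, the sketch below uses a q-gram lemma filter, which is assumed here rather than taken from the paper: if a read of length n is within e edits of some substring of a reference window, at least n - q + 1 - q*e of the read's q-grams occur in that window, so any location below that count can be discarded before alignment without losing true matches. Checking only whether each q-gram exists in the window (a set, loosely analogous to an existence bitvector) can only raise the count, so it keeps that no-false-negative property. The function names are hypothetical.

```python
from typing import List


def qgrams(s: str, q: int) -> List[str]:
    """All overlapping substrings of length q."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def passes_qgram_filter(read: str, ref_window: str, q: int, max_edits: int) -> bool:
    """Keep a candidate location only if the q-gram lemma says it might align."""
    present = set(qgrams(ref_window, q))  # existence only, no counts
    shared = sum(1 for g in qgrams(read, q) if g in present)
    # Each edit destroys at most q of the read's (len(read) - q + 1) q-grams.
    threshold = len(read) - q + 1 - q * max_edits
    return shared >= threshold


if __name__ == "__main__":
    window = "ACGTACGTAC"
    print(passes_qgram_filter("ACGTTCGTAC", window, 4, 1))    # True: 3 shared, need 3
    print(passes_qgram_filter("ACGTTCGTAC", window, 4, 0))    # False: need all 7
    print(passes_qgram_filter("ACGTACGTAC", "T" * 12, 4, 1))  # False: 0 shared
```

A location that passes still has to be verified by alignment; the filter only guarantees that discarded locations could not have aligned within `max_edits`.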
2017
2016
- [BIBM] DNA mapping using Processor-in-Memory architecture. Dominique Lavenier, Jean-Francois Roy, and David Furodet. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016.
This paper presents the implementation of a mapping algorithm on a new Processing-in-Memory (PIM) architecture developed by the UPMEM company. UPMEM’s solution consists of adding processing units into the DRAM to minimize data access time and maximize bandwidth, in order to drastically accelerate data-consuming algorithms. The technology developed by UPMEM makes it possible to combine 256 cores with 16 GBytes of DRAM on a standard DIMM module. A DNA mapping experiment on a human genome dataset shows that a speed-up of 25 can be obtained with UPMEM technology compared to fast mapping software such as BWA, Bowtie2 or NextGenMap running on 16 Intel threads. The experiments also highlight that data transfer from the storage device limits the performance of the implementation; using SSD drives can boost the speed-up to 80.
@inproceedings{lavenier_dna_2016, title = {{DNA} mapping using {Processor}-in-{Memory} architecture}, doi = {10.1109/BIBM.2016.7822732}, booktitle = {2016 {IEEE} {International} {Conference} on {Bioinformatics} and {Biomedicine} ({BIBM})}, author = {Lavenier, Dominique and Roy, Jean-Francois and Furodet, David}, year = {2016}, pages = {1429--1435}, }