Performance Improvement of Compression Algorithms for Gene Sequencing Reads by Cache Miss Improvement

Shadab, Mohammad; Goudarzi, Maziar

Shadab, Mohammad | 2022

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 55717 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Goudarzi, Maziar
Abstract:
One of the current challenges in bioinformatics is the sheer volume of data: sequencing the complete genome of a single species can produce up to hundreds of gigabytes. As data volume grows, storing, transferring, and processing it all become concerns. The problem is even more critical given the availability of portable sequencing devices and the limits on processing outside the lab environment. Fortunately, genome data is highly redundant by nature, and specialized algorithms have been introduced to compress it. In this thesis, we selected three of the best-performing compression algorithms, HARC, Spring, and Enano, and analyzed them at the architectural level. Through a series of simulations, we analyzed the maximum achievable acceleration of these algorithms and applied the common methods for reducing cache misses to each of them. Our analysis with Perf showed that the three algorithms did not respond to these architecture-level methods, and that the cache miss rates at the first, second, and third cache levels were above 10, 50, and 50 percent, respectively. In the Spring algorithm, most of the code regions that cause cache misses access memory indirectly, which is difficult for the hardware prefetcher to handle. Our analysis also showed that misses in the last-level cache (LLC) have a significant effect on execution time, and that by focusing on removing LLC misses in HARC, Spring, and Enano we can improve acceleration by 23, 19, and 45, respectively. Finally, we applied common methods such as software prefetching to try to improve cache performance. We show that, because of the nature of the loops in our benchmarks, the prefetching currently inserted by compilers is ineffective and in fact slows the programs down. Manually inserting software prefetch instructions can reduce last-level cache misses by 13 percent and decrease cycles by 6 percent.
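As a rough illustration of the manual software prefetching described in the abstract, the following C sketch shows an indirect-access loop with and without a hand-inserted prefetch. It is not taken from the thesis or from HARC, Spring, or Enano. The function names, the array layout, and the `dist` (prefetch distance) parameter are hypothetical, and `__builtin_prefetch` is assumed to be available as the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline loop. The address of data[idx[i]] depends on a value that is
 * itself loaded from memory, so a stride-based hardware prefetcher
 * cannot easily predict it. */
uint64_t sum_indirect_plain(const uint64_t *data, const uint32_t *idx, size_t n)
{
    assert(data != NULL && idx != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += data[idx[i]];
    }
    return sum;
}

/* Same loop with a manual prefetch. idx[] is read sequentially, so the
 * target of iteration i + dist is known early and can be requested
 * ahead of use. dist is a tuning parameter (hypothetical here). */
uint64_t sum_indirect_prefetch(const uint64_t *data, const uint32_t *idx,
                               size_t n, size_t dist)
{
    assert(data != NULL && idx != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + dist < n) {
            /* Arguments: address, 0 = prefetch for read, 1 = low temporal locality. */
            __builtin_prefetch(&data[idx[i + dist]], 0, 1);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Both functions return the same sum; only the memory access timing differs. Whether the prefetching version is faster depends on the working-set size, the chosen distance, and the hardware, so it would need to be measured, for example by comparing LLC miss counts and cycles with a profiler.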
Keywords:
Compression ; Cache Memory ; Genome Sequencing ; Gene Reads ; Software Prefetcher ; Data Prefetching
