BIUSTRE

A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model

Show simple item record

dc.contributor.supervisor Kuthadi, Venu Madhav
dc.contributor.supervisor Dinakenyane, Otlhapile
dc.contributor.supervisor Schroeder, Heiko
dc.contributor.author Kaniwa, Freeson
dc.date.accessioned 2019-03-26T06:14:45Z
dc.date.available 2019-03-26T06:14:45Z
dc.date.issued 2018-08
dc.identifier.citation Kaniwa,Freeson (2018) A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model,Masters Thesis, Botswana International University of Science and Technology: Palapye en_US
dc.identifier.uri https://repository.biust.ac.bw/handle/123456789/85
dc.description Theses (PhD Computer Science)-----Botswana International University Science and Technology, 2018 en_US
dc.description.abstract The explosive growth of biological sequences over the last decade has consequently led to a lot of research on effective techniques for storing and analysing this big data. Generating Deoxyribonucleic Acid (DNA) sequences has become more reliable, faster, easier and cheaper due to latest sequencing technology leading to big data sequences. This poses a challenge to efficient analysis to such massive sequences. Advanced indexing algorithmic techniques are required to facilitate efficient analysis. A particular interesting indexing data structure for such analysis is the Suffix Tree. Search times for patterns have linear times, however longer construction times and high memory requirements make it prohibitive to use this data structure for very long sequences such as DNA sequences. To index these long sequences, several algorithms have been explored in recent years. These algorithms all target to improve running time and memory consumption for the Suffix Tree. That is, being able to index a long sequence by making it fit in the main memory and being able to look for patterns in reasonable times. DNA sequences carry basic information for each human being. Searching for patterns in DNA sequences has been shown to have biological relevance since patterns signify some underlying biological function such as disease detection, genetic variation, and regulation, etc. This thesis explores the use of truncating the generalised Suffix Tree basing on the details of the input, that is, the size of the repeat to be searched. This is aimed at improving the memory requirements. In this thesis, parallelisation of the Suffix Tree data structure is further explored to reduce the construction times. Parallelisation has become of more interest since parallel architectures are now ubiquitous in standard desktop computers. Based on the theoretical and experimental findings, we can conclude that specification of pattern size, prior to Suffix Tree construction by truncating, reduces the memory requirements of the generalised Suffix Tree. In conclusion, the novel alphabet dependant parallelisation algorithm achieves a speedup of 15x over the existing state of art sequential algorithm. Despite the success of our Parallel algorithm for Suffix Tree (PaST) algorithm, its main limitation is that of specification of pattern size prior to construction or searching. Other than that, comparatively our algorithm shows improved performance in terms of memory and time requirements. en_US
dc.language.iso en en_US
dc.publisher Botswana International University of Science and Technology en_US
dc.subject Big data en_US
dc.subject Generating Deoxyribonucleic Acid en_US
dc.subject DNA en_US
dc.subject Indexing algorithmic techniques en_US
dc.subject Suffix Tree en_US
dc.subject Algorithm en_US
dc.title A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model en_US
dc.description.level phd en_US
dc.description.accessibility unrestricted en_US
dc.description.department cis en_US


Files in this item

This item appears in the following Collection(s)

  • Faculty of Sciences
    This collection is made up of electronic theses and dissertations produced by post graduate students from Faculty of Sciences

Show simple item record

Search BIUSTRE


Browse

My Account