A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model

Kaniwa, Freeson

dc.contributor.supervisor	Kuthadi, Venu Madhav
dc.contributor.supervisor	Dinakenyane, Otlhapile
dc.contributor.supervisor	Schroeder, Heiko
dc.contributor.author	Kaniwa, Freeson
dc.date.accessioned	2019-03-26T06:14:45Z
dc.date.available	2019-03-26T06:14:45Z
dc.date.issued	2018-08
dc.identifier.citation	Kaniwa,Freeson (2018) A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model,Masters Thesis, Botswana International University of Science and Technology: Palapye	en_US
dc.identifier.uri	https://repository.biust.ac.bw/handle/123456789/85
dc.description	Theses (PhD Computer Science)-----Botswana International University Science and Technology, 2018	en_US
dc.description.abstract	The explosive growth of biological sequences over the last decade has consequently led to a lot of research on effective techniques for storing and analysing this big data. Generating Deoxyribonucleic Acid (DNA) sequences has become more reliable, faster, easier and cheaper due to latest sequencing technology leading to big data sequences. This poses a challenge to efficient analysis to such massive sequences. Advanced indexing algorithmic techniques are required to facilitate efficient analysis. A particular interesting indexing data structure for such analysis is the Suffix Tree. Search times for patterns have linear times, however longer construction times and high memory requirements make it prohibitive to use this data structure for very long sequences such as DNA sequences. To index these long sequences, several algorithms have been explored in recent years. These algorithms all target to improve running time and memory consumption for the Suffix Tree. That is, being able to index a long sequence by making it fit in the main memory and being able to look for patterns in reasonable times. DNA sequences carry basic information for each human being. Searching for patterns in DNA sequences has been shown to have biological relevance since patterns signify some underlying biological function such as disease detection, genetic variation, and regulation, etc. This thesis explores the use of truncating the generalised Suffix Tree basing on the details of the input, that is, the size of the repeat to be searched. This is aimed at improving the memory requirements. In this thesis, parallelisation of the Suffix Tree data structure is further explored to reduce the construction times. Parallelisation has become of more interest since parallel architectures are now ubiquitous in standard desktop computers. Based on the theoretical and experimental findings, we can conclude that specification of pattern size, prior to Suffix Tree construction by truncating, reduces the memory requirements of the generalised Suffix Tree. In conclusion, the novel alphabet dependant parallelisation algorithm achieves a speedup of 15x over the existing state of art sequential algorithm. Despite the success of our Parallel algorithm for Suffix Tree (PaST) algorithm, its main limitation is that of specification of pattern size prior to construction or searching. Other than that, comparatively our algorithm shows improved performance in terms of memory and time requirements.	en_US
dc.language.iso	en	en_US
dc.publisher	Botswana International University of Science and Technology	en_US
dc.subject	Big data	en_US
dc.subject	Generating Deoxyribonucleic Acid	en_US
dc.subject	DNA	en_US
dc.subject	Indexing algorithmic techniques	en_US
dc.subject	Suffix Tree	en_US
dc.subject	Algorithm	en_US
dc.title	A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model	en_US
dc.description.level	phd	en_US
dc.description.accessibility	unrestricted	en_US
dc.description.department	cis	en_US

Files in this item

Name: KANIWA Freeson Phd.pdf

Size: 4.147Mb

Format: PDF

View/Open

This item appears in the following Collection(s)

Faculty of Sciences
This collection is made up of electronic theses and dissertations produced by post graduate students from Faculty of Sciences

A kmer-based parallel algorithm for pattern searching in DNA sequences on shared-memory model

Files in this item

This item appears in the following Collection(s)

Search BIUSTRE

Browse

All of BIUSTRE

This Collection

My Account