BIUSTRE

A context-aware lemmatization model for Setswana language using machine learning


dc.contributor.supervisor Dimane, Mpoeleng
dc.contributor.supervisor Nedev, Zhivko
dc.contributor.supervisor Letsholo, Keletso
dc.contributor.author Bafitlhile, Kgosiyame Ditiro
dc.date.accessioned 2023-02-07T09:11:04Z
dc.date.available 2023-02-07T09:11:04Z
dc.date.issued 2022-08-25
dc.identifier.citation Bafitlhile, K.D. (2022) A context-aware lemmatization model for Setswana language using machine learning, Master's Thesis, Botswana International University of Science and Technology: Palapye. en_US
dc.identifier.uri http://repository.biust.ac.bw/handle/123456789/536
dc.description Thesis (Master of Science in Computer Science and Information Systems)---Botswana International University of Science and Technology, 2022 en_US
dc.description.abstract Lemmatization is an important task concerned with making computers understand the relationships that exist among words written in natural language. It is a prerequisite for the development of natural language processing (NLP) systems such as machine translation and information retrieval. In particular, lemmatization is intended to reduce the variability in word forms by collapsing related words to a standard lemma. There is limited research on the lemmatization of the Setswana language. A large part of the available research on Setswana lemmatization relies on a rule-driven strategy, which takes time to construct, lacks the context of how words are used, and requires highly qualified language skills. Moreover, it has been found that treating language with hand-coded rules lacks a generalization component, as it requires continual redesign every time new data appears, and this complicates the scalability of systems. With such a rich vocabulary and complex morphology, lemmatization of Setswana cannot easily be solved using explicit rules developed by programmers. In this thesis we describe how a supervised machine learning approach that employs the Naive Bayes algorithm can solve Setswana lemmatization with regard to how words are used in sentences. The contributions of this study are as follows. First, we present a context-aware lemmatization model that handles most of the morphologically productive classes. Second, we experiment with Naive Bayes, a strong multi-class algorithm which to the best of our knowledge has never been used to address lemmatization in Setswana. The accuracy of the lemmatization model obtained from the experiments reached 70.32%. The model shifts away from entirely hand-programmed rules and is able to lemmatize words based on the context in which they are used.
In Setswana, lemmatization should be done according to the intended meaning of the sentence. The model also ensures that, as long as the data is a good example of the goal concept, the generalization is created simultaneously, which allows the model's future performance to continue improving. Furthermore, given that this is a young area of research with no standard datasets for training and testing, we also contribute a medium-sized dataset, which remains a coveted resource for the research community. The experimental results obtained from this study show that machine learning approaches are more reliable than rule-based approaches in lemmatizing Setswana inflectional words with regard to the context in which they are used. en_US
dc.description.sponsorship Botswana International University of Science and Technology (BIUST) en_US
dc.language.iso en en_US
dc.publisher Botswana International University of Science and Technology (BIUST) en_US
dc.subject Setswana Language en_US
dc.subject Lemmatization en_US
dc.subject Natural Language Processing en_US
dc.subject Naive Bayes en_US
dc.subject Machine Learning en_US
dc.subject Data Structures en_US
dc.subject Algorithms en_US
dc.title A context-aware lemmatization model for Setswana language using machine learning en_US
dc.description.level msc en_US
dc.description.accessibility unrestricted en_US
dc.description.department cis en_US



This item appears in the following Collection(s)

  • Faculty of Sciences
    This collection is made up of electronic theses and dissertations produced by postgraduate students from the Faculty of Sciences
