BIUSTRE

Outlier management based on support vector machine

dc.contributor.supervisor Maupong, Thabiso
dc.contributor.supervisor Ndwapi, Nkumbuludzi
dc.contributor.author Thutlwe, Kelebogile
dc.date.accessioned 2025-08-25T10:18:30Z
dc.date.available 2025-08-25T10:18:30Z
dc.date.issued 2023-05-30
dc.identifier.citation Thutlwe, K. (2023) Outlier management based on support vector machine, Masters Theses, Botswana International University of Science and Technology: Palapye en_US
dc.identifier.uri https://repository.biust.ac.bw/handle/123456789/642
dc.description Thesis (MSc Computer Science)--Botswana International University of Science and Technology, 2023 en_US
dc.description.abstract Outliers are data points that deviate from the rest of a dataset. Causes of these deviations include, but are not limited to, errors during data collection, instrumentation errors and faults in data entry. These faults and errors make it difficult to capture and accurately model the patterns inherent in the data. On the other hand, outliers may indicate new discoveries in a field, calling for more research into why or how these outlying data points came about. Considering the potential benefits of outliers as well as the problems they may pose, this work focuses on the management of outliers in multivariate datasets, with a primary focus on classification problems. In this work, an algorithm to relabel outliers is developed. A Support Vector Machine (SVM) is used to detect outliers, given its effectiveness in dealing with high-dimensional data. After detection, the outliers are relabelled using the developed technique. This technique measures how far each outlier lies from the respective class centroids. Then, based on a comparison of the measured distances, the outliers are relabelled. To evaluate the effectiveness of the developed technique, we use discrimination techniques such as Fisher’s Linear Discriminant Analysis (FLDA), the Gaussian Mixture Model (GMM) and unsupervised clustering to highlight the extent of discrimination. The results show a significant improvement and, in some cases, a 100% classification score when evaluated using the Adjusted Rand Index (ARI) metric. en_US
dc.description.sponsorship Botswana International University of Science and Technology (BIUST) en_US
dc.language.iso en en_US
dc.publisher Botswana International University of Science and Technology (BIUST) en_US
dc.subject Unsupervised clustering en_US
dc.subject Gaussian Mixture Model (GMM) en_US
dc.subject Fisher’s Linear Discriminant Analysis (FLDA) en_US
dc.subject Discrimination techniques en_US
dc.subject Class centroids en_US
dc.subject Relabeling technique en_US
dc.subject Support Vector Machine (SVM) en_US
dc.subject Classification problems en_US
dc.subject Multivariate datasets en_US
dc.subject Instrumentation errors en_US
dc.subject Data points en_US
dc.title Outlier management based on support vector machine en_US
dc.description.level msc en_US
dc.description.accessibility unrestricted en_US
dc.description.department cis en_US
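
The abstract describes relabelling detected outliers by comparing their distances to the class centroids. The record does not give the exact rule, so the following is only a minimal sketch under stated assumptions: the outlier flags are taken as given (in the thesis they would come from the SVM-based detector), centroids are computed from non-outlier points only, distance is Euclidean, and each outlier is assigned to the class of its nearest centroid. The function names `class_centroids` and `relabel_outliers` and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def class_centroids(points, labels, is_outlier):
    """Return the mean vector of each class, using non-outlier points only."""
    sums = {}
    counts = defaultdict(int)
    for point, label, flagged in zip(points, labels, is_outlier):
        if flagged:
            continue  # outliers do not contribute to their class centroid
        if label not in sums:
            sums[label] = [0.0] * len(point)
        for i, value in enumerate(point):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def relabel_outliers(points, labels, is_outlier):
    """Assign each flagged point the label of its nearest class centroid.

    Assumption: nearest-centroid rule with Euclidean distance. The thesis
    may use a different distance or comparison rule.
    """
    centroids = class_centroids(points, labels, is_outlier)
    new_labels = list(labels)
    for idx, (point, flagged) in enumerate(zip(points, is_outlier)):
        if flagged:
            new_labels[idx] = min(
                centroids, key=lambda label: math.dist(point, centroids[label])
            )
    return new_labels


# Toy data: two well-separated classes; the last point is labelled "a"
# but sits inside the "b" cluster and is assumed to have been flagged
# as an outlier by a detector.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (5.5, 5.5)]
labels = ["a", "a", "a", "b", "b", "b", "a"]
is_outlier = [False, False, False, False, False, False, True]

relabelled = relabel_outliers(points, labels, is_outlier)
print(relabelled)  # ['a', 'a', 'a', 'b', 'b', 'b', 'b']
```

In this sketch only flagged points can change label, so the non-outlier points keep their original classes. A flagged point that already lies closest to its own class centroid is left unchanged.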


This item appears in the following Collection(s)

  • Faculty of Sciences
    This collection is made up of electronic theses and dissertations produced by postgraduate students from the Faculty of Sciences
