Outlier management based on support vector machine

Thutlwe, Kelebogile

URI: https://repository.biust.ac.bw/handle/123456789/642

Date: 2023-05-30

Abstract:

Outliers are data points that deviate from the rest of a dataset. Causes of these deviations include, but are not limited to, errors during data collection, instrumentation errors and faults in data entry. Such faults and errors make it difficult to capture and accurately model the patterns inherent in the data. On the other hand, outliers may indicate new discoveries in a field, calling for more research into why or how the outlying data points came about. Considering both the potential benefits of outliers and the problems they may pose, this work focuses on the management of outliers in multivariate datasets, with a primary focus on classification problems. In this work, an algorithm to relabel outliers is developed. A Support Vector Machine (SVM) is used to detect outliers, given its effectiveness in dealing with high-dimensional data. After detection, the outliers are relabelled using the developed technique. This technique measures how far each outlier lies from the respective class centroids and then relabels the outlier based on a comparison of the measured distances. To evaluate the effectiveness of the developed technique, we use discrimination techniques such as Fisher’s Linear Discriminant Analysis (FLDA), the Gaussian Mixture Model (GMM) and an unsupervised clustering technique to highlight the extent of discrimination. The results show a significant improvement, and in some cases achieve 100% classification when evaluated using the Adjusted Rand Index (ARI) metric.
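
The relabelling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes Euclidean distance, assumes class centroids are computed from non-outlier points only, and assumes each outlier takes the label of its nearest centroid. The function names (class_centroids, relabel_outliers), the toy data and the outlier_idx list, which stands in for the output of the SVM detection step, are all hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def class_centroids(points, labels, outlier_idx):
    """Mean of each class, computed from non-outlier points only (an assumption)."""
    skip = set(outlier_idx)
    sums, counts = {}, {}
    for i, (p, y) in enumerate(zip(points, labels)):
        if i in skip:
            continue
        acc = sums.setdefault(y, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def relabel_outliers(points, labels, outlier_idx):
    """Return a copy of labels in which each flagged point takes the
    label of its nearest class centroid (assumed relabelling rule)."""
    centroids = class_centroids(points, labels, outlier_idx)
    new_labels = list(labels)
    for i in outlier_idx:
        new_labels[i] = min(centroids, key=lambda y: dist(points[i], centroids[y]))
    return new_labels


# Toy data: two classes in 2-D; index 3 is labelled 0 but lies among class 1.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.1, 4.9),
          (5.0, 5.0), (4.8, 5.2), (5.2, 5.1)]
labels = [0, 0, 0, 0, 1, 1, 1]
outlier_idx = [3]  # hypothetical result of the SVM-based detection step

relabelled = relabel_outliers(points, labels, outlier_idx)
print(relabelled)  # [0, 0, 0, 1, 1, 1, 1]
```

Excluding the flagged points when computing centroids keeps a mislabelled point from pulling its own class centroid toward itself; whether the thesis makes the same choice is not stated in the abstract.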

Description:

Thesis (MSc Computer Science)--Botswana International University of Science and Technology, 2023

Files in this item

Name: Thutlwe- MSc_ ...

Size: 1.427Mb

Format: PDF

This item appears in the following Collection(s)

Faculty of Sciences
This collection is made up of electronic theses and dissertations produced by postgraduate students from the Faculty of Sciences.
