Abstract:
Dimensionality reduction provides a compact representation of high-dimensional data, so that the reduced data needs no further processing and only the vital information is retained. For this reason, it is an invaluable preprocessing step before applying many machine learning algorithms that perform poorly on high-dimensional data. In this thesis, the perceptron classification algorithm (an eager learner) is applied to three two-class datasets (the Student, Weather and Ionosphere datasets). The k-nearest neighbors classification algorithm (a lazy learner) is also applied to the same datasets. Each dataset is then reduced using fifteen different dimensionality reduction techniques, and both classification algorithms are applied to each reduced dataset. The techniques are compared, using confusion matrices, on how well they preserve the classification of each original dataset by the k-nearest neighbors and perceptron algorithms. The investigation reveals that the dimensionality reduction techniques implemented in this thesis seem to perform much better at preserving the k-nearest neighbors classification than at preserving the perceptron classification of the original datasets. In general, the techniques prove very effective at preserving the classification of both the lazy and the eager learner used in this investigation.