Abstract:
Given a dataset D of n data points in a high-dimensional Euclidean space, it is often helpful to project it into a lower-dimensional Euclidean space without great distortion. This operation is known as dimensionality reduction. In this presentation, we discuss several reasons for reducing the dimensionality of a dataset and then survey two categories of dimensionality reduction techniques. In the first category, each attribute in the reduced set is a linear combination of the attributes in the original dataset; examples include RP, PCA, SVD and KPCA. In the second category, the set of attributes in the reduced set is a proper subset of the attributes in the original dataset; examples include Variance, NRA, Comb App, Dir App and others. Finally, we discuss how dimensionality reduction can be applied to text and image data, and to the machine learning tasks of clustering and classification.
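
To make the distinction between the two categories concrete, the following is a minimal Python sketch, not the presenter's implementation. It assumes that a random projection with Gaussian entries is a fair representative of the first category and that keeping the highest-variance attributes is a fair representative of the second. The function names, the 1/sqrt(k) scaling and the toy data are illustrative assumptions.

```python
import random
import statistics


def random_projection(data, k, seed=0):
    """First category: every new attribute is a linear combination
    of all original attributes (assumed Gaussian random matrix)."""
    rng = random.Random(seed)
    d = len(data[0])
    # k x d matrix of N(0, 1) entries scaled by 1/sqrt(k); the scaling is an assumption
    R = [[rng.gauss(0.0, 1.0) / k ** 0.5 for _ in range(d)] for _ in range(k)]
    return [[sum(r[j] * row[j] for j in range(d)) for r in R] for row in data]


def top_variance_columns(data, k):
    """Second category: the reduced attributes are a subset of the
    original ones (here, the k columns with the largest variance)."""
    d = len(data[0])
    variances = [statistics.pvariance([row[j] for row in data]) for j in range(d)]
    ranked = sorted(range(d), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # restore original column order
    return keep, [[row[j] for j in keep] for row in data]


# Toy dataset: n = 3 points in d = 4 dimensions (made up for illustration)
data = [
    [1.0, 10.0, 0.5, 3.0],
    [1.1, 20.0, 0.2, 3.0],
    [0.9, 30.0, 0.8, 3.0],
]

projected = random_projection(data, k=2)         # new, mixed attributes
keep, selected = top_variance_columns(data, k=2)  # original attributes, fewer of them
```

In the first result each of the two output columns mixes all four input columns, so the new attributes have no direct counterpart in the original data. In the second result the output columns are original attributes carried over unchanged, which keeps them interpretable.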