Abstract:
In this project, we shall implement the hierarchical clustering algorithm and apply it to several data sets, including the weather, student, and patient data sets. We shall then reduce these data sets using six dimensionality reduction approaches: Random Projections (RP), Principal Component Analysis (PCA), Variance (Var), the New Random Approach (NRA), the Combined Approach (CA), and the Direct Approach (DA).
The Rand index and the adjusted Rand index (ARI) will be implemented to measure how well a given dimensionality reduction method preserves the hierarchical clustering of a data set. Finally, the six reduction methods will be compared by runtime and by how well they preserve the inter-point distances, variance, and hierarchical clustering of the original data set.
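
As an illustration of the two agreement measures named above, the following is a minimal Python sketch using the standard pair-counting definitions. It assumes each clustering is given as a flat list of cluster labels, for example obtained by cutting the dendrograms of the original and the reduced data into the same number of clusters; that step is an assumption, since the abstract does not state how the hierarchical clusterings are compared. The function names and the sample labels are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two clusterings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if it is grouped together in both clusterings
    # or separated in both clusterings.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Both clusterings are trivial (one cluster, or all singletons).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: clustering of the original data versus the reduced data.
original = [0, 0, 1, 1, 2, 2]
reduced = [0, 0, 1, 2, 2, 2]
print(rand_index(original, reduced))
print(adjusted_rand_index(original, reduced))
```

Both measures depend only on which points are grouped together, so relabelling the clusters does not change the score; a value of 1 means the reduced data yields the same partition as the original data.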