Abstract:
Data science uses automated methods to analyze massive amounts of data and extract knowledge from them. It is helping to create new branches of science and is influencing areas of the social sciences and the humanities. Given the continuing growth in data volume, data science is expected to advance even faster in the near future. Data science is an umbrella term covering many fields, including data mining, machine learning, big data, statistics, data visualization, and data analytics. Clustering, a data mining technique, partitions a large number of data points into a smaller number of groups. It groups objects so that objects with similar characteristics fall into the same group and objects with dissimilar characteristics fall into different groups. Similarity is measured with a distance function (e.g., edit distance). Clustering helps summarize data and make it easier to understand for a variety of data mining applications, including data summarization, social network analysis (community detection), customer segmentation, and outlier detection. This presentation will focus on the various types of distance measures, clustering methods, and real-life applications of clustering algorithms.
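As a rough illustration of how a distance measure drives clustering, the sketch below computes edit (Levenshtein) distance between strings and then groups words whose distance to a group's first member stays under a threshold. The names `edit_distance` and `threshold_clusters`, the sample words, and the threshold of 2 are assumptions made for this example only; the greedy grouping is a deliberately simple stand-in and not one of the specific clustering algorithms covered in the presentation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def threshold_clusters(items, max_distance):
    """Hypothetical greedy grouping: put each item into the first group
    whose first member is within max_distance, else start a new group."""
    clusters = []
    for item in items:
        for cluster in clusters:
            if edit_distance(item, cluster[0]) <= max_distance:
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


# Illustrative data, not taken from the presentation.
words = ["kitten", "sitten", "mitten", "cluster", "clusters", "bluster"]
groups = threshold_clusters(words, max_distance=2)
print(groups)
```

With these sample words, similar strings end up in the same group and dissimilar ones in different groups. The result of this greedy scheme depends on input order and on the chosen threshold, which is one reason more robust clustering methods are generally preferred in practice.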