
Showing posts from January, 2019

K-means clustering with Python

Introduction

K-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. The procedure offers a simple way to partition a given data set into a chosen number K of clusters. The main idea is to define K centroids, one for each cluster. The next step is to take each point in the data set and associate it with the nearest centroid. At this point we recalculate K new centroids from the clusters produced by the previous step. Once we have these K new centroids, each data point is again bound to its nearest new centroid. Repeating this loop, the K centroids shift their positions step by step until they no longer change.

Implementation

Scikit-learn provides a full implementation of the K-means algorithm through its KMeans class. Let's have a look at several interesting situations that might occur during data clustering:

import numpy
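To make the assign/recompute loop described above concrete, here is a minimal from-scratch sketch of K-means (Lloyd's algorithm) in NumPy. It assumes the centroids are initialised by picking K random data points and that the loop stops when the centroids stop moving or after a fixed number of iterations. The function names kmeans and assign_to_nearest, their parameters, and the toy data are hypothetical illustrations, not part of the original post or of scikit-learn.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    # Pairwise squared Euclidean distances, shape (n_samples, k).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise (an assumption): K distinct, randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Bind every point to its nearest current centroid.
        labels = assign_to_nearest(X, centroids)
        # Recompute each centroid as the mean of its points; an empty
        # cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    return centroids, assign_to_nearest(X, centroids)


# Two small, well-separated groups of 2-D points (illustrative toy data).
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.1],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which group receives label 0 depends on the random initialisation. Random-point initialisation is the simplest choice; scikit-learn's KMeans uses k-means++ initialisation by default, which tends to give better starting centroids.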
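The post names scikit-learn's KMeans class without showing how it is called. The following is a minimal usage sketch, assuming scikit-learn and NumPy are installed; the toy data set and the parameter values (n_clusters=2, n_init=10, random_state=0) are illustrative choices, not taken from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two small, well-separated groups of 2-D points (illustrative toy data).
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 1.1],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]])

# n_clusters is K; n_init runs the algorithm from several starting points
# and keeps the best result; random_state makes the run reproducible.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

print(model.cluster_centers_)  # final centroid coordinates, shape (2, 2)
print(model.labels_)           # cluster index of each training point
print(model.inertia_)          # sum of squared distances to the nearest centroid

# New points are assigned to the nearest learned centroid.
new_labels = model.predict(np.array([[0.0, 0.0], [10.0, 10.0]]))
print(new_labels)
```

Passing n_init explicitly keeps the behaviour the same across scikit-learn versions, whose default for this parameter has changed over time.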