K-nearest Neighbors (KNN) in Python
Introduction
Neighbors-based classification is a type of instance-based learning or non-generalizing learning: it does not attempt to construct a general internal model, but simply stores instances of the training data. Classification is computed from a simple majority vote of the nearest neighbors of each point: a query point is assigned the data class which has the most representatives within the nearest neighbors of the point.
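To make the majority vote concrete, here is a minimal from-scratch sketch using only the standard library. It is not how scikit-learn implements the search; the function name `knn_predict`, the Euclidean distance metric, and the toy data are assumptions chosen for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Rank the stored training instances by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count the class labels of the k closest instances.
    votes = Counter(train_labels[i] for i in order[:k])
    # The class with the most representatives among the neighbors wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_labels = [0, 0, 0, 1, 1, 1]

label_near_origin = knn_predict(train_points, train_labels, (0.5, 0.5), k=3)
label_near_far_cluster = knn_predict(train_points, train_labels, (5.5, 5.5), k=3)
print(label_near_origin, label_near_far_cluster)  # prints: 0 1
```

Note that there is no training step beyond storing the points: all the work happens at query time, which is what "non-generalizing" means here.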
Implementation
scikit-learn implements two different nearest neighbors classifiers. KNeighborsClassifier implements learning based on the k nearest neighbors of each query point, where k is an integer value specified by the user. RadiusNeighborsClassifier implements learning based on the number of neighbors within a fixed radius r of each query point, where r is a floating-point value specified by the user.
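As a quick illustration of the difference, the sketch below fits both classifiers on the full iris data and inspects the neighbors of a single query point. The values `n_neighbors=5` and `radius=1.0` are arbitrary choices for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, RadiusNeighborsClassifier

X, y = load_iris(return_X_y=True)

# k is fixed: every query is decided by exactly 5 neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
# r is fixed: the number of neighbors depends on the local density.
rnn = RadiusNeighborsClassifier(radius=1.0).fit(X, y)

# Use the first training sample as the query, so at least one
# neighbor (the sample itself) lies within the radius.
query = X[:1]
knn_pred = knn.predict(query)
rnn_pred = rnn.predict(query)

k_idx = knn.kneighbors(query, return_distance=False)
r_idx = rnn.radius_neighbors(query, return_distance=False)

print("k-neighbors prediction:", knn_pred[0], "using", k_idx.shape[1], "neighbors")
print("radius prediction:     ", rnn_pred[0], "using", len(r_idx[0]), "neighbors")
```

The radius-based variant tends to suit data that is not uniformly sampled, since sparse regions simply contribute fewer votes.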
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors, datasets

n_neighbors = 15

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target

h = 0.02  # step size in the mesh

# Create color maps
cmap_light = ListedColormap(['#FFAAAA', '#AAFFAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

# we create an instance of Neighbours Classifier and fit the data.
clf = neighbors.KNeighborsClassifier(n_neighbors)
clf.fit(X, y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure()
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())

plt.show()
```
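Now let's see how RadiusNeighborsClassifier works. After playing a bit with the hyperparameters, we settled on a radius of 3.0 with distance-weighted votes. Only the line that creates the classifier changes; the rest of the script, including the plotting code, stays the same: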
```python
...
clf = neighbors.RadiusNeighborsClassifier(3.0, weights='distance')
...
```
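Rather than tuning by eye, hyperparameters can also be compared with cross-validation. The sketch below does this for `n_neighbors` on the same two iris features; the candidate values and the 5-fold setup are assumptions made for illustration, and the same loop could be adapted to sweep `radius` or `weights`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X = iris.data[:, :2]  # same two features as in the plots
y = iris.target

# Hypothetical candidate values for k; adjust to your data.
scores_by_k = {}
for k in (1, 5, 15, 30):
    clf = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds.
    scores_by_k[k] = cross_val_score(clf, X, y, cv=5).mean()

for k, score in scores_by_k.items():
    print(f"n_neighbors={k:2d}  mean accuracy={score:.3f}")

best_k = max(scores_by_k, key=scores_by_k.get)
print("best n_neighbors among the candidates:", best_k)
```

Very small values of k tend to follow noise in the training data, while very large values smooth the decision boundary, so it is worth checking a range rather than a single setting.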
Conclusion
Despite its simplicity, the nearest neighbors method has been successful in a large number of classification and regression problems, including handwritten digit recognition and satellite image scene classification. Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.