K Nearest Neighbors intro

K Nearest Neighbors (KNN) is a simple supervised learning algorithm that stores the entire training data set instead of fitting an explicit model. The idea behind it is that if you are similar to your neighbors, then you are one of them. For example, if an apple looks more similar to a peach, a pear, and a cherry (fruits) than to a monkey, a cat, and a rat (animals), then the apple is most likely a fruit. KNN can be used for both regression and classification problems.

Regression: Whenever a prediction is required for an unseen data instance, the algorithm searches the entire training data set for the K most similar instances and returns a summary of their target values, typically the mean, as the prediction.
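A minimal sketch of the regression case, assuming the common convention that the prediction is the mean of the K nearest target values. The function name knn_regress and the toy data are hypothetical and exist only for illustration.

```python
import math

def knn_regress(train_X, train_y, query, k):
    """Predict a numeric target as the mean of the k nearest training targets."""
    # Pair each training point's distance to the query with its target value.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort by distance only and keep the k closest pairs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical toy data: one feature per point, one numeric target.
train_X = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]

prediction = knn_regress(train_X, train_y, query=(2.5,), k=2)
print(prediction)  # 25.0, the mean of the targets at x = 2.0 and x = 3.0
```

Using the median instead of the mean is a reasonable variation when the targets contain outliers.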

Classification: When given a new example, the algorithm looks through the training data and finds the K training examples that are closest to it. It then assigns the most common class label among those K training examples to the new example.


KNN is often used in search applications where you are looking for similar items, for example "find items similar to this one". The main drawback of the KNN algorithm is that it is sensitive to outliers.

Applications:
KNN is used in a variety of applications, such as finance, healthcare, political science, handwriting detection, image recognition, and video recognition.

What does "K" in the KNN algorithm represent?
K is the number of nearest neighbor points that vote on the class of the new test point.
If K = 5, the labels of the five closest points are checked and the most common label is assigned (with two classes, that is the label occurring at least three times), and so on for larger values of K.
* Prefer an odd value of K when there are two classes: with an even K, the two classes can receive an equal number of votes.
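The tie problem with an even K can be seen in a small sketch. The helper name vote and the label lists are hypothetical; the tie check is one possible way to detect the situation, not a prescribed part of KNN.

```python
from collections import Counter

def vote(labels):
    """Return (winning label, whether the top count is shared by several labels)."""
    counts = Counter(labels).most_common()
    top_count = counts[0][1]
    tied = [label for label, count in counts if count == top_count]
    return counts[0][0], len(tied) > 1

odd_result = vote(["M", "M", "L", "L", "M"])  # K = 5: three votes against two
even_result = vote(["M", "M", "L", "L"])      # K = 4: two votes each

print(odd_result)
print(even_result)
```

When a tie is reported, the winning label is just the one that happened to be counted first, which is why an odd K is the safer choice for two classes.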

Manual implementation of KNN:

Let's suppose we have the height, weight, and corresponding T-shirt size of several customers. Your task is to predict the T-shirt size of Anna, whose height is 161 cm and weight is 61 kg.

Height (cm)    Weight (kg)    T-shirt size
158            58             M
158            59             M
158            63             M
160            59             M
160            60             M
163            60             M
163            61             M
160            64             L
163            64             L
165            61             L
165            62             L
165            65             L
168            62             L
168            63             L
168            66             L
170            63             L
170            64             L
170            68             L


Step 1:

Calculate the Euclidean distance between the new point and each of the existing points. For example, the Euclidean distance between points P1 (1, 1) and P2 (5, 4) is sqrt((5 - 1)^2 + (4 - 1)^2) = sqrt(16 + 9) = 5.
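The distance calculation in Step 1 can be sketched in Python as follows. The function name euclidean_distance is an illustrative choice; the second call uses Anna's measurements and one customer row from the table.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# The P1/P2 example: sqrt(4^2 + 3^2) = 5.
d_example = euclidean_distance((1, 1), (5, 4))

# Anna (161 cm, 61 kg) against the customer at (160 cm, 60 kg): sqrt(1 + 1).
d_anna = euclidean_distance((161, 61), (160, 60))

print(d_example)
print(round(d_anna, 2))
```

Height and weight are used here on their raw scales, as in the manual example; rescaling the features first is a common refinement when their ranges differ widely.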


Step 2:

Choose the value of K and select the K neighbors closest to the new point. In this case K = 5, so select the five customers with the smallest Euclidean distance to Anna.
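A short sketch of Step 2 on the customer data above. The variable names are illustrative, and math.dist assumes Python 3.8 or later.

```python
import math

# (height cm, weight kg, T-shirt size) rows from the customer table.
customers = [
    (158, 58, "M"), (158, 59, "M"), (158, 63, "M"),
    (160, 59, "M"), (160, 60, "M"), (163, 60, "M"),
    (163, 61, "M"), (160, 64, "L"), (163, 64, "L"),
    (165, 61, "L"), (165, 62, "L"), (165, 65, "L"),
    (168, 62, "L"), (168, 63, "L"), (168, 66, "L"),
    (170, 63, "L"), (170, 64, "L"), (170, 68, "L"),
]
anna = (161, 61)
K = 5

# Rank every customer by distance to Anna and keep the K closest.
ranked = sorted(customers, key=lambda row: math.dist(anna, row[:2]))
nearest = ranked[:K]

for height, weight, size in nearest:
    print(height, weight, size, round(math.dist(anna, (height, weight)), 2))
```

Sorting the whole table is fine at this size; for large data sets a partial selection such as heapq.nsmallest avoids the full sort.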

Step 3:

For K = 5, four of the five nearest neighbors wear size M. Therefore, according to the KNN algorithm, Anna, with a height of 161 cm and a weight of 61 kg, will fit into a T-shirt of size M.
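The majority vote in Step 3 can be sketched with collections.Counter. The list below hard-codes the sizes of Anna's five nearest neighbors (four M and, since the table has only two sizes, one L) rather than recomputing them.

```python
from collections import Counter

# Sizes of Anna's five nearest neighbors from Step 2.
neighbor_sizes = ["M", "M", "M", "M", "L"]

votes = Counter(neighbor_sizes)
# most_common(1) returns a one-element list of (label, count) pairs.
predicted_size, count = votes.most_common(1)[0]

print(predicted_size, count)  # M 4
```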

Implementing the KNN algorithm in Python:
  • Handle the data
  • Calculate the distance
  • Find the K nearest points
  • Predict the class
  • Check the accuracy
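The five steps above can be sketched end to end on the T-shirt data. This is one possible plain-Python arrangement, not a definitive implementation: all function names are illustrative, and the accuracy check uses leave-one-out evaluation, which is an assumption made here because the data set is too small for a separate test split.

```python
import math
from collections import Counter

# Handle the data: ((height cm, weight kg), T-shirt size) pairs.
DATA = [
    ((158, 58), "M"), ((158, 59), "M"), ((158, 63), "M"),
    ((160, 59), "M"), ((160, 60), "M"), ((163, 60), "M"),
    ((163, 61), "M"), ((160, 64), "L"), ((163, 64), "L"),
    ((165, 61), "L"), ((165, 62), "L"), ((165, 65), "L"),
    ((168, 62), "L"), ((168, 63), "L"), ((168, 66), "L"),
    ((170, 63), "L"), ((170, 64), "L"), ((170, 68), "L"),
]

def euclidean_distance(p, q):
    """Calculate the distance between two feature tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def get_neighbors(train, query, k):
    """Find the k training items closest to the query point."""
    ranked = sorted(train, key=lambda item: euclidean_distance(item[0], query))
    return ranked[:k]

def predict(train, query, k):
    """Predict the class by majority vote among the k nearest neighbors."""
    labels = [label for _, label in get_neighbors(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Check the accuracy: predict each item from all of the other items."""
    correct = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if predict(train, features, k) == label:
            correct += 1
    return correct / len(data)

anna_size = predict(DATA, (161, 61), k=5)
accuracy = leave_one_out_accuracy(DATA, k=5)

print("Predicted size for Anna:", anna_size)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

With a larger data set, the accuracy step would usually hold out a separate test set instead of leaving out one item at a time.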
