K Nearest Neighbors intro

K Nearest Neighbors or KNN algorithm is a simple supervised learning algorithm, which uses the entire data set in its training phase. This algorithm suggest that if you are similar to your neighbors, then you are one of them. For example, if apple looks more similar to peach,pear and cherry (fruits) than monkey, cat, and rat (animals), then most likely apple is a fruit. KNN algorithm used to perform both regression and classification problems.

Regression: Whenever a prediction is required for unseen data instance, it searches through the entire training data set for K-most similar instances and the data with the similar instance is finally returned as prediction.

Classification: When tested with a new example, it looks through the training data and finds the k training examples that are closest to the new example. It then assigns the most common class label (among those k-training example) to the test example.

KNN is often used in search applications, where you are looking for similar items, like find items similar to this one. The main drawback of KNN algorithm is sensitive to outliers.

Applications:

KNN used in the variety of applications such as finance, healthcare, Political science, hand writing detection, image recognition and video recognition and etc,..

What does "K" in KNN algorithm represents ??

K in KNN algorithm represents the number of nearest neighbor points which are voting for the new test data's class.

If K = 5, the labels of the five closest classes are checked and the most common(i.e occurring at least thrice) label is assigned, and so on for larger K's.

* K = 'Odd number' choosing K value must be always an odd number, since if you choose even number there may be chances of getting equal votes.

Manual implementation of KNN:

Let's suppose we have height and weight and its corresponding T-shirt size of several customers. Your task is to predict the T-shirt size of Anna whose height is 161 cm, and her weight is 61 kg.

Height (in cm’s)	Weight (in kg’s)	T- shirt size
158	58	M
158	59	M
158	63	M
160	59	M
160	60	M
163	60	M
163	61	M
160	64	L
163	64	L
165	61	L
165	62	L
165	65	L
168	62	L
168	63	L
168	66	L
170	63	L
170	64	L
170	68	L

Step: 1

Calculate the Euclidean distance between the new point and the existing points. For example, Euclidean distance between point P1 (1,1) and P2 (5,4) is

Step: 2

Choose the value of K and select K neighbors closest to the new point in this case, select the top 5 parameters having least Euclidean distance K= 5.

Step: 3

Since, for K = 5, we have 4 T-shirts of size M, therefore according to the KNN algorithm, Anna of height 161 cm and weight 61 kg will fit into a T- shirt of size M.

Implementing of KNN algorithm using Python:

Handling the data
Calculate the distance
Find K nearest point
Predict the class
Check the accuracy

Data science with_Raj

Search This Blog

K Nearest Neighbors intro

Comments

Post a Comment