Random Forest Classification in practice

Classifying the Iris data set with a Random Forest.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

Import the data set:

from sklearn import datasets

iris=datasets.load_iris()

print(iris.target_names)

['setosa' 'versicolor' 'virginica']

print(iris.feature_names)

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

print(iris.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

The integer labels in iris.target encode the species: 0 = setosa, 1 = versicolor, 2 = virginica.
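The label-to-species mapping does not have to be kept by hand. As a minimal sketch (the variable name species_names is only illustrative), indexing iris.target_names with the integer labels gives one species name per sample:

```python
from sklearn import datasets

iris = datasets.load_iris()

# target_names is a NumPy array, so indexing it with the integer
# labels returns the matching species name for every sample.
species_names = iris.target_names[iris.target]

print(species_names[:3])
print(species_names[-3:])
```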

df=pd.DataFrame({

    'sepal length':iris.data[:,0],

    'sepal width':iris.data[:,1],

    'petal length':iris.data[:,2],

    'petal width':iris.data[:,3],

    'species':iris.target

})

df.head()

   sepal length  sepal width  petal length  petal width  species
0           5.1          3.5           1.4          0.2        0
1           4.9          3.0           1.4          0.2        0
2           4.7          3.2           1.3          0.2        0
3           4.6          3.1           1.5          0.2        0
4           5.0          3.6           1.4          0.2        0

df.shape

(150, 5)

x=df[['sepal length','sepal width','petal length','petal width']]

y=df['species']

Split into training (70%) and test (30%) sets:

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3)
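Because train_test_split shuffles the rows randomly, the split above (and every number computed from it) changes from run to run. The sketch below shows one way to make it repeatable; the seed value 42 and the use of stratify are assumptions added here, not part of the original code. Stratifying keeps the three species equally represented in both sets.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# random_state fixes the shuffle (42 is an arbitrary choice);
# stratify keeps the class proportions the same in train and test.
x_train, x_test, y_train, y_test = train_test_split(
    iris.data,
    iris.target,
    test_size=0.3,
    random_state=42,
    stratify=iris.target,
)

print(x_train.shape, x_test.shape)
print(np.bincount(y_test))  # number of test samples per species
```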

 Model Training:

from sklearn.ensemble import RandomForestClassifier

clf=RandomForestClassifier(n_estimators=100)

clf.fit(x_train,y_train)

y_pred=clf.predict(x_test)
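The same predict call works for a single new flower. The following is a rough sketch under a few assumptions: the model is fitted on the full data set as plain NumPy arrays for brevity, the seed 0 is arbitrary, and the measurement values in sample are made up for illustration (sepal length, sepal width, petal length, petal width, in cm).

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()

# Fitted on all 150 samples here only to keep the sketch short.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# predict expects a 2-D input: one row per flower, one column per feature.
sample = [[5.1, 3.5, 1.4, 0.2]]

label = clf.predict(sample)[0]          # integer class label
name = iris.target_names[label]         # matching species name
proba = clf.predict_proba(sample)[0]    # share of tree votes per class

print(label, name)
print(proba)
```

predict_proba is useful when a hard label is not enough, since it shows how strongly the trees in the forest agree.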

Model Evaluation:

from sklearn import metrics

print("Accuracy:",metrics.accuracy_score(y_test,y_pred))

Accuracy: 0.9555555555555556
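With 45 test samples, an accuracy of 0.9556 corresponds to 43 correct predictions and 2 mistakes. Accuracy alone does not say which species were confused, so a confusion matrix and a per-class report can help. This is a self-contained sketch; the seeds (random_state=0) are assumptions added for repeatability, so its numbers will not necessarily match the run above.

```python
from sklearn import datasets, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Seeds are arbitrary; they only make the sketch repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Rows are true species, columns are predicted species.
cm = metrics.confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)

# Precision, recall and F1 score for each species.
print(metrics.classification_report(
    y_test, y_pred, labels=[0, 1, 2], target_names=iris.target_names
))
```

The diagonal of the matrix counts correct predictions, so its sum divided by the number of test samples equals the accuracy score.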

