Problem Statement:
Classifying the iris dataset using SVM kernels.
Import data set:
from sklearn import datasets
iris=datasets.load_iris()
# print the label species (setosa,versicolor,virginica)
print(iris.target_names)
['setosa' 'versicolor' 'virginica']
Print the names of the features:
print(iris.feature_names)
Creating a DataFrame of the iris dataset:
import pandas as pd
data=pd.DataFrame({
'sepal length':iris.data[:,0],
'sepal width':iris.data[:,1],
'petal length':iris.data[:,2],
'petal width':iris.data[:,3],
'species':iris.target
})
data.head(5)
x=data.drop('species',axis=1)
y=data['species']
Train & Test Split:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.20)
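Note that train_test_split shuffles the data randomly, so the accuracy figures shown below depend on the particular split. A minimal sketch of a reproducible split (the random_state value here is an assumption, not part of the original walkthrough):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Fixing random_state makes the 80/20 split reproducible,
# so accuracy numbers can be compared fairly across kernels.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=42)

print(x_train.shape, x_test.shape)  # (120, 4) (30, 4)
```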
To train the kernel SVM, we use the same SVC class. The difference lies in the value of the kernel parameter of the SVC class. For the simple SVM we used 'linear' as the value of the kernel parameter. For kernel SVM, however, you can use a Gaussian, polynomial, or sigmoid kernel. We will implement the polynomial, Gaussian, and sigmoid kernels to see which one works better for our problem.
Polynomial kernel:
In the case of the polynomial kernel, you also have to pass a value for the degree parameter of the SVC class. This is the degree of the polynomial.
from sklearn.svm import SVC
svcclassifier=SVC(kernel='poly',degree=8)
svcclassifier.fit(x_train,y_train)
y_pred=svcclassifier.predict(x_test)
Model Evaluation:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
[[ 9 0 0]
[ 0 7 0]
[ 0 0 14]]
precision recall f1-score support
0 1.00 1.00 1.00 9
1 1.00 1.00 1.00 7
2 1.00 1.00 1.00 14
avg / total 1.00 1.00 1.00 30
1.0
Gaussian kernel:
To use the Gaussian kernel, you have to specify 'rbf' as the value for the kernel parameter of the SVC class.
from sklearn.svm import SVC
svcclassifier=SVC(kernel='rbf')
svcclassifier.fit(x_train,y_train)
y_pred=svcclassifier.predict(x_test)
Model Evaluation:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
[[ 9 0 0]
[ 0 7 0]
[ 0 1 13]]
precision recall f1-score support
0 1.00 1.00 1.00 9
1 0.88 1.00 0.93 7
2 1.00 0.93 0.96 14
avg / total 0.97 0.97 0.97 30
0.9666666666666667
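The RBF (and sigmoid) kernels are sensitive to the scale of the input features. A common refinement, sketched here as an assumption rather than part of the original walkthrough, is to standardize the features in a pipeline before fitting the SVC:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.20, random_state=0)

# StandardScaler gives each feature zero mean and unit variance,
# which keeps the RBF kernel's distance computations balanced
# across features measured on different scales.
model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
model.fit(x_train, y_train)
print(accuracy_score(y_test, model.predict(x_test)))
```

The pipeline also ensures the scaler is fitted only on the training data, avoiding leakage from the test set.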
Sigmoid kernel:
To use the sigmoid kernel, you have to specify 'sigmoid' as the value for the kernel parameter of the SVC class.
from sklearn.svm import SVC
svcclassifier=SVC(kernel='sigmoid')
svcclassifier.fit(x_train,y_train)
y_pred=svcclassifier.predict(x_test)
Model Evaluation:
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
[[ 0 9 0]
[ 0 7 0]
[ 0 14 0]]
precision recall f1-score support
0 0.00 0.00 0.00 9
1 0.23 1.00 0.38 7
2 0.00 0.00 0.00 14
avg / total 0.05 0.23 0.09 30
0.23333333333333334
Comparison of Kernel Performance:
Comparing the performance of the different kernels, we can clearly see that the sigmoid kernel performs worst: the sigmoid function is better suited to binary classification problems, while this is a three-class task. The Gaussian and polynomial kernels both perform well. Here the polynomial kernel achieved 100% accuracy, but with a test set of only 30 samples a perfect score can also be the result of a lucky split or of overfitting. In any case, there is no hard and fast rule for which kernel performs best in every scenario.
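Because a single 30-sample test set is small, a steadier comparison is to average accuracy over several splits. A minimal sketch using 5-fold cross-validation (this step is an addition, not part of the original walkthrough; SVC's default polynomial degree is 3, unlike the degree=8 used above):

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Mean accuracy over 5 folds; every kernel is scored on the same folds.
for kernel in ('poly', 'rbf', 'sigmoid'):
    scores = cross_val_score(SVC(kernel=kernel), x, y, cv=5)
    print(f"{kernel}: {scores.mean():.3f}")
```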