Problem statement:
Predict whether or not a patient has heart disease from independent variables such as sex, age, the number of major blood vessels, and various diagnostic measurements.
import pandas as pd
import numpy as np
Importing the dataset:
df=pd.read_csv("D:\\Raj_DataScience\\Documents\\heart.csv")
df.head()
| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |
df.shape
(303, 14)
Attributes and Labels:
x=df.drop('target',axis=1)
y=df['target']
df['target'].unique()
array([1, 0])
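Listing the distinct labels confirms this is a binary problem. Counting them also shows whether the classes are balanced, which matters when reading the accuracy score later. A minimal sketch, using a small hypothetical Series in place of `df['target']`:

```python
import pandas as pd

# Hypothetical stand-in for df['target']; with the real data use y = df['target']
y = pd.Series([1, 1, 0, 1, 0, 0, 1, 1])

# Distinct labels, in order of first appearance
print(y.unique())  # [1 0]

# Number of rows per class, and each class's share of all rows
counts = y.value_counts()
shares = y.value_counts(normalize=True)
print(counts)
print(shares)
```

If one class made up, say, 90% of the rows, a model that always predicted that class would already reach 90% accuracy, so the per-class precision and recall in the report below are the more informative numbers.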
Splitting the data into training and test sets:
from sklearn.model_selection import train_test_split
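# Hold out 20% of the rows for testing; without random_state the split (and the scores below) changes on every run
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)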
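With 303 rows and `test_size=0.2`, scikit-learn rounds the test size up (ceil(0.2 × 303) = 61) and keeps the remaining 242 rows for training, which is why the classification report below shows a total support of 61. The sketch below checks this on placeholder arrays of the same shape; `random_state=42` and `stratify=y` are optional additions not used in the original code, shown here to make the split reproducible and to keep the class proportions equal in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the features of heart.csv (303 rows, 13 columns);
# the values themselves do not affect the split sizes.
x = np.arange(303 * 13).reshape(303, 13)
y = np.arange(303) % 2

# random_state fixes the shuffle; stratify=y preserves the class ratio in train and test
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

print(x_train.shape, x_test.shape)  # (242, 13) (61, 13)
```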
Implementing a decision tree classifier:
from sklearn.tree import DecisionTreeClassifier
classifier=DecisionTreeClassifier()
classifier.fit(x_train,y_train)
y_pred=classifier.predict(x_test)
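With default settings, scikit-learn's `DecisionTreeClassifier` uses Gini impurity (`criterion="gini"`) to choose splits: at each node it picks the feature and threshold whose two child nodes have the lowest weighted impurity. The sketch below computes this by hand for one hypothetical node; the helper names `gini` and `split_impurity` are illustrative, not part of scikit-learn's API:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Impurity of a binary split: child impurities weighted by child size."""
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    return (n_left / n) * gini(left) + (n_right / n) * gini(right)

# Hypothetical node holding 4 patients of each class
parent = [0, 0, 0, 0, 1, 1, 1, 1]
# A candidate threshold that sends mostly class 0 left and mostly class 1 right
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print(gini(parent))                 # 0.5
print(split_impurity(left, right))  # 0.375
```

The split lowers impurity from 0.5 to 0.375, so the tree would prefer it over any candidate that leaves the children more mixed. Because the tree keeps splitting until its leaves are pure by default, it can fit the training set very closely.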
Model Evaluation:
from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
[[19 7]
[ 5 30]]
             precision    recall  f1-score   support

          0       0.79      0.73      0.76        26
          1       0.81      0.86      0.83        35

avg / total       0.80      0.80      0.80        61
0.8032786885245902
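Every number in the report can be recovered from the confusion matrix. scikit-learn's `confusion_matrix` puts true labels on the rows and predicted labels on the columns, so for each class the diagonal entry is the number of correct predictions, the column total is how often that class was predicted, and the row total is its support. A small sketch reproducing the figures above:

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[19, 7],
               [5, 30]])

tp = np.diag(cm)            # correct predictions per class: [19, 30]
predicted = cm.sum(axis=0)  # column totals: how often each class was predicted
support = cm.sum(axis=1)    # row totals: true instances of each class

precision = tp / predicted  # [19/24, 30/37]
recall = tp / support       # [19/26, 30/35]
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()  # 49/61

# Support-weighted averages, as in the "avg / total" row
weighted = [np.average(m, weights=support) for m in (precision, recall, f1)]

print(np.round(precision, 2))  # [0.79 0.81]
print(np.round(recall, 2))     # [0.73 0.86]
print(np.round(f1, 2))         # [0.76 0.83]
print(np.round(weighted, 2))   # [0.8 0.8 0.8]
print(accuracy)                # 0.8032786885245902
```

So the model correctly classified 49 of the 61 test patients. It missed 7 of the 26 patients in class 0 and 5 of the 35 in class 1.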