Problem statement:
Predict whether a person will buy insurance, based on their age.
Import libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Import the data set:
df= pd.read_csv("D:\\Raj_DataScience\\Documents\\insurance_data.csv")
print(df.head())
   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1
print(df.shape)
(27, 2)
shape gives the number of rows and columns in the data set. This is a small data set, with only 27 records.
x=df["age"]
y=df["bought_insurance"]  # target as a 1-D Series, the shape scikit-learn expects for labels
# plotting of data set:
plt.scatter(x,y,marker='+', color='red')
plt.show()
From the plot, we conclude that younger people (under 30) are less likely to buy insurance, while people over 40 are more likely to buy it. A straight line does not fit this data well, because the outcome is only ever 0 or 1 and a line cannot pass through all the points; hence we use an "S"-shaped (sigmoid) curve.
Split the data into train and test sets:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(df[['age']],y,test_size=0.2)
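With test_size=0.2 and 27 rows, scikit-learn rounds the test set up to 6 rows and keeps 21 for training. The split is random, so the rows chosen (and the accuracy later on) change from run to run unless random_state is fixed. A minimal sketch of this, assuming a made-up stand-in DataFrame (the ages and labels below are illustrative, not the real insurance_data.csv) and an arbitrary random_state=42:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 27-row data set (values are made up).
rng = np.random.default_rng(0)
ages = rng.integers(18, 65, size=27)
demo_df = pd.DataFrame({"age": ages, "bought_insurance": (ages > 40).astype(int)})

# random_state is an assumption added here so the split is repeatable.
x_tr, x_te, y_tr, y_te = train_test_split(
    demo_df[["age"]], demo_df["bought_insurance"], test_size=0.2, random_state=42
)

print(x_tr.shape, x_te.shape)  # (21, 1) (6, 1)
```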
Implementing the Logistic Regression model:
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
model.fit(x_train,y_train)
y_pred=model.predict(x_test)
y_pred
array([1, 0, 1, 1, 1, 1], dtype=int64)
print(x_test)
    age
4    46
21   26
9    61
24   50
25   54
6    55
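Each 0 or 1 in y_pred comes from a probability on the "S" (sigmoid) curve: sigmoid(z) = 1 / (1 + e^(-z)), with z = coef * age + intercept. The model predicts 1 when that probability is above 0.5. A minimal sketch of the idea, assuming hypothetical values for coef and intercept (the fitted values are stored in model.coef_ and model.intercept_ and will differ):

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters, chosen for illustration only.
coef, intercept = 0.1, -3.5

ages = np.array([26, 46, 61])
probs = sigmoid(coef * ages + intercept)  # probability of buying insurance
labels = (probs > 0.5).astype(int)        # threshold at 0.5, as model.predict does

print(probs.round(2))
print(labels)
```

To see the probabilities of the fitted model itself, model.predict_proba(x_test) returns one column per class.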
Accuracy:
model.score(x_test,y_test)
0.833333333333334
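For a classifier, score returns the accuracy: the fraction of test rows that were predicted correctly. With 6 test rows, 0.83 corresponds to 5 correct predictions out of 6. A minimal sketch of that arithmetic, using the predictions shown earlier and a hypothetical set of true labels (the actual y_test is not printed in this post):

```python
import numpy as np

y_pred_demo = np.array([1, 0, 1, 1, 1, 1])  # predictions from the post
y_true_demo = np.array([1, 0, 1, 0, 1, 1])  # hypothetical true labels, one mismatch

# Accuracy = number of matching labels / number of test rows.
accuracy = np.mean(y_pred_demo == y_true_demo)
print(accuracy)
```

Because the test set is so small, a single wrong prediction moves the accuracy by about 17 percentage points.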