Logistic regression in practice

 Problem statement:

Predict whether a person will buy insurance, based on their age.

Import libraries:

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

Import the data set:

df= pd.read_csv("D:\\Raj_DataScience\\Documents\\insurance_data.csv")

print(df.head())

 

   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1

This data set has only two columns, 'age' and 'bought_insurance'. Here 'age' is the independent variable and 'bought_insurance' is the dependent variable.

print(df.shape)

(27, 2)

shape gives the number of rows and columns in the data set. This is a small data set with only 27 records.

x=df["age"]

y=df["bought_insurance"]

Plot the data set:

plt.scatter(x,y,marker='+', color='red')

plt.show()

From the plot, we conclude that younger people (under 30) are less likely to buy insurance, while people over 40 are more likely to buy it. A straight line is a poor fit here because the target takes only the values 0 and 1, so we use an S-shaped (sigmoid) curve instead.
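The S-shaped curve is the logistic (sigmoid) function, which squeezes any number into the range 0 to 1 so that it can be read as a probability. Here is a minimal sketch in plain Python. The weight and intercept are made-up values chosen only for illustration; they are not coefficients fitted to this data set.

```python
import math

def sigmoid(z):
    """Map a real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, NOT fitted to insurance_data.csv.
WEIGHT = 0.25      # assumed effect of one extra year of age
INTERCEPT = -8.75  # assumed offset; puts the 0.5 crossover at age 35

def buy_probability(age):
    """Probability of buying insurance under the assumed coefficients."""
    return sigmoid(WEIGHT * age + INTERCEPT)

for age in (22, 35, 52):
    p = buy_probability(age)
    label = 1 if p >= 0.5 else 0  # turn the probability into a 0/1 label
    print(age, round(p, 3), label)
```

With these assumed values the probability is low for a 22-year-old, exactly 0.5 at age 35, and high for a 52-year-old, which is the shape a straight line cannot give.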

Split the data into Train and Test:

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(df[['age']],y,test_size=0.2) 
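With 27 rows and test_size=0.2 the split cannot be exact, since 0.2 × 27 = 5.4. As far as I know, scikit-learn rounds the test count up and gives the remaining rows to the training set, which is why x_test has 6 rows. The arithmetic can be sketched as follows. The helper name split_sizes is made up for this illustration and is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Hypothetical helper: rounds the test count up and assigns
    the remaining rows to the training set.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

n_train, n_test = split_sizes(27, 0.2)
print(n_train, n_test)  # 21 6
```

train_test_split shuffles the rows before splitting, so the rows that land in the test set change from run to run. Passing a fixed random_state (for example random_state=0) should make the split repeatable.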

Implement the logistic regression model:

from sklearn.linear_model import LogisticRegression

model=LogisticRegression()

model.fit(x_train,y_train)

y_pred=model.predict(x_test)

y_pred

array([1, 0, 1, 1, 1, 1], dtype=int64)

print(x_test)

 

    age
4    46
21   26
9    61
24   50
25   54
6    55


In the output, 1 indicates that the person is predicted to buy insurance, and 0 indicates that they are predicted not to buy it.

Accuracy:
model.score(x_test,y_test)

0.833333333333334
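For a classifier, model.score reports accuracy: the fraction of test rows whose predicted label matches the true label. A score of about 0.83 on 6 test rows corresponds to 5 correct predictions out of 6. The sketch below reproduces that arithmetic in plain Python. The predictions are the ones printed earlier. The true labels are hypothetical, since y_test is not shown, and are chosen only so that exactly one prediction is wrong.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)

y_pred = [1, 0, 1, 1, 1, 1]       # predictions printed earlier
y_true_demo = [1, 0, 1, 1, 1, 0]  # hypothetical labels, not the real y_test

print(accuracy(y_true_demo, y_pred))
```

With only 6 test rows, a single wrong prediction moves the score by about 17 percentage points, so the figure should be read as a rough indication rather than a precise estimate.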

