Decision Tree for Regression in practice

Problem Statement:

Predict petrol (gas) consumption, in millions of gallons, for 48 US states from four features: petrol tax, average per-capita income, paved highways, and the proportion of the population holding a driver's licence.

import pandas as pd

import numpy as np

Import data set:

df = pd.read_csv("D:\\Raj_DataScience\\Documents\\petrol_consumption.csv")

df.shape

(48, 5)

df.head()

	Petrol_tax	Average_income	Paved_Highways	Population_Driver_licence(%)	Petrol_Consumption
0	9.0	3571	1976	0.525	541
1	9.0	4092	1250	0.572	524
2	9.0	3865	1586	0.580	561
3	7.5	4870	2351	0.529	414
4	8.0	4399	431	0.544	410

df.describe()

Dividing data into attributes and labels:

X = df.drop('Petrol_Consumption', axis=1)

y = df['Petrol_Consumption']

Split the data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
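As a quick sanity check on the split sizes, the sketch below reproduces the arithmetic with the standard library only. It assumes that a fractional test_size is rounded up to a whole number of rows and that the training set receives the remainder, which agrees with the 10 test rows shown in the comparison further down. The variable names are illustrative.

```python
import math

n_samples = 48      # rows in the petrol consumption data set (df.shape[0])
test_size = 0.2     # fraction passed to train_test_split

# Assumption: the test set gets ceil(test_size * n_samples) rows,
# and the training set gets whatever is left.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```

With only 48 rows, the test set is small, so the evaluation numbers can shift noticeably with a different random_state.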

Implementing Decision Tree:

from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)
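To give a feel for what the regressor does when it is fitted, here is a minimal sketch of choosing one split on one feature by minimising the summed squared error of the two resulting groups. It is a simplified illustration, not scikit-learn's actual implementation: it assumes a squared-error criterion, looks at a single feature, and uses only the five rows printed by df.head(). The helper names sse and best_split are hypothetical.

```python
# Five rows from df.head(): Average_income and Petrol_Consumption.
income = [3571, 4092, 3865, 4870, 4399]
consumption = [541, 524, 561, 414, 410]


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(feature, target):
    """Return (threshold, total_sse) for the best single split on one feature."""
    pairs = sorted(zip(feature, target))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        # Candidate threshold: midpoint between two neighbouring feature values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


threshold, total_sse = best_split(income, consumption)
print("best threshold:", threshold)
print("total squared error after split:", total_sse)
```

A full tree repeats this search over every feature and then recurses into each group. Each leaf predicts the mean of the training targets that fall into it.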

Let's compare the predicted values with the actual values:

comparison = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

comparison

	Actual	Predicted
29	534	547.0
4	410	414.0
26	577	574.0
30	571	554.0
32	577	631.0
37	704	644.0
34	487	648.0
40	587	649.0
7	467	414.0
10	580	464.0

Model Evaluation:

from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))

print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 54.3
Mean Squared Error: 5302.9
Root Mean Squared Error: 72.82101345078905
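The three metrics can be checked by hand from the comparison table above. The sketch below recomputes them in plain Python, with the ten actual and predicted values copied from that table. The mae_share line is an extra added here for context, not part of the original output.

```python
import math

# Actual and predicted values copied from the comparison table above.
actual = [534, 410, 577, 571, 577, 704, 487, 587, 467, 580]
predicted = [547.0, 414.0, 574.0, 554.0, 631.0, 644.0, 648.0, 649.0, 414.0, 464.0]

errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)    # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error

# Extra context: MAE as a share of the mean actual consumption.
mean_actual = sum(actual) / len(actual)
mae_share = mae / mean_actual

print("Mean Absolute Error:", round(mae, 1))
print("Mean Squared Error:", round(mse, 1))
print("Root Mean Squared Error:", round(rmse, 2))
print("MAE / mean actual:", round(mae_share, 3))
```

On these ten test rows the mean absolute error is roughly 10% of the mean actual consumption (549.4). Two rows (indices 34 and 10) account for about half of the total absolute error.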

Data science with_Raj
