Decision Tree for Regression in Practice

Problem Statement:

Predict gas consumption (in millions of gallons) in 48 US states from four features: the petrol tax, average per-capita income, paved highways, and the proportion of the population with a driver's license.


import pandas as pd

import numpy as np

Import the data set:

df = pd.read_csv("D:\\Raj_DataScience\\Documents\\petrol_consumption.csv")

df.shape

(48, 5)

df.head()

   Petrol_tax  Average_income  Paved_Highways  Population_Driver_licence(%)  Petrol_Consumption
0         9.0            3571            1976                         0.525                 541
1         9.0            4092            1250                         0.572                 524
2         9.0            3865            1586                         0.580                 561
3         7.5            4870            2351                         0.529                 414
4         8.0            4399             431                         0.544                 410

df.describe()


Dividing data into attributes and labels:

X = df.drop('Petrol_Consumption', axis=1)

y = df['Petrol_Consumption']

Split the data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
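As a quick check on what this split produces, the sketch below runs `train_test_split` on a stand-in array with the same shape as the feature matrix (48 rows, 4 columns). The array contents and the names `X_demo` and `y_demo` are placeholders, not the petrol data. With `test_size=0.2`, scikit-learn rounds the test size up, so 48 rows should give 10 test rows and 38 training rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the features and labels above
# (48 rows, 4 feature columns); the values themselves are made up.
X_demo = np.arange(48 * 4).reshape(48, 4)
y_demo = np.arange(48)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Report how many rows ended up in each part of the split.
print("Training rows:", X_tr.shape[0])
print("Test rows:", X_te.shape[0])
```

With only 10 test rows, any error metric computed on the test set will be sensitive to one or two poorly predicted states.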

Implementing the Decision Tree:

from sklearn.tree import DecisionTreeRegressor

regressor = DecisionTreeRegressor()

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)
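Note that `DecisionTreeRegressor()` is created here without a `random_state`. scikit-learn permutes the features at each split, so when several candidate splits are equally good, the fitted tree (and therefore the predictions and error figures) can differ from run to run. A minimal sketch of pinning the seed, using made-up data rather than the petrol data set; `X_demo`, `y_demo`, `X_new`, and the seed value 0 are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up stand-in data: 38 training rows and 4 features, like the split above.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 5, size=(38, 4)).astype(float)
y_demo = rng.normal(size=38)
X_new = rng.integers(0, 5, size=(10, 4)).astype(float)

# Fixing random_state makes tie-breaking between equally good splits repeatable.
tree_a = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
tree_b = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Two trees built with the same seed should predict the same values on unseen rows.
same = np.array_equal(tree_a.predict(X_new), tree_b.predict(X_new))
print("Identical predictions:", same)
```

Passing `random_state` to the regressor in the tutorial code would make the reported predictions and metrics reproducible in the same way.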

Let's compare the predicted values with the actual values:

# Use a new name so the original data set in df is not overwritten
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})

results

    Actual  Predicted
29     534      547.0
4      410      414.0
26     577      574.0
30     571      554.0
32     577      631.0
37     704      644.0
34     487      648.0
40     587      649.0
7      467      414.0
10     580      464.0

Model Evaluation:

from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))

print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 54.3
Mean Squared Error: 5302.9
Root Mean Squared Error: 72.82101345078905
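To make these three numbers concrete, the sketch below recomputes them by hand from the ten actual and predicted values in the comparison table above, using only the standard library. MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE, which brings the error back to the units of the target.

```python
import math

# Values copied from the Actual / Predicted comparison table above.
actual = [534, 410, 577, 571, 577, 704, 487, 587, 467, 580]
predicted = [547.0, 414.0, 574.0, 554.0, 631.0, 644.0, 648.0, 649.0, 414.0, 464.0]

# Per-row error: actual minus predicted.
errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # root mean squared error

print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
```

The results agree with the scikit-learn output above (54.3, 5302.9, and about 72.82). Most of the error comes from two rows, indices 34 and 10, where the prediction is off by 161 and 116 respectively. Because MSE squares each error, those two rows account for roughly three quarters of it.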
