Problem Statement:
To predict gas consumption (in millions of gallons) in 48 US states based on the gas tax, per capita income, paved highways, and the proportion of the population with a driver's license.
import pandas as pd
import numpy as np
Import the data set:
df= pd.read_csv("D:\\Raj_DataScience\\Documents\\petrol_consumption.csv")
df.shape
(48, 5)
df.head()
 | Petrol_tax | Average_income | Paved_Highways | Population_Driver_licence(%) | Petrol_Consumption
---|---|---|---|---|---
0 | 9.0 | 3571 | 1976 | 0.525 | 541
1 | 9.0 | 4092 | 1250 | 0.572 | 524
2 | 9.0 | 3865 | 1586 | 0.580 | 561
3 | 7.5 | 4870 | 2351 | 0.529 | 414
4 | 8.0 | 4399 | 431 | 0.544 | 410
df.describe()
Dividing data into attributes and labels:
X = df.drop('Petrol_Consumption', axis=1)  # features: the four predictor columns
y = df['Petrol_Consumption']               # label: petrol consumption
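If you don't have the CSV at hand, the same step can be tried on the five rows printed by `df.head()` above. The names `sample`, `X_sample` and `y_sample` are only illustrative; on the full file the same calls should give `X` with shape (48, 4) and `y` with shape (48,).

```python
import pandas as pd

# The five rows shown by df.head() above, so the step runs without the CSV
sample = pd.DataFrame({
    'Petrol_tax': [9.0, 9.0, 9.0, 7.5, 8.0],
    'Average_income': [3571, 4092, 3865, 4870, 4399],
    'Paved_Highways': [1976, 1250, 1586, 2351, 431],
    'Population_Driver_licence(%)': [0.525, 0.572, 0.580, 0.529, 0.544],
    'Petrol_Consumption': [541, 524, 561, 414, 410],
})

X_sample = sample.drop('Petrol_Consumption', axis=1)  # the 4 feature columns
y_sample = sample['Petrol_Consumption']               # the target as a Series

print(X_sample.shape, y_sample.shape)  # (5, 4) (5,)
```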
Split the data into training and test sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
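With 48 rows and `test_size=0.2`, 20% of the data is 9.6 rows. As we read scikit-learn's source, a float `test_size` is rounded up and the training set gets the remainder, so the split should hold out 10 states and train on 38, which matches the 10 rows in the comparison table further down. The helper `split_sizes` is hypothetical and only reproduces that arithmetic; it is not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size):
    """Hypothetical helper: train/test row counts for a float test_size,
    rounding the test count up and giving the rest to training."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(48, 0.2))  # (38, 10)
```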
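Implementing the decision tree:
from sklearn.tree import DecisionTreeRegressor
# Default settings: squared-error split criterion and no depth limit.
# No random_state is set, so the chosen splits (and predictions) may vary
# between runs when several candidate splits improve the criterion equally.
regressor = DecisionTreeRegressor()
regressor.fit(X_train, y_train)     # learn the splits from the training set
y_pred = regressor.predict(X_test)  # predict consumption for the held-out states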
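A regression tree grows by repeatedly choosing the split that most reduces the squared error of the two child nodes, and it predicts the mean target of the leaf a sample falls into. The sketch below shows a single such split (a depth-1 "stump") in plain NumPy on the five `df.head()` rows. It is a simplification under stated assumptions, not scikit-learn's implementation: the function names are hypothetical, thresholds are midpoints between sorted unique values, and ties keep the first feature found (scikit-learn instead permutes features randomly, which is why an unseeded tree can differ on ties).

```python
import numpy as np

def best_split(X, y):
    """Return (feature_index, threshold, sse) of the single split that
    minimises the summed squared error of the two child nodes."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])  # sorted unique values of feature j
        # Candidate thresholds: midpoints between consecutive unique values
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:  # strict '<': on ties the first feature found wins
                best = (j, t, sse)
    return best

def fit_stump(X, y):
    """Fit one split; each side predicts the mean target of its rows."""
    j, t, _ = best_split(X, y)
    return j, t, y[X[:, j] <= t].mean(), y[X[:, j] > t].mean()

def predict_stump(stump, X):
    j, t, left_mean, right_mean = stump
    return np.where(X[:, j] <= t, left_mean, right_mean)

# The five rows printed by df.head(): Petrol_tax, Average_income,
# Paved_Highways, Population_Driver_licence(%)
X_head = np.array([
    [9.0, 3571, 1976, 0.525],
    [9.0, 4092, 1250, 0.572],
    [9.0, 3865, 1586, 0.580],
    [7.5, 4870, 2351, 0.529],
    [8.0, 4399,  431, 0.544],
])
y_head = np.array([541, 524, 561, 414, 410], dtype=float)

stump = fit_stump(X_head, y_head)
j, t, left_mean, right_mean = stump
# feature 0 (Petrol_tax), threshold 8.5, leaf means 412.0 and 542.0
print(f"feature {j}, threshold {t}, left mean {left_mean}, right mean {right_mean}")
```

On these five rows, splitting Petrol_tax at 8.5 separates the two low-tax, low-consumption states from the rest. `DecisionTreeRegressor()` keeps splitting like this (its default `max_depth=None` expands nodes until leaves are pure or hold fewer than `min_samples_split` samples), so the real tree is much deeper than this stump.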
Let's compare the predicted values with the actual values:
results = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})  # new name, so df keeps the original data
results
 | Actual | Predicted
---|---|---
29 | 534 | 547.0
4 | 410 | 414.0
26 | 577 | 574.0
30 | 571 | 554.0
32 | 577 | 631.0
37 | 704 | 644.0
34 | 487 | 648.0
40 | 587 | 649.0
7 | 467 | 414.0
10 | 580 | 464.0
Model Evaluation:
from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Mean Absolute Error: 54.3
Mean Squared Error: 5302.9
Root Mean Squared Error: 72.82101345078905
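The three metrics are simple averages over the test rows: MAE is the mean of |actual - predicted|, MSE is the mean of (actual - predicted)², and RMSE is the square root of MSE, which puts it back in the units of the target (millions of gallons). As a cross-check, the sketch below recomputes them by hand in plain Python from the 10 rows of the comparison table; the variable names are illustrative only.

```python
import math

# Actual and predicted values copied from the comparison table
actual    = [534, 410, 577, 571, 577, 704, 487, 587, 467, 580]
predicted = [547.0, 414.0, 574.0, 554.0, 631.0, 644.0, 648.0, 649.0, 414.0, 464.0]

errors = [a - p for a, p in zip(actual, predicted)]
mae  = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse  = sum(e ** 2 for e in errors) / len(errors)  # mean squared error
rmse = math.sqrt(mse)                             # same units as the target

# Share of the summed squared error caused by the single worst prediction
worst = max(errors, key=abs)
worst_share = worst ** 2 / sum(e ** 2 for e in errors)

print(mae, mse, rmse)
print(worst, round(worst_share, 3))
```

The values should agree with the scikit-learn output above (54.3, 5302.9 and about 72.82). The worst miss (487 actual vs 648.0 predicted) alone accounts for roughly 49% of the summed squared error, which is why RMSE sits well above MAE: squaring weights large errors more heavily.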