Random Forest Regression in practice

Predicting petrol consumption with a Random Forest regressor.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

data=pd.read_csv("D:\\Raj_DataScience\\Documents\\petrol_consumption.csv")

data.shape

data.head()

   Petrol_tax  Average_income  Paved_Highways  Population_Driver_licence(%)  Petrol_Consumption
0         9.0            3571            1976                         0.525                 541
1         9.0            4092            1250                         0.572                 524
2         9.0            3865            1586                         0.580                 561
3         7.5            4870            2351                         0.529                 414
4         8.0            4399             431                         0.544                 410

Attributes and labels (the first four columns are the features, the last column is the target):

x=data.iloc[:,0:4].values

y=data.iloc[:,4].values

Train and test sets:

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=0)

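Feature scaling (optional here: tree-based models such as Random Forest split on thresholds, so they are not sensitive to the scale of the features):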

from sklearn.preprocessing import StandardScaler

sc=StandardScaler()

x_train=sc.fit_transform(x_train)

x_test=sc.transform(x_test)
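To make the scaling step concrete, here is a small sketch in plain Python of what standardization computes. The mean and standard deviation are taken from the training values only and then reused on the test values, which is why the code above calls fit_transform on x_train but only transform on x_test. The five training values are the Petrol_tax column from the rows printed by data.head(); the test value 10.0 and the helper names fit_standardizer and standardize are made up for illustration.

```python
import math

def fit_standardizer(values):
    """Return (mean, std) of the values, using the population std (divide by n),
    which is the convention StandardScaler follows."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]

# Petrol_tax values from the five rows shown by data.head()
train_tax = [9.0, 9.0, 9.0, 7.5, 8.0]
# Hypothetical unseen value, used only to show that test data reuses the training statistics
test_tax = [10.0]

mean, std = fit_standardizer(train_tax)          # "fit" on the training data only
train_scaled = standardize(train_tax, mean, std)
test_scaled = standardize(test_tax, mean, std)   # "transform" with the training statistics

print("mean:", mean, "std:", std)
print("train scaled:", train_scaled)
print("test scaled:", test_scaled)
```

After scaling, the training values have mean 0 and standard deviation 1; the test value is simply expressed in the same units.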

Building the Random Forest:

from sklearn.ensemble import RandomForestRegressor

regressor=RandomForestRegressor(n_estimators=50,random_state=0)

regressor.fit(x_train,y_train)

y_pred=regressor.predict(x_test)
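The idea behind the regressor can be sketched in a few lines of plain Python: train many small models on bootstrap samples of the data and average their predictions. The toy below is a deliberately simplified illustration, not scikit-learn's implementation. Each "tree" is a single-split stump on one feature, whereas a real forest grows deeper trees over several features. The data are the Petrol_tax and Petrol_Consumption values from the five rows printed by data.head(); the function names and the query value 8.5 are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data by minimising squared error.
    Returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values in this sample are identical: always predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_forest(xs, ys, n_estimators, seed):
    """Train n_estimators stumps, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """The forest's prediction is the average of the individual predictions."""
    preds = [predict_stump(stump, x) for stump in forest]
    return sum(preds) / len(preds)

# Values from the five rows shown by data.head(); far too few for a real model
tax = [9.0, 9.0, 9.0, 7.5, 8.0]
consumption = [541, 524, 561, 414, 410]

forest = fit_forest(tax, consumption, n_estimators=50, seed=0)
prediction = predict_forest(forest, 8.5)
print("Averaged prediction for Petrol_tax=8.5:", prediction)
```

Averaging over many bootstrap-trained models is what smooths out the high variance of any single tree. n_estimators plays the same role here as in RandomForestRegressor above.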

Model evaluation:

from sklearn import metrics

print("Mean Absolute Error:",metrics.mean_absolute_error(y_test,y_pred))

print("Mean Squared Error:",metrics.mean_squared_error(y_test,y_pred))

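# RMSE is the square root of the mean squared error (not of the mean absolute error)
print("Root Mean Squared Error:",np.sqrt(metrics.mean_squared_error(y_test,y_pred)))

Mean Absolute Error: 49.222000000000016
Mean Squared Error: 3736.462600000001
Root Mean Squared Error: 61.13 (approx.; the square root of the MSE above)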

# Use a new name so the original dataset in `data` is not overwritten
results=pd.DataFrame({"Actual":y_test,"Predicted":y_pred})

results

   Actual  Predicted
0     534     571.58
1     410     502.40
2     577     604.98
3     571     575.46
4     577     615.48
5     704     601.86
6     487     586.60
7     587     567.64
8     467     463.02
9     580     513.76
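As a cross-check, the three error metrics can be recomputed by hand from the ten Actual/Predicted pairs in the table above. This is a minimal sketch in plain Python; the variable names are chosen here for illustration.

```python
import math

# The ten (Actual, Predicted) pairs from the table above
actual = [534, 410, 577, 571, 577, 704, 487, 587, 467, 580]
predicted = [571.58, 502.40, 604.98, 575.46, 615.48,
             601.86, 586.60, 567.64, 463.02, 513.76]

errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
mse = sum(e * e for e in errors) / len(errors)    # mean squared error
rmse = math.sqrt(mse)                             # root of the MSE, in the target's units

print("MAE:", round(mae, 3))    # 49.222
print("MSE:", round(mse, 4))    # 3736.4626
print("RMSE:", round(rmse, 2))  # 61.13
```

The MAE and MSE agree with the values printed by sklearn.metrics, and the RMSE of about 61 is in the same units as Petrol_Consumption, which makes it directly comparable to the target values in the table.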
