Gradient Boosting

Gradient boosting is also based on sequential ensemble learning. The difference is that, unlike AdaBoost, the weights of misclassified samples are not incremented. Instead, gradient boosting tries to minimize the loss function of the ensemble built so far by adding a new weak learner fit to the negative gradient of that loss, so each addition reduces the overall loss.

Gradient boosting involves three elements:

  1. Loss function: a differentiable loss function to be optimized (e.g., squared error for regression)
  2. Weak learner: typically a shallow decision tree, used to make predictions
  3. Additive model: weak learners are added one at a time so that each addition reduces the loss function (see the sketch below)
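For squared-error loss the negative gradient is simply the residual, so the whole procedure can be sketched in a few lines. This is a minimal illustration of the idea, not scikit-learn's actual implementation (the function name and defaults are made up for the example):
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_sketch(x, y, n_trees=100, learning_rate=0.1):
    # Start from a constant prediction: the mean minimizes squared loss
    base = y.mean()
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        # For squared loss the negative gradient is just the residual
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=2)  # weak learner
        tree.fit(x, residuals)
        # Additive update, scaled down by the learning rate (shrinkage)
        prediction += learning_rate * tree.predict(x)
        trees.append(tree)
    return base, trees
Predictions for new data are then base plus the learning-rate-weighted sum of the trees' predictions.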
The gradient boosting algorithm is greedy and can overfit a training dataset. Its performance can be improved by reducing overfitting through the following techniques (mapped onto scikit-learn parameters in the sketch after this list):
  • Tree constraints
  • Shrinkage
  • Random sampling
  • Penalized learning
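A minimal sketch of how the first three ideas map onto GradientBoostingRegressor parameters (the values are illustrative, not tuned):
from sklearn.ensemble import GradientBoostingRegressor

regularized = GradientBoostingRegressor(
    max_depth=3,         # tree constraints: keep each learner weak
    learning_rate=0.05,  # shrinkage: scale down each tree's contribution
    n_estimators=500,    # more trees usually pair with stronger shrinkage
    subsample=0.8,       # random sampling: fit each tree on 80% of the rows
)
# Penalized learning (e.g., L1/L2 penalties on leaf weights) is not exposed
# here; libraries such as XGBoost provide it via reg_alpha / reg_lambda.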
Like AdaBoost, gradient boosting can be used for both classification and regression problems (a minimal classification sketch follows the pros and cons below).
  • Pros: It iteratively corrects the mistakes of the weak learners and improves accuracy by combining them; it gives good accuracy in most cases.
  • Cons: Hyperparameter tuning is time-consuming, and training and storing many trees can require substantial memory.
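For classification the API is parallel to the regression one; a minimal sketch using GradientBoostingClassifier on the iris dataset (chosen here only for illustration):
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

x_cls, y_cls = load_iris(return_X_y=True)
xc_train, xc_test, yc_train, yc_test = train_test_split(x_cls, y_cls, test_size=0.2)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
clf.fit(xc_train, yc_train)
print(clf.score(xc_test, yc_test))  # mean accuracy on the held-out set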
Gradient Boosting in practice:
Apply the gradient boosting algorithm to the Boston house price prediction dataset:
import numpy as np
import pandas as pd
Import the dataset:
from sklearn import datasets
df=datasets.load_boston()
x=pd.DataFrame(df.data, columns=df.feature_names)
y=pd.Series(df.target)
x.head()
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0     15.3  396.90   4.98
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0     17.8  396.90   9.14
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0     17.8  392.83   4.03
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0     18.7  394.63   2.94
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0     18.7  396.90   5.33
y.head()
0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
dtype: float64
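Note: load_boston was removed in scikit-learn 1.2. On recent versions the same data can be loaded from OpenML instead; a sketch assuming the OpenML copy named "boston" (column names match the table above, though some column dtypes may differ):
from sklearn.datasets import fetch_openml

boston = fetch_openml(name="boston", version=1, as_frame=True)
x = boston.data
y = boston.target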
Split the dataset into train and test sets:
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; no random_state is set, so the split
# (and every score below) will vary from run to run
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.20)
Implementing gradient boosting:
from sklearn.ensemble import GradientBoostingRegressor
# A deliberately small ensemble: 3 trees of depth 2 with no shrinkage (learning_rate=1.0)
gradient=GradientBoostingRegressor(max_depth=2,n_estimators=3,learning_rate=1.0)
model=gradient.fit(x_train,y_train)
y_pred=model.predict(x_test)
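staged_predict makes the additive model visible by yielding the ensemble's predictions after each successive tree; a short sketch that scores each stage:
from sklearn.metrics import mean_squared_error

for i, y_stage in enumerate(model.staged_predict(x_test), start=1):
    print(i, mean_squared_error(y_test, y_stage))  # error typically drops per tree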
Model evaluation (note that r2_score returns the R² coefficient of determination, not classification accuracy):
from sklearn.metrics import r2_score
r2=r2_score(y_test,y_pred)
r2
0.75777778
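R² alone is unitless; reporting the error in the target's own units (median home value, in $1000s for this dataset) is a useful complement. A minimal sketch:
import numpy as np
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # same units as the target
print(rmse)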
Hyperparameter tuning:
from sklearn.model_selection import GridSearchCV
LR={'learning_rate':[0.15,0.10,0.05],'n_estimators':[100,150,200,250]}
tuning=GridSearchCV(estimator=GradientBoostingRegressor(),param_grid=LR, scoring='r2')
tuning.fit(x_train,y_train)
tuning.best_params_,tuning.best_score_
({'learning_rate': 0.15, 'n_estimators': 250}, 0.8750218539097616)
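Since GridSearchCV refits the best configuration on the whole training split by default (refit=True), the tuned model can be evaluated directly on the held-out test set; the exact numbers will differ across runs because no random_state was fixed:
best_model = tuning.best_estimator_  # already refit on x_train with the best parameters
y_pred_tuned = best_model.predict(x_test)
print(r2_score(y_test, y_pred_tuned))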


