AdaBoost (Adaptive Boosting)

AdaBoost combines multiple weak classifiers into a single strong classifier. It works by putting more weight on instances that are difficult to classify and less on those already handled well. The algorithm can be used for both classification and regression problems.


How does the AdaBoost algorithm work?

  1. Initially, AdaBoost selects a training subset at random.
  2. It then iteratively trains the model, choosing each training subset based on how accurately the previous round predicted.
  3. It assigns higher weights to wrongly classified observations, so that in the next iteration those observations have a higher probability of being selected.
  4. It also assigns a weight to the trained classifier in each iteration according to the classifier's accuracy: the more accurate the classifier, the higher its weight.
  5. This process iterates until the complete training data fits without error, or until the specified maximum number of estimators is reached.
  6. To classify, perform a weighted "vote" across all of the weak learners you built (see the sketch after this list).
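
To make steps 1-6 concrete, here is a minimal from-scratch sketch of discrete (SAMME-style) AdaBoost with decision stumps as the weak learners. It is an illustration, not scikit-learn's exact implementation, and it assumes binary labels encoded as -1/+1:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    # y is assumed to be encoded as -1/+1
    n = len(y)
    w = np.full(n, 1.0 / n)                    # step 1: start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # steps 2-3: weights steer each round toward hard cases
        pred = stump.predict(X)
        err = np.sum(w[pred != y])             # weighted error of this round
        if err >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # step 4: accurate learners get more say
        stumps.append(stump)
        alphas.append(alpha)
        if err == 0:                           # step 5: data fits without error, stop early
            break
        w *= np.exp(-alpha * y * pred)         # step 3: up-weight misclassified points
        w /= w.sum()                           # renormalize so the weights stay a distribution
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # step 6: weighted vote across all weak learners
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)

In practice you would use sklearn.ensemble.AdaBoostClassifier (shown below); the sketch only makes the weight bookkeeping visible.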


Pros and cons:
  • Pros: AdaBoost is easy to implement. It iteratively corrects the mistakes of the weak classifiers and improves accuracy by combining weak learners, and it is not especially prone to overfitting.
  • Cons: AdaBoost is sensitive to noisy data and is highly affected by outliers, because it tries to fit every point perfectly (the sketch below shows one way to check this). It is also slower than XGBoost.
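
To see the noise sensitivity for yourself, here is a small illustrative experiment: it flips a fraction of training labels at random and compares test accuracy. The dataset, seed, and noise levels are arbitrary choices for the sketch, and the exact numbers will vary from run to run:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.2):
    y_noisy = y_train.copy()
    flip = rng.random(len(y_noisy)) < noise              # mark ~noise fraction of labels
    y_noisy[flip] = rng.integers(0, 3, size=flip.sum())  # replace them with random classes
    model = AdaBoostClassifier(n_estimators=50).fit(x_train, y_noisy)
    acc = accuracy_score(y_test, model.predict(x_test))
    print(f"label noise {noise:.0%}: test accuracy {acc:.3f}")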


AdaBoost practical implementation:
import numpy as np
import pandas as pd
Load the dataset:
from sklearn import datasets
iris=datasets.load_iris()
x=iris.data
y=iris.target
Split the data into train and test sets:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
Implementing the AdaBoost model:
from sklearn.ensemble import AdaBoostClassifier
abc=AdaBoostClassifier(n_estimators=50,learning_rate=1)  # uses decision stumps as the default weak learner
model=abc.fit(x_train,y_train)
y_pred=model.predict(x_test)

The most important parameters are base_estimator, n_estimators, and learning_rate.
base_estimator: the weak learner used to train the model. DecisionTreeClassifier (a depth-1 stump) is the default weak learner, but you can specify a different machine learning algorithm. (In scikit-learn 1.2+ this parameter is named estimator.)
n_estimators: the number of weak learners to train iteratively. The default is 50.
learning_rate: shrinks the contribution of each weak learner to the ensemble. The default value is 1.
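
To get a feel for how these parameters interact, here is a small sketch that cross-validates a few combinations, reusing the x and y loaded above. The grid values are arbitrary illustrations, not tuning recommendations:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

for n in (10, 50, 100):
    for lr in (0.5, 1.0):
        clf = AdaBoostClassifier(n_estimators=n, learning_rate=lr)
        score = cross_val_score(clf, x, y, cv=5).mean()  # mean 5-fold CV accuracy
        print(f"n_estimators={n}, learning_rate={lr}: CV accuracy {score:.3f}")
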
Evaluation:
from sklearn import metrics
print("Accuracy:",metrics.accuracy_score(y_test,y_pred))
Accuracy: 0.9666666666667
(The exact value will vary between runs because train_test_split above is not seeded.)

Using a different base learner (AdaBoost):
Let's use SVC as the base estimator:
from sklearn.svm import SVC
svc=SVC(probability=True,kernel='linear')
abc=AdaBoostClassifier(n_estimators=50,estimator=svc,learning_rate=1)  # on scikit-learn < 1.2, pass base_estimator=svc instead
model=abc.fit(x_train,y_train)
y_pred=model.predict(x_test)
Accuracy:
print("Accuracy:",metrics.accuracy_score(y_test,y_pred))
Accuracy: 1.0
