Concept behind SVM :
Now consider: what if we had data as shown in the image below?
When we transform this line back to the original plane, it maps to a circular boundary. These transformations are called kernels.
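As a rough numerical sketch of this idea (the data and the z = x² + y² feature below are purely illustrative, not taken from the article), lifting 2-D points into a third dimension can turn a circular boundary into a flat one:

```python
import numpy as np

np.random.seed(0)

# Illustrative data: one class near the origin, the other on an outer ring
inner = np.random.randn(50, 2) * 0.5
angles = np.random.uniform(0, 2 * np.pi, 50)
outer = np.column_stack([3 * np.cos(angles), 3 * np.sin(angles)])
X = np.vstack([inner, outer])

# Add a third feature z = x^2 + y^2. In this lifted space the two groups are
# separated by a flat plane (z = constant), which maps back to a circle in 2-D.
z = (X ** 2).sum(axis=1)
X_lifted = np.column_stack([X, z])
print(X_lifted.shape)  # (100, 3)
```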
Making it a little more complex...
What if the data plots overlap? Or what if some of the black points fall inside the blue cluster?
The first boundary tolerates a few outlier points. The second one tries to achieve zero tolerance with a perfect partition.
But there is a trade-off: in real-world applications, finding a perfect partition for millions of training examples takes a lot of time, as you will see in the code. This trade-off is controlled by the regularization parameter.
Tuning parameters :
- Kernel
- Regularization
- Gamma
- Margin
Kernel:
The learning of the hyperplane in a linear SVM is done by transforming the problem using some linear algebra. This is where the kernel plays its role. For the linear kernel, the prediction for a new input is calculated using the dot product between the input (X) and each support vector (Xi):
f(X) = B0 + sum(ai * (X · Xi))
where B0 and ai are coefficients estimated from the training data by the learning algorithm.
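As a minimal sketch of this prediction rule, with made-up coefficients and support vectors just to show the arithmetic:

```python
import numpy as np

# Hypothetical values, purely to illustrate f(X) = B0 + sum(ai * (X . Xi))
B0 = 0.5                                   # intercept
a = np.array([0.8, -0.3, 0.6])             # coefficients ai (signed dual weights)
support_vectors = np.array([[1.0, 2.0],    # support vectors Xi
                            [2.0, 0.5],
                            [0.0, 1.5]])
X_new = np.array([1.5, 1.0])               # new input to classify

# Dot product of the new input with each support vector, weighted and summed
f = B0 + np.sum(a * support_vectors.dot(X_new))
print(f)  # the sign of f gives the predicted class
```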
The polynomial kernel can be written as K(X, Xi) = (1 + sum(X * Xi))^d and the exponential (radial basis function) kernel as K(X, Xi) = exp(-gamma * sum((X - Xi)²)). Polynomial and exponential kernels calculate the separating line in a higher dimension. This is called the kernel trick.
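A small sketch of these two kernel functions, with arbitrary choices of the degree d and gamma:

```python
import numpy as np

def polynomial_kernel(x, xi, d=3):
    # K(X, Xi) = (1 + sum(X * Xi)) ^ d
    return (1 + np.dot(x, xi)) ** d

def rbf_kernel(x, xi, gamma=0.1):
    # K(X, Xi) = exp(-gamma * sum((X - Xi)^2))
    return np.exp(-gamma * np.sum((x - xi) ** 2))

x = np.array([1.0, 2.0])
xi = np.array([0.5, 1.5])
print(polynomial_kernel(x, xi), rbf_kernel(x, xi))
```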
Regularization :
The regularization parameter (C) tells the SVM optimization how much you want to avoid misclassifying each training example.
For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
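A hedged sketch of this behaviour using scikit-learn's SVC on a toy dataset (the data and the two C values are arbitrary illustrations, not recommendations):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class data with some overlap
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

# Small C: wider margin, tolerates more misclassified training points
soft = SVC(kernel="linear", C=0.01).fit(X, y)
# Large C: narrower margin, tries hard to classify every training point correctly
hard = SVC(kernel="linear", C=100).fit(X, y)

print("training accuracy, C=0.01:", soft.score(X, y))
print("training accuracy, C=100 :", hard.score(X, y))
```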
Gamma :
The gamma parameter defines how far the influence of a single training example reaches: a low value means the influence reaches far, while a high value means it stays close. With a low gamma, points far away from the plausible separation line are also considered when computing it; with a high gamma, only the nearby points count.
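A minimal sketch of gamma's effect, again on an arbitrary toy dataset with illustrative gamma values:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Low gamma: each point's influence reaches far, giving a smoother boundary
smooth = SVC(kernel="rbf", gamma=0.1, C=1.0).fit(X, y)
# High gamma: influence is very local, so the boundary hugs individual points
# (and risks overfitting)
wiggly = SVC(kernel="rbf", gamma=100, C=1.0).fit(X, y)

print("training accuracy, gamma=0.1:", smooth.score(X, y))
print("training accuracy, gamma=100:", wiggly.score(X, y))
```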