Bias-Variance Trade-off

In this section we discuss some basic machine learning fundamentals that are relevant for model evaluation, namely bias, variance, underfitting, and overfitting.


The overall goal in machine learning is to obtain a model that generalizes well to new, unseen data. In the context of classification models, evaluation metrics we can use to measure performance on the test set include prediction accuracy and misclassification error; a model that scores well on them is said to have "high generalization accuracy" or, equivalently, "low generalization error".
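As a minimal sketch of these two metrics, the snippet below computes accuracy and misclassification error for a small set of made-up true and predicted labels (the data is purely illustrative):

```python
# Hypothetical true labels and model predictions on a test set.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# Accuracy: fraction of test points classified correctly.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Misclassification error is simply the complement of accuracy.
error = 1 - accuracy

print(f"accuracy = {accuracy:.2f}, error = {error:.2f}")
```

With 6 of 8 predictions correct, this reports an accuracy of 0.75 and an error of 0.25.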

In general, we might say that high variance corresponds to overfitting and high bias corresponds to underfitting. Let's define these terms more precisely.

Bias:

Bias is the difference between the values a model predicts and the actual values. A high-bias model produces large errors on the training data as well as the test data. It is therefore recommended that an algorithm keep bias low to avoid the problem of underfitting.
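To see how high bias shows up as error on the training data itself, here is a small sketch: a deliberately too-simple "model" (a constant that always predicts the training mean) fitted to data whose true relationship is quadratic. All numbers are hypothetical:

```python
# True relationship is quadratic: y = x**2.
xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]  # 0, 1, 4, 9, 16

# A too-simple model: always predict the mean of the training targets.
mean_y = sum(ys) / len(ys)
preds = [mean_y] * len(xs)

# High bias: large error even on the data the model was "trained" on.
train_error = sum(abs(p - y) for p, y in zip(preds, ys)) / len(ys)
print(f"constant prediction = {mean_y}, mean absolute training error = {train_error}")
```

The constant model cannot represent the curvature of the data, so its training error stays large no matter how much data it sees, which is the signature of underfitting.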

Variance:

The variability of model predictions for a given data point, which tells us the spread of our predictions, is called the variance of the model. A model with high variance fits the training data very closely and consequently cannot predict accurately on data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data. When a model has high variance, it is said to be overfitting the data.
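An extreme illustration of this failure mode is a model that simply memorizes its training points: training error is zero, but the model has nothing sensible to say about unseen inputs. The data and fallback value below are made up for the sketch:

```python
# Hypothetical noisy training samples of y ≈ x**2.
train = {0: 0.2, 1: 1.9, 2: 4.1, 3: 8.7}

def memorizer(x):
    # Perfect recall on training points; an arbitrary guess (0.0) otherwise.
    return train.get(x, 0.0)

# Zero error on training data...
train_error = sum(abs(memorizer(x) - y) for x, y in train.items()) / len(train)

# ...but large error on unseen test points.
test_points = {4: 16.0, 5: 25.0}
test_error = sum(abs(memorizer(x) - y) for x, y in test_points.items()) / len(test_points)

print(f"train error = {train_error}, test error = {test_error}")
```

The large gap between training and test error is exactly the overfitting pattern described above.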

If the algorithm is too simple, it will tend toward high bias and low variance, and is thus error prone even on the training data. If the fitted model is too complex, it will tend toward high variance and low bias, and in this condition new inputs will not be predicted well. The balance between these two conditions
is known as the trade-off, or the bias-variance trade-off.

* The optimal model complexity is the point to choose for training the algorithm, giving low error on the training data as well as the test data.
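The trade-off can be sketched by comparing three models of increasing complexity on the same hypothetical data (roughly y = 2x plus noise; all numbers are made up): a constant (too simple, high bias), a least-squares line (about right), and a memorizer (too complex, high variance). The model at the optimal complexity is the one with low error on both training and test data:

```python
# Hypothetical data: roughly y = 2x plus a little noise.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.1, 2.2, 3.9, 6.1]
test_x  = [4.0, 5.0]
test_y  = [8.0, 10.1]

def mae(model, xs, ys):
    # Mean absolute error of a model on a dataset.
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Too simple (high bias): always predict the training mean.
mean_y = sum(train_y) / len(train_y)
constant = lambda x: mean_y

# Intermediate complexity: least-squares line through the training data.
mx = sum(train_x) / len(train_x)
slope = (sum((x - mx) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
linear = lambda x: mean_y + slope * (x - mx)

# Too complex (high variance): memorize the training points exactly.
table = dict(zip(train_x, train_y))
memorizer = lambda x: table.get(x, 0.0)

for name, model in [("constant", constant), ("linear", linear),
                    ("memorizer", memorizer)]:
    print(f"{name:9s} train MAE = {mae(model, train_x, train_y):.3f}  "
          f"test MAE = {mae(model, test_x, test_y):.3f}")
```

Running this, the constant model has high error everywhere, the memorizer has zero training error but large test error, and the line achieves low error on both sets, matching the "optimal model complexity" point above.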
 
