Goodness of fit describes how well the regression line fits a set of observations. Finding the best model among several candidates is a form of optimization (often called model selection), and it requires a way to compare models. Various metrics exist for this purpose, such as R-squared, the F-test, and RMSE. These are all based on the total sum of squares (SST) and the sum of squared errors (SSE).
SST measures how far the data are from their mean, and SSE measures how far the data are from the model's predicted values. The same two quantities are also commonly called TSS and RSS.
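As a rough illustration of the two sums of squares described above, here is a minimal Python sketch. The function name `sum_of_squares` and the sample numbers are made up for this example and do not come from any real dataset.

```python
def sum_of_squares(actual, predicted):
    """Return (SST, SSE) for paired lists of actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    # SST: squared distance of each observation from the mean
    sst = sum((y - mean_actual) ** 2 for y in actual)
    # SSE: squared distance of each observation from its prediction
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return sst, sse


# Hypothetical data, chosen only to keep the arithmetic easy to follow
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

sst, sse = sum_of_squares(actual, predicted)
print("SST =", sst)  # SST = 20.0
print("SSE =", sse)  # SSE = 0.5
```

Here SSE is much smaller than SST, which suggests the predictions sit far closer to the data than the plain mean does.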
Regression Metrics:
1. R-Squared (Coefficient of Determination):
- R-squared is a statistical measure of goodness of fit.
- It measures the proportion of the variation in the dependent variable that is explained by the independent variable(s), on a scale of 0 to 100%.
- A high R^2 means the predicted values are close to the actual values, so the model fits the data well.
R^2 = 1 - (RSS / TSS)
Where,
- R^2 ⇒ proportion of the variance in the data explained by the model,
- RSS ⇒ sum of squared differences between the predicted and actual values,
- TSS ⇒ sum of squared differences between the mean value and the actual values.
It ranges from 0 to 1:
- 0 indicates that the model explains none of the variation (it does no better than always predicting the mean).
- 1 indicates that the model predicts perfectly.
For example:
- If R^2 = 0.8, the model explains 80% of the variation in the real data; the remaining 20% is unexplained.
* When is R^2 less useful? When the interest is in the relationship between the variables rather than in prediction, R^2 is less important.
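The R^2 calculation from this section can be sketched in a few lines of Python, using the standard definition R^2 = 1 - RSS / TSS. The function name `r_squared` and the data are hypothetical, picked only to show the two endpoints of the scale.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - RSS / TSS for paired lists of actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    tss = sum((y - mean_actual) ** 2 for y in actual)
    rss = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - rss / tss


# Hypothetical data for illustration only
actual = [3.0, 5.0, 7.0, 9.0]

close_fit = r_squared(actual, [2.5, 5.5, 7.0, 9.0])   # predictions near the data
mean_only = r_squared(actual, [6.0, 6.0, 6.0, 6.0])   # always predict the mean
perfect = r_squared(actual, actual)                   # predictions equal the data

print(round(close_fit, 3), mean_only, perfect)
```

Predicting the mean for every observation gives R^2 = 0, and reproducing the data exactly gives R^2 = 1, which matches the interpretation of the two ends of the range.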
2. Adjusted R^2:
Just like R^2, adjusted R^2 shows how well the terms fit a curve or line, but it adjusts for the number of terms in the model.
Adjusted R^2 = 1 - [(1 - R^2)(n - 1) / (n - k - 1)]
Where n is the total number of observations and k is the number of predictors.
* Adjusted R^2 will always be less than or equal to R^2.
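To make the adjustment concrete, here is a small sketch using the usual definition, adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1). The function name `adjusted_r_squared` and the input values (R^2 = 0.8, n = 50, k = 5) are assumptions chosen purely for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for k predictors fitted on n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical inputs: R^2 of 0.8 from 50 observations
adj_5 = adjusted_r_squared(0.8, n=50, k=5)    # 5 predictors
adj_20 = adjusted_r_squared(0.8, n=50, k=20)  # 20 predictors, same R^2

print(round(adj_5, 4))   # about 0.7773
print(round(adj_20, 4))  # about 0.6621
```

With the same R^2, the model that uses more predictors receives the lower adjusted value, which is the penalty this metric is meant to apply.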
3. Root Mean Square Error (RMSE):
Root mean square error measures the typical difference between the actual values and the predicted values. The lower the RMSE, the better the model. It is never negative, and a value of 0 indicates a perfect fit to the data.
RMSE = √MSE, where MSE is the mean of the squared differences between the actual and predicted values.
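A minimal sketch of RMSE as the square root of the mean squared error, using only the Python standard library. The function name `rmse` and the data are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error for paired lists of actual and predicted values."""
    squared_errors = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    mse = sum(squared_errors) / len(squared_errors)
    return math.sqrt(mse)


# Hypothetical data for illustration only
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

error = rmse(actual, predicted)
print(round(error, 4))        # about 0.3536
print(rmse(actual, actual))   # 0.0 for a perfect fit
```

The result is in the same units as the dependent variable, so an RMSE of about 0.35 here means the predictions are typically off by roughly a third of a unit.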
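4. Mean Absolute Percentage Error (MAPE):
RMSE is expressed in the units of the dependent variable, which makes it hard to compare across data with different scales. To overcome this limitation, analysts often prefer MAPE, which expresses the error as a percentage and is therefore easier to compare across models.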
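MAPE is usually computed as the mean of |actual - predicted| / |actual|, multiplied by 100. The sketch below follows that definition; the function name `mape` and the data are hypothetical, and raising an error on zero actual values is one possible design choice, since the percentage is undefined there.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(y == 0 for y in actual):
        # The percentage error is undefined when an actual value is zero
        raise ValueError("MAPE is undefined when an actual value is 0")
    ratios = [abs(y - p) / abs(y) for y, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical data for illustration only
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]

pct_error = mape(actual, predicted)
print(round(pct_error, 2))  # about 6.67 (percent)
```

Because the result is a percentage, it reads the same way whether the dependent variable is measured in dollars, kilograms, or any other unit.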