14. R Squared
R-squared (\(R^2\)) is a goodness-of-fit measure for regression models.
14.1. Definition
Let \(f\) be a regression model trained on \(X = \{ X_1, X_2, \dots, X_n \}\) to predict \(y = \{ y_1, y_2, \dots, y_n \}\). The formula for R-squared is

\[ R^2 = 1 - \frac{RSS}{TSS} \]

where \(RSS\) is the sum of squared residuals and \(TSS\) is the total sum of squares.
The sum of squared residuals is the total squared difference between \(y_i\) and the model's predictions \(\hat{y}_i = f(X_i)\):

\[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
The total sum of squares is the total squared difference between \(y_i\) and the mean of \(y\), written \(\bar{y}\):

\[ TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2 \]
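As a minimal sketch of the definition, the snippet below computes \(RSS\), \(TSS\), and \(R^2\) in plain Python. The function name `r_squared` and the toy values of `y` and `y_hat` are made up for illustration and do not come from any particular library or dataset.

```python
def r_squared(y, y_hat):
    """Return R^2 = 1 - RSS / TSS for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)  # mean of the observed values
    # Sum of squared residuals: squared gaps between observations and predictions.
    rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    # Total sum of squares: squared gaps between observations and their mean.
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


# Hypothetical toy data: four observations and four model predictions.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]

# Here RSS = 1.0 and TSS = 5.0, so R^2 = 1 - 1.0 / 5.0 = 0.8.
print(r_squared(y, y_hat))
```

Note that this sketch divides by \(TSS\), so it assumes the values in `y` are not all identical.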
14.2. Interpretation
\(TSS\) can be thought of as the total error we would incur if we used the mean of \(y\) as our model, whereas \(RSS\) is the total error of our actual model. The ratio \(RSS / TSS\) tells us how much better (or worse) the model \(f\) is than simply predicting the mean \(\bar{y}\).
If \(f\) predicts \(y\) better than \(\bar{y}\), then the ratio \(RSS / TSS\) will be less than 1 and \(R^2\) will be positive. The better the fit, the closer the ratio gets to zero and the closer \(R^2\) gets to 1.
If \(f\) predicts \(y\) worse than \(\bar{y}\), then the ratio \(RSS / TSS\) will be greater than 1 and \(R^2\) will be negative.
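The cases above can be illustrated with a small sketch that scores three hypothetical sets of predictions against the same targets: one close to \(y\), one that always predicts the mean \(\bar{y}\), and one that is worse than the mean. The helper `r_squared` and all data values are assumptions chosen for illustration.

```python
def r_squared(y, y_hat):
    """Return R^2 = 1 - RSS / TSS for observed values y and predictions y_hat."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss


y = [1.0, 2.0, 3.0, 4.0]  # hypothetical targets, mean is 2.5

good_preds = [1.1, 1.9, 3.2, 3.8]  # close to y: RSS is small, R^2 is near 1
mean_preds = [2.5, 2.5, 2.5, 2.5]  # always the mean: RSS equals TSS, R^2 is 0
bad_preds = [4.0, 3.0, 2.0, 1.0]   # reversed trend: RSS exceeds TSS, R^2 is negative

r2_good = r_squared(y, good_preds)
r2_mean = r_squared(y, mean_preds)
r2_bad = r_squared(y, bad_preds)

print("good model:", r2_good)
print("mean model:", r2_mean)
print("bad model: ", r2_bad)
```

The mean-only predictions mark the boundary between the two cases: there \(RSS = TSS\), so \(R^2\) is exactly zero.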