# R-Squared
## The Formula

\[
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
\]

Where:

- \(SS_{\text{res}} = \sum (y_i - \hat{y}_i)^2\) (Sum of Squared Residuals)
- \(SS_{\text{tot}} = \sum (y_i - \bar{y})^2\) (Total Sum of Squares)
In Simple Linear Regression, it is also simply the square of the correlation coefficient:

\[
R^2 = r^2
\]
## What It Means
\(R^2\) (coefficient of determination) represents the proportion of the variance for a dependent variable that's explained by an independent variable (or variables) in a regression model.
It usually ranges from 0 to 1 (it can be negative when a model fits worse than the mean baseline):

- \(R^2 = 0\): The model explains none of the variability of the response data around its mean.
- \(R^2 = 1\): The model explains all the variability of the response data around its mean. The predictions are perfect.
Think of it as a grade for your model: "My model explains 85% of why the data looks the way it does."
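The word "usually" is doing real work here: a model that fits the data worse than simply predicting the mean produces a negative \(R^2\). A minimal sketch in plain Python:

```python
y = [1.0, 2.0, 3.0]
bad_pred = [3.0, 3.0, 3.0]  # worse than just predicting the mean (2.0)

mean_y = sum(y) / len(y)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # baseline error: 2.0
ss_res = sum((yi - p) ** 2 for yi, p in zip(y, bad_pred))  # model error: 5.0
print(1 - ss_res / ss_tot)  # -1.5
```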
## Why It Works — The Intuition
Imagine you want to guess a value \(y\) without knowing anything else. Your best guess is the mean (\(\bar{y}\)). The total error of this naive "mean model" is the Total Sum of Squares (\(SS_{\text{tot}}\)).
Now, you build a regression model. It makes predictions \(\hat{y}\). The error of this model is the Sum of Squared Residuals (\(SS_{\text{res}}\)).
- If your model is perfect, \(SS_{\text{res}} = 0\), so \(R^2 = 1 - 0 = 1\).
- If your model is no better than just guessing the mean, \(SS_{\text{res}} \approx SS_{\text{tot}}\), so \(R^2 = 1 - 1 = 0\).
\(R^2\) literally measures how much less error you have compared to the baseline mean model.
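The baseline comparison described above translates directly into code. A minimal sketch in plain Python (the function name `r_squared` is just illustrative):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Error of the naive "mean model" (the baseline)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Error of our model's predictions
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

# Predicting the mean itself is no better than the baseline:
print(r_squared([1, 2, 3], [2, 2, 2]))  # 0.0
# Perfect predictions leave no residual error:
print(r_squared([1, 2, 3], [1, 2, 3]))  # 1.0
```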
## Derivation
The derivation relies on the decomposition of variance. The total variation in \(y\) can be split into two parts:

1. Variation explained by the model (\(SS_{\text{reg}}\))
2. Variation unexplained by the model (\(SS_{\text{res}}\))

\[
SS_{\text{tot}} = SS_{\text{reg}} + SS_{\text{res}}
\]
(Note: This equality holds exactly only for OLS regression with an intercept.)
We define \(R^2\) as the fraction of explained variation:

\[
R^2 = \frac{SS_{\text{reg}}}{SS_{\text{tot}}}
\]
Using the identity above (\(SS_{\text{reg}} = SS_{\text{tot}} - SS_{\text{res}}\)):

\[
R^2 = \frac{SS_{\text{tot}} - SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
\]
## Variables Explained
| Symbol | Name | Description |
|---|---|---|
| \(R^2\) | Coefficient of Determination | Proportion of variance explained |
| \(SS_{\text{tot}}\) | Total Sum of Squares | Squared deviation of the data from the mean (the "Baseline" error) |
| \(SS_{\text{res}}\) | Residual Sum of Squares | Squared deviation of the data from the model predictions (the "Leftover" error) |
| \(y_i\) | Actual Value | Observed data point |
| \(\hat{y}_i\) | Predicted Value | Value predicted by the regression line |
| \(\bar{y}\) | Mean | Average of observed \(y\) values |
## Worked Example
Data: \(y = [10, 20, 30]\). Mean \(\bar{y} = 20\). Model Predictions: \(\hat{y} = [12, 18, 30]\).

1. Calculate \(SS_{\text{tot}}\) (Baseline Error):
    - \((10-20)^2 + (20-20)^2 + (30-20)^2\)
    - \(100 + 0 + 100 = 200\)
2. Calculate \(SS_{\text{res}}\) (Model Error):
    - \((10-12)^2 + (20-18)^2 + (30-30)^2\)
    - \((-2)^2 + (2)^2 + 0^2\)
    - \(4 + 4 + 0 = 8\)
3. Calculate \(R^2\):
    - \(R^2 = 1 - \frac{8}{200} = 1 - 0.04 = 0.96\)

The model explains 96% of the variance.
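As a sanity check, the same arithmetic in a few lines of Python:

```python
y = [10, 20, 30]
y_hat = [12, 18, 30]
y_bar = sum(y) / len(y)  # 20.0

ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # baseline error
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # model error
r2 = 1 - ss_res / ss_tot

print(ss_tot, ss_res, round(r2, 2))  # 200.0 8 0.96
```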
## Common Mistakes
- "Higher is Always Better": Not true. You can increase \(R^2\) just by adding junk variables to the model (overfitting). Use Adjusted \(R^2\) to penalize complexity.
- \(R^2\) vs Causality: A high \(R^2\) does not imply causation.
- Low \(R^2\) means bad model: Not always. In psychology or social sciences, an \(R^2\) of 0.3 can be considered strong because human behavior is noisy. In physics, 0.3 would be terrible.
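For the Adjusted \(R^2\) mentioned above, the standard correction is \(\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}\) for \(n\) observations and \(p\) predictors. A minimal sketch (function name illustrative):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: penalizes each extra predictor."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With only 3 observations, even one predictor costs a lot:
print(round(adjusted_r2(0.96, n=3, p=1), 2))  # 0.92
```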
## Related Formulas
- Adjusted R-Squared — Corrects for the number of predictors.
- Pearson's Correlation — \(r = \pm\sqrt{R^2}\) (in SLR).
- Simple Linear Regression — The model usually evaluated by \(R^2\).
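The relation \(r = \pm\sqrt{R^2}\) in Simple Linear Regression is easy to verify numerically. A sketch using NumPy, with made-up data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = a*x + b by ordinary least squares (with intercept)
a, b = np.polyfit(x, y, deg=1)
y_hat = a * x + b

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r2, r ** 2))  # True
```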