SLR: Properties of the Intercept Estimator¶
The Formulas¶

\[
E[\hat{\beta}_0] = \beta_0
\]

\[
\text{Var}(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) = \frac{\sigma^2}{n} + \frac{\sigma^2 \bar{x}^2}{S_{xx}}
\]
What They Mean¶
Like its sibling \(\hat{\beta}_1\), the intercept estimator is unbiased — on average, it hits the true intercept exactly. No systematic error.
The variance formula is more interesting. It depends on three things: the noise level (\(\sigma^2\)), the sample size (\(n\)), and how far the mean of \(x\) is from zero (\(\bar{x}^2 / S_{xx}\)). That last term is the surprising one, and it tells a compelling geometric story.
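Both claims can be checked by simulation before deriving them. A minimal sketch, assuming NumPy is available; the true parameters and \(x\) values below are illustrative, not from the text:

```python
import numpy as np

# Simulate many samples from y = beta0 + beta1*x + noise and compare the
# empirical mean and variance of the intercept estimator to the formulas.
rng = np.random.default_rng(0)
beta0, beta1, sigma = 2.0, 3.0, 2.0        # illustrative true parameters
x = np.array([1.0, 2.0, 4.0, 7.0, 9.0])     # illustrative design points
n, xbar = len(x), x.mean()
Sxx = ((x - xbar) ** 2).sum()

b0_draws = []
for _ in range(200_000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
    b1_hat = ((x - xbar) * (y - y.mean())).sum() / Sxx
    b0_draws.append(y.mean() - b1_hat * xbar)
b0_draws = np.array(b0_draws)

theory_var = sigma**2 * (1 / n + xbar**2 / Sxx)
print(b0_draws.mean())              # close to beta0 = 2 (unbiased)
print(b0_draws.var(), theory_var)   # simulated vs. theoretical variance
```

The simulated mean lands on the true intercept and the simulated variance matches \(\sigma^2(1/n + \bar{x}^2/S_{xx})\), which is exactly what the derivation below proves.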
Why It Works — The Intuition¶
The Geometric Picture¶
Remember that \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The intercept is found by starting at the centroid \((\bar{x}, \bar{y})\) and extrapolating the line back to \(x = 0\).
Now imagine the regression line as a see-saw balanced on the centroid. The slope \(\hat{\beta}_1\) can wobble a little (it has its own variance). If \(\bar{x}\) is close to zero, that wobble barely moves the intercept — you're not extrapolating far. But if \(\bar{x}\) is far from zero, even a small wobble in the slope translates into a big swing at \(x = 0\).
That's why \(\bar{x}^2\) appears in the variance. The farther your data sits from the y-axis, the more the slope uncertainty amplifies into intercept uncertainty. It's the same reason a small steering error at high speed causes a bigger lane drift than at low speed — the error gets multiplied by the distance.
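The lever-arm effect is easy to see numerically. In this sketch, three designs share the same spread (same \(S_{xx}\)) but sit at different means; only \(\bar{x}\) changes, and the intercept variance grows with \(\bar{x}^2\):

```python
import numpy as np

# Same deviations around the mean (so Sxx is fixed), shifted to three
# different centers. Only the xbar^2 / Sxx term in the variance changes.
sigma2, n = 1.0, 10
spread = np.linspace(-2.0, 2.0, n)   # fixed deviations around the mean
vars_b0 = []
for center in (0.0, 5.0, 10.0):
    x = center + spread
    Sxx = ((x - x.mean()) ** 2).sum()
    vars_b0.append(sigma2 * (1 / n + x.mean() ** 2 / Sxx))
print([round(v, 3) for v in vars_b0])   # strictly increasing with |xbar|
```

Doubling the distance from the y-axis roughly quadruples the second variance term, matching the see-saw picture.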
Full Derivation¶
Step 1: Express \(\hat{\beta}_0\) in terms of \(y_i\)'s
Define \(d_i = \frac{1}{n} - \bar{x} \, c_i = \frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{S_{xx}}\), so \(\hat{\beta}_0 = \sum d_i \, y_i\).
Step 2: Prove unbiasedness
Substitute \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\):

\[
\hat{\beta}_0 = \sum d_i (\beta_0 + \beta_1 x_i + \varepsilon_i) = \beta_0 \sum d_i + \beta_1 \sum d_i x_i + \sum d_i \varepsilon_i
\]

We need \(\sum d_i\) and \(\sum d_i x_i\). Using \(\sum c_i = 0\) and \(\sum c_i x_i = 1\) (from the slope derivation):

\[
\sum d_i = \sum \frac{1}{n} - \bar{x} \sum c_i = 1 - 0 = 1, \qquad \sum d_i x_i = \frac{1}{n}\sum x_i - \bar{x} \sum c_i x_i = \bar{x} - \bar{x} = 0
\]

Therefore:

\[
\hat{\beta}_0 = \beta_0 + \sum d_i \varepsilon_i
\]

Taking expectations:

\[
E[\hat{\beta}_0] = \beta_0 + \sum d_i \, E[\varepsilon_i] = \beta_0
\]
Unbiased, as promised. The \(\beta_1\) term drops out completely — the intercept estimator doesn't "see" the true slope at all.
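The two weight identities that make this work can be verified directly. A quick check, assuming NumPy; the \(x\) values are arbitrary illustrative numbers:

```python
import numpy as np

# Verify the Step 2 identities: sum(d_i) = 1 and sum(d_i * x_i) = 0,
# which make the beta1 term vanish from the intercept estimator.
x = np.array([1.0, 3.0, 4.0, 8.0])   # arbitrary illustrative design
n, xbar = len(x), x.mean()
Sxx = ((x - xbar) ** 2).sum()
d = 1 / n - xbar * (x - xbar) / Sxx

print(d.sum())         # 1: the weights sum to one
print((d * x).sum())   # 0: the weights annihilate the slope term
```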
Step 3: Derive the variance
Since \(\hat{\beta}_0 = \beta_0 + \sum d_i \varepsilon_i\) and the errors are uncorrelated with common variance \(\sigma^2\), we have \(\text{Var}(\hat{\beta}_0) = \sigma^2 \sum d_i^2\). Compute \(\sum d_i^2\):

\[
\sum d_i^2 = \sum \left(\frac{1}{n} - \bar{x} c_i\right)^2
\]

Expand the square, using \(\sum c_i = 0\) and \(\sum c_i^2 = 1/S_{xx}\):

\[
\sum d_i^2 = \sum \frac{1}{n^2} - \frac{2\bar{x}}{n} \sum c_i + \bar{x}^2 \sum c_i^2 = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}
\]

Therefore:

\[
\text{Var}(\hat{\beta}_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\]
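The algebraic identity \(\sum d_i^2 = 1/n + \bar{x}^2/S_{xx}\) holds for any design, which a one-off numeric check makes concrete (illustrative \(x\) values, assuming NumPy):

```python
import numpy as np

# Check sum(d_i^2) = 1/n + xbar^2/Sxx on arbitrary illustrative data.
x = np.array([2.0, 5.0, 6.0, 9.0, 13.0])
n, xbar = len(x), x.mean()
Sxx = ((x - xbar) ** 2).sum()
d = 1 / n - xbar * (x - xbar) / Sxx

lhs = (d ** 2).sum()
rhs = 1 / n + xbar ** 2 / Sxx
print(lhs, rhs)   # the two sides agree to floating-point precision
```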
Reading the Formula¶
The variance has two additive components:
\(\frac{\sigma^2}{n}\) — This is the variance of \(\bar{y}\), the sample mean. Even if you knew the slope perfectly, you'd still have uncertainty in the intercept because the overall level of the data is uncertain. This term shrinks with more data.
\(\frac{\sigma^2 \bar{x}^2}{S_{xx}}\) — This is the amplified slope uncertainty. It equals \(\bar{x}^2 \cdot \text{Var}(\hat{\beta}_1)\). The slope wobble, multiplied by the lever arm \(\bar{x}^2\), produces extra uncertainty at \(x = 0\).
Special case: If \(\bar{x} = 0\) (data centered at the origin), the second term disappears and \(\text{Var}(\hat{\beta}_0) = \sigma^2/n\). The intercept becomes as precise as the sample mean. This is why centering your data is such a good idea in practice — it decouples the intercept from the slope.
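Centering can be demonstrated on a single fitted sample. A sketch, assuming NumPy; the data are illustrative:

```python
import numpy as np

# Fit OLS to the same sample twice: once with raw x, once with centered x.
# The slope is unchanged; the centered intercept is simply ybar.
rng = np.random.default_rng(1)
x = np.array([3.0, 4.0, 5.0, 6.0, 7.0])          # illustrative design
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, len(x))  # illustrative response

def ols(x, y):
    """Return (intercept, slope) from the standard OLS formulas."""
    xbar, ybar = x.mean(), y.mean()
    b1 = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
    return ybar - b1 * xbar, b1

b0_raw, b1_raw = ols(x, y)
b0_cen, b1_cen = ols(x - x.mean(), y)
print(b1_raw, b1_cen)     # identical slopes
print(b0_cen, y.mean())   # centered intercept equals the sample mean of y
```

With centered \(x\), the intercept estimator reduces to \(\bar{y}\), whose variance is \(\sigma^2/n\) — precisely the first term of the formula.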
Variables Explained¶
| Symbol | Name | Description |
|---|---|---|
| \(\hat{\beta}_0\) | OLS intercept estimator | The intercept computed from a particular sample |
| \(\beta_0\) | True intercept | The population parameter we're estimating |
| \(\sigma^2\) | Error variance | Variance of the random errors |
| \(n\) | Sample size | Number of observations |
| \(\bar{x}\) | Mean of \(x\) | How far the data center is from the y-axis |
| \(S_{xx}\) | Sum of squared \(x\)-deviations | \(\sum(x_i - \bar{x})^2\) |
| \(d_i\) | Weights | \(\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{S_{xx}}\) |
Worked Example¶
From the study hours data: \(n = 5\), \(\bar{x} = 5\), \(S_{xx} = 26\), \(\sigma^2 = 16\) (assumed). Plugging in:

\[
\text{Var}(\hat{\beta}_0) = 16\left(\frac{1}{5} + \frac{25}{26}\right) \approx 18.58, \qquad \text{SE}(\hat{\beta}_0) = \sqrt{18.58} \approx 4.31
\]
Compare to \(\text{SE}(\hat{\beta}_1) \approx 0.78\). The intercept is much less precisely estimated — because \(\bar{x} = 5\) is far from zero, the slope uncertainty gets amplified by a lever arm of \(5^2 = 25\).
If the study hours had been centered around 0 (say, by measuring "hours above average"), the intercept variance would drop to \(16/5 = 3.2\) and \(\text{SE}(\hat{\beta}_0) \approx 1.79\) — less than half.
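These numbers follow directly from the formulas; a short check in Python:

```python
# Worked-example figures: n = 5, xbar = 5, Sxx = 26, sigma^2 = 16 (assumed).
n, xbar, Sxx, sigma2 = 5, 5.0, 26.0, 16.0

var_b0 = sigma2 * (1 / n + xbar ** 2 / Sxx)   # intercept variance
se_b0 = var_b0 ** 0.5                          # intercept standard error
se_b1 = (sigma2 / Sxx) ** 0.5                  # slope standard error
se_b0_centered = (sigma2 / n) ** 0.5           # after centering x

print(round(var_b0, 2))          # 18.58
print(round(se_b0, 2))           # 4.31
print(round(se_b1, 2))           # 0.78
print(round(se_b0_centered, 2))  # 1.79
```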
Common Mistakes¶
- Ignoring that \(\text{Var}(\hat{\beta}_0)\) depends on \(\bar{x}\): Two datasets with the same \(n\), \(\sigma^2\), and \(S_{xx}\) can have wildly different intercept precision if their \(x\) values are centered differently.
- Interpreting a significant intercept as meaningful: In many models, \(x = 0\) is outside the data range or physically meaningless. A precisely estimated intercept doesn't mean \(\beta_0\) is scientifically interesting.
- Not centering when it helps: If you're interested in the intercept (e.g., the baseline response), centering the \(x\) values eliminates the slope-induced uncertainty. The slope estimate doesn't change — only the intercept and its standard error do.
Related Formulas¶
- SLR: Deriving the OLS Estimators — where \(\hat{\beta}_0\) comes from
- SLR: Properties of the Slope Estimator — the companion derivation for \(\hat{\beta}_1\)
- SLR: Mean Response and Prediction — using both estimators for inference
- Standard Error — the general concept
References¶
- Kutner, M. H., et al. (2004). Applied Linear Statistical Models, 5th ed. McGraw-Hill.
- Weisberg, S. (2014). Applied Linear Regression, 4th ed. Wiley.
- Draper, N. R. & Smith, H. (1998). Applied Regression Analysis, 3rd ed.