
SLR: Properties of the Intercept Estimator

The Formulas

\[ E(\hat{\beta}_0) = \beta_0 \]
\[ \text{Var}(\hat{\beta}_0) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) \]

What They Mean

Like its sibling \(\hat{\beta}_1\), the intercept estimator is unbiased — on average, it hits the true intercept exactly. No systematic error.

The variance formula is more interesting. It depends on three things: the noise level (\(\sigma^2\)), the sample size (\(n\)), and how far the mean of \(x\) is from zero (\(\bar{x}^2 / S_{xx}\)). That last term is the surprising one, and it tells a compelling geometric story.

Why It Works — The Intuition

The Geometric Picture

Remember that \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The intercept is found by starting at the centroid \((\bar{x}, \bar{y})\) and extrapolating the line back to \(x = 0\).

Now imagine the regression line as a see-saw balanced on the centroid. The slope \(\hat{\beta}_1\) can wobble a little (it has its own variance). If \(\bar{x}\) is close to zero, that wobble barely moves the intercept — you're not extrapolating far. But if \(\bar{x}\) is far from zero, even a small wobble in the slope translates into a big swing at \(x = 0\).

That's why \(\bar{x}^2\) appears in the variance. The farther your data sits from the y-axis, the more the slope uncertainty amplifies into intercept uncertainty. It's the same reason a small steering error produces a bigger drift over a long stretch of road than a short one: the error gets multiplied by the distance traveled.
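
This lever-arm effect shows up clearly in a quick simulation (a minimal sketch, not tied to any particular dataset; the parameters and seed here are arbitrary): the same \(x\)-spread, shifted away from zero, inflates the spread of the fitted intercept.

```python
import numpy as np

rng = np.random.default_rng(0)

def intercept_spread(x, n_sims=5000, sigma=1.0, beta0=2.0, beta1=0.5):
    """Standard deviation of the OLS intercept over repeated samples."""
    xbar = x.mean()
    Sxx = np.sum((x - xbar) ** 2)
    estimates = []
    for _ in range(n_sims):
        y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)
        b1 = np.sum((x - xbar) * (y - y.mean())) / Sxx   # OLS slope
        estimates.append(y.mean() - b1 * xbar)           # OLS intercept
    return float(np.std(estimates))

x_near = np.array([-2., -1., 0., 1., 2.])   # centered: x-bar = 0
x_far = x_near + 10.0                       # same spread, x-bar = 10

print(intercept_spread(x_near))   # close to sigma/sqrt(n), about 0.45
print(intercept_spread(x_far))    # several times larger: slope wobble times lever arm
```

Shifting the data by 10 changes nothing about the slope's precision, yet the intercept spread grows severalfold.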

Full Derivation

Step 1: Express \(\hat{\beta}_0\) in terms of \(y_i\)'s

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{1}{n}\sum y_i - \bar{x} \sum c_i y_i = \sum \left(\frac{1}{n} - \bar{x} \, c_i\right) y_i \]

Define \(d_i = \frac{1}{n} - \bar{x} \, c_i = \frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{S_{xx}}\), so \(\hat{\beta}_0 = \sum d_i \, y_i\).

Step 2: Prove unbiasedness

Substitute \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\):

\[ \hat{\beta}_0 = \sum d_i (\beta_0 + \beta_1 x_i + \varepsilon_i) \]

We need \(\sum d_i\) and \(\sum d_i x_i\):

\[ \sum d_i = \sum \frac{1}{n} - \bar{x}\sum c_i = 1 - 0 = 1 \]
\[ \sum d_i x_i = \frac{1}{n}\sum x_i - \bar{x}\sum c_i x_i = \bar{x} - \bar{x}(1) = 0 \]
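
Both weight identities are easy to spot-check numerically (a sketch; the \(x\) values are arbitrary):

```python
import numpy as np

x = np.array([1., 3., 5., 7., 9.])   # any x values work
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
c = (x - x.mean()) / Sxx             # slope weights c_i
d = 1 / n - x.mean() * c             # intercept weights d_i

print(np.sum(d))       # 1.0
print(np.sum(d * x))   # 0.0, up to floating-point error
```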

Therefore:

\[ \hat{\beta}_0 = \beta_0(1) + \beta_1(0) + \sum d_i \varepsilon_i = \beta_0 + \sum d_i \varepsilon_i \]

Taking expectations:

\[ \boxed{E(\hat{\beta}_0) = \beta_0} \]

Unbiased, as promised. The \(\beta_1\) term drops out completely — the intercept estimator doesn't "see" the true slope at all.
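
A short Monte Carlo check of unbiasedness (a sketch; the true parameters, \(x\) values, and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1., 3., 5., 7., 9.])
beta0, beta1, sigma = 4.0, 2.0, 3.0          # assumed true parameters

Sxx = np.sum((x - x.mean()) ** 2)
estimates = []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=x.size)
    b1 = np.sum((x - x.mean()) * y) / Sxx    # OLS slope
    estimates.append(y.mean() - b1 * x.mean())

print(np.mean(estimates))   # close to 4.0: no systematic error
```

The average over many simulated samples lands on \(\beta_0\), regardless of the true slope.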

Step 3: Derive the variance

Since \(\beta_0\) is a constant and the errors are uncorrelated with common variance \(\sigma^2\):

\[ \text{Var}(\hat{\beta}_0) = \text{Var}\left(\sum d_i \varepsilon_i\right) = \sum d_i^2 \,\text{Var}(\varepsilon_i) = \sigma^2 \sum d_i^2 \]

Compute \(\sum d_i^2\):

\[ \sum d_i^2 = \sum \left(\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{S_{xx}}\right)^2 \]

Expand the square:

\[ = \sum \left(\frac{1}{n^2} - \frac{2\bar{x}(x_i - \bar{x})}{nS_{xx}} + \frac{\bar{x}^2(x_i-\bar{x})^2}{S_{xx}^2}\right) \]
\[ = \frac{1}{n} - \frac{2\bar{x}}{nS_{xx}}\underbrace{\sum(x_i-\bar{x})}_{=0} + \frac{\bar{x}^2}{S_{xx}^2}\underbrace{\sum(x_i-\bar{x})^2}_{=S_{xx}} \]
\[ = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \]

Therefore:

\[ \boxed{\text{Var}(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)} \]
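
The identity \(\sum d_i^2 = \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\) behind this result can be verified numerically (any \(x\) values work; these are arbitrary):

```python
import numpy as np

x = np.array([2., 4., 5., 9., 12.])   # arbitrary x values
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
d = 1 / n - x.mean() * (x - x.mean()) / Sxx   # intercept weights d_i

print(np.sum(d ** 2))                # direct sum of squared weights
print(1 / n + x.mean() ** 2 / Sxx)   # the closed form: same value
```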

Reading the Formula

The variance has two additive components:

\(\frac{\sigma^2}{n}\) — This is the variance of \(\bar{y}\), the sample mean. Even if you knew the slope perfectly, you'd still have uncertainty in the intercept because the overall level of the data is uncertain. This term shrinks with more data.

\(\frac{\sigma^2 \bar{x}^2}{S_{xx}}\) — This is the amplified slope uncertainty. It equals \(\bar{x}^2 \cdot \text{Var}(\hat{\beta}_1)\). The slope wobble, multiplied by the squared lever arm \(\bar{x}^2\), produces extra uncertainty at \(x = 0\).

Special case: If \(\bar{x} = 0\) (data centered at the origin), the second term disappears and \(\text{Var}(\hat{\beta}_0) = \sigma^2/n\). The intercept becomes as precise as the sample mean. This is why centering your data is such a good idea in practice — it decouples the intercept from the slope.
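
A quick numeric check of the two cases, using the variance formula just derived (the \(x\) values and \(\sigma^2\) here are hypothetical):

```python
import numpy as np

def var_b0(x, sigma2):
    """Var(beta0-hat) from the closed-form formula."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    return sigma2 * (1 / n + x.mean() ** 2 / Sxx)

x = np.array([3., 5., 7., 9., 11.])
print(var_b0(x, sigma2=16.0))              # raw x (x-bar = 7): about 22.8
print(var_b0(x - x.mean(), sigma2=16.0))   # centered: sigma^2/n = 3.2
```

Same spread, same noise; centering alone cuts the intercept variance by a factor of about seven here.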

Variables Explained

| Symbol | Name | Description |
| --- | --- | --- |
| \(\hat{\beta}_0\) | OLS intercept estimator | The intercept computed from a particular sample |
| \(\beta_0\) | True intercept | The population parameter we're estimating |
| \(\sigma^2\) | Error variance | Variance of the random errors |
| \(n\) | Sample size | Number of observations |
| \(\bar{x}\) | Mean of \(x\) | How far the data center is from the y-axis |
| \(S_{xx}\) | Sum of squared \(x\)-deviations | \(\sum(x_i - \bar{x})^2\) |
| \(d_i\) | Weights | \(\frac{1}{n} - \frac{\bar{x}(x_i - \bar{x})}{S_{xx}}\) |

Worked Example

From the study hours data: \(n = 5\), \(\bar{x} = 5\), \(S_{xx} = 26\), \(\sigma^2 = 16\) (assumed).

\[ \text{Var}(\hat{\beta}_0) = 16\left(\frac{1}{5} + \frac{25}{26}\right) = 16(0.2 + 0.9615) = 16 \times 1.1615 \approx 18.58 \]
\[ \text{SE}(\hat{\beta}_0) = \sqrt{18.58} \approx 4.31 \]

Compare to \(\text{SE}(\hat{\beta}_1) \approx 0.78\). The intercept is much less precisely estimated: because \(\bar{x} = 5\) is far from zero, the slope uncertainty is amplified by a factor of \(\bar{x}^2 = 25\).

If the study hours had been centered around 0 (say, by measuring "hours above average"), the intercept variance would drop to \(16/5 = 3.2\) and \(\text{SE}(\hat{\beta}_0) \approx 1.79\) — less than half.
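
The same arithmetic in code, using the assumed numbers from the example:

```python
import numpy as np

n, xbar, Sxx, sigma2 = 5, 5.0, 26.0, 16.0   # from the worked example

var_b0 = sigma2 * (1 / n + xbar ** 2 / Sxx)
se_b0 = np.sqrt(var_b0)
se_b1 = np.sqrt(sigma2 / Sxx)
se_b0_centered = np.sqrt(sigma2 / n)        # if x-bar were 0

print(round(var_b0, 2))           # 18.58
print(round(se_b0, 2))            # 4.31
print(round(se_b1, 2))            # 0.78
print(round(se_b0_centered, 2))   # 1.79
```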

Common Mistakes

  • Ignoring that \(\text{Var}(\hat{\beta}_0)\) depends on \(\bar{x}\): Two datasets with the same \(n\), \(\sigma^2\), and \(S_{xx}\) can have wildly different intercept precision if their \(x\) values are centered differently.
  • Interpreting a significant intercept as meaningful: In many models, \(x = 0\) is outside the data range or physically meaningless. A precisely estimated intercept doesn't mean \(\beta_0\) is scientifically interesting.
  • Not centering when it helps: If you're interested in the intercept (e.g., the baseline response), centering the \(x\) values eliminates the slope-induced uncertainty. The slope estimate doesn't change — only the intercept and its standard error do.
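
That last point is easy to demonstrate by fitting the same data twice (a minimal sketch; the data and seed are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([2., 4., 6., 8., 10.])
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=5)   # hypothetical data

def ols(x, y):
    """Return (intercept, slope) from the OLS formulas."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

b0_raw, b1_raw = ols(x, y)
b0_cen, b1_cen = ols(x - x.mean(), y)

print(b1_raw, b1_cen)   # identical slopes
print(b0_raw, b0_cen)   # different intercepts: centered intercept is y-bar
```

Centering shifts only where \(x = 0\) sits; the fitted line itself, and hence the slope, is unchanged.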
