SLR: Mean Response and Prediction¶
The Formulas¶
Estimated Mean Response¶
\[\hat{Y}_h = \hat{\beta}_0 + \hat{\beta}_1 x_h\]
Variance of the Mean Response¶
\[\text{Var}(\hat{Y}_h) = \sigma^2 \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}} \right)\]
Confidence Interval for the Mean Response \(E(Y | X = x_h)\)¶
\[\hat{Y}_h \pm t_{n-2,\, \alpha/2} \cdot s \sqrt{\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}\]
Prediction Interval for a New Observation \(Y_{\text{new}}\) at \(x_h\)¶
\[\hat{Y}_h \pm t_{n-2,\, \alpha/2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}}\]
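These interval formulas are mechanical to evaluate. A minimal Python sketch (the function and variable names are my own; `t_crit` is the \(t_{n-2,\,\alpha/2}\) critical value, taken from a table or a stats library):

```python
import math

def slr_interval(x_h, n, xbar, Sxx, s, b0, b1, t_crit, prediction=False):
    """Interval at x_h: confidence interval for the mean response by
    default, prediction interval for a new observation if prediction=True."""
    y_hat = b0 + b1 * x_h                         # estimated mean response
    var_factor = 1 / n + (x_h - xbar) ** 2 / Sxx  # the "bowtie" term
    if prediction:
        var_factor += 1                           # extra "1": the new point's own noise
    half = t_crit * s * math.sqrt(var_factor)
    return (y_hat - half, y_hat + half)
```

With the numbers from the worked example below (\(n=5\), \(\bar{x}=5\), \(S_{xx}=26\), \(s=2.13\), \(t_{3,0.025}=3.182\)), `slr_interval(6, 5, 5, 26, 2.13, 57.8, 4.04, 3.182)` reproduces the confidence interval near (78.7, 85.4), and passing `prediction=True` gives the wider (74.5, 89.6).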
What They Mean¶
There are two very different questions you can ask a regression model, and confusing them is one of the most common mistakes in statistics:
- "What's the average \(y\) for this value of \(x\)?" — This asks about the mean response \(E(Y | X = x_h)\). You're estimating the center of the distribution at \(x_h\). The confidence interval answers this.
- "What will the next observation be at this \(x\)?" — This asks about a single new data point. Even if you knew the mean perfectly, individual observations scatter around it. The prediction interval answers this.
The prediction interval is always wider — it includes both the uncertainty about the mean and the random scatter of individuals around that mean.
Why It Works — The Intuition¶
The Variance of \(\hat{Y}_h\) — Why the Bowtie Shape?¶
If you've ever seen a regression plot with confidence bands, you've noticed they form a bowtie (or hourglass) shape — narrowest at \(\bar{x}\) and widening as you move away. The variance formula explains why.
At \(x_h = \bar{x}\): The second term vanishes. \(\text{Var}(\hat{Y}_h) = \sigma^2/n\). You're predicting the mean response at the centroid, where you have maximum information. The only uncertainty comes from estimating \(\bar{y}\).
As \(x_h\) moves away from \(\bar{x}\): The \((x_h - \bar{x})^2\) term grows. You're extrapolating along the regression line, and slope uncertainty gets amplified by the distance from the center. It's the see-saw effect again — the line pivots on the centroid, so points far from the center swing more.
This is exactly why extrapolation is dangerous. At \(x_h\) far from your data, the confidence band explodes. The model isn't necessarily wrong, but it's honest about how little it knows out there.
Full Derivation of \(\text{Var}(\hat{Y}_h)\)¶
Start from \(\hat{Y}_h = \hat{\beta}_0 + \hat{\beta}_1 x_h\) and substitute \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\):
\[\hat{Y}_h = \bar{y} + \hat{\beta}_1 (x_h - \bar{x})\]
This rewrite is beautiful. The predicted value is the sample mean \(\bar{y}\) plus a slope correction based on how far \(x_h\) is from \(\bar{x}\).
Now, \(\bar{y}\) and \(\hat{\beta}_1\) are both functions of the \(y_i\)'s. Are they independent? Yes — and this is a remarkable fact. \(\bar{y}\) captures the "level" of the data while \(\hat{\beta}_1\) captures the "tilt": their covariance is proportional to \(\sum_i (x_i - \bar{x}) = 0\), and for normally distributed errors zero covariance implies independence.
Since they're independent, the variances simply add:
\[\text{Var}(\hat{Y}_h) = \text{Var}(\bar{y}) + (x_h - \bar{x})^2 \,\text{Var}(\hat{\beta}_1) = \frac{\sigma^2}{n} + \frac{\sigma^2 (x_h - \bar{x})^2}{S_{xx}}\]
Two sources of uncertainty, added together:
- \(\sigma^2/n\): uncertainty in the overall level (the mean)
- \(\sigma^2(x_h - \bar{x})^2/S_{xx}\): uncertainty from the slope, amplified by distance from the center
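The decomposition can be checked numerically. The sketch below uses hypothetical \(x\) values chosen to match the worked example's \(\bar{x} = 5\) and \(S_{xx} = 26\) (the actual study-hours data aren't listed here); it refits the line on many simulated datasets and compares the empirical variance of \(\hat{Y}_h\) to the formula:

```python
import random
import statistics

random.seed(1)
xs = [2, 3, 5, 7, 8]                 # hypothetical design: xbar = 5, Sxx = 26
beta0, beta1, sigma = 57.8, 4.04, 2.13
n = len(xs)
xbar = sum(xs) / n
Sxx = sum((x - xbar) ** 2 for x in xs)
x_h = 6

# Simulate many datasets from the true line, refit, record Y_hat at x_h.
preds = []
for _ in range(20000):
    ys = [beta0 + beta1 * x + random.gauss(0, sigma) for x in xs]
    ybar = sum(ys) / n
    b1 = sum((x - xbar) * y for x, y in zip(xs, ys)) / Sxx
    preds.append(ybar + b1 * (x_h - xbar))       # Y_hat = ybar + b1 (x_h - xbar)

empirical = statistics.pvariance(preds)
theoretical = sigma ** 2 * (1 / n + (x_h - xbar) ** 2 / Sxx)
print(empirical, theoretical)                    # should agree closely
```

The two numbers agree to within Monte Carlo error, confirming that the level and slope contributions add as claimed.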
Confidence Interval vs. Prediction Interval — The Extra "1"¶
Now suppose you want to predict a new individual observation \(Y_{\text{new}} = \beta_0 + \beta_1 x_h + \varepsilon_{\text{new}}\) at the point \(x_h\).
The prediction error is:
\[Y_{\text{new}} - \hat{Y}_h = \beta_0 + \beta_1 x_h + \varepsilon_{\text{new}} - \hat{Y}_h\]
The variance of this error has three components:
\[\text{Var}(Y_{\text{new}} - \hat{Y}_h) = \text{Var}(\varepsilon_{\text{new}}) + \text{Var}(\hat{Y}_h) - 2\,\text{Cov}(\varepsilon_{\text{new}}, \hat{Y}_h)\]
The cross term vanishes because \(\varepsilon_{\text{new}}\) is independent of the training data (it's a new observation). So:
\[\text{Var}(Y_{\text{new}} - \hat{Y}_h) = \sigma^2 + \sigma^2\left(\frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{S_{xx}}\right)\]
That leading "1" is the irreducible noise — even with infinite data and a perfect regression line, individual observations still scatter around the mean with variance \(\sigma^2\). The confidence interval shrinks to zero as \(n \to \infty\). The prediction interval never gets narrower than \(\pm t \cdot \sigma\) — there's a floor set by the inherent randomness of the world.
This is a profound distinction:
- Confidence interval: "Where is the true average?" → Gets arbitrarily precise with more data
- Prediction interval: "Where will the next point land?" → Has an irreducible minimum width
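A quick numeric sketch of that floor. It evaluates both half-widths at \(x_h = \bar{x}\) (where the bowtie term is smallest), borrowing \(\sigma = 2.13\) from the worked example and using the large-sample normal critical value 1.96 in place of the \(t\) quantile:

```python
import math

sigma, z = 2.13, 1.96                      # noise sd; large-n critical value
for n in (5, 50, 5000, 500000):
    ci = z * sigma * math.sqrt(1 / n)      # CI half-width at xbar: shrinks to 0
    pi = z * sigma * math.sqrt(1 + 1 / n)  # PI half-width: floor at z * sigma
    print(f"n={n:>6}  CI={ci:6.3f}  PI={pi:6.3f}")
```

As \(n\) grows the confidence half-width collapses toward zero, while the prediction half-width settles at \(1.96 \times 2.13 \approx 4.17\) and goes no lower.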
The Bowtie Gets Wider¶
Both intervals widen as \(x_h\) moves from \(\bar{x}\), but the prediction interval starts wider (because of the extra "1") and stays wider everywhere: the difference between the squared half-widths is the constant \((t \cdot s)^2\). Visually:
```
 \   Prediction bands (wider)                    /
  \      \                                /     /
   \      \      Confidence bands        /     /
    \      \                            /     /
-----\------\---------- x̄ ------------/-----/-----
    /      /                            \     \
   /      /      Confidence bands        \     \
  /      /                                \     \
 /   Prediction bands                            \
```
Worked Example¶
Study hours data: \(n = 5\), \(\bar{x} = 5\), \(S_{xx} = 26\), \(\hat{\beta}_0 = 57.8\), \(\hat{\beta}_1 = 4.04\), \(s^2 = 4.53\) (estimated), so \(s = 2.13\).
Predict the mean score at \(x_h = 6\) hours¶
Point estimate: \(\hat{Y}_h = 57.8 + 4.04 \times 6 = 82.04\).
Confidence interval (95%, \(t_{3, 0.025} = 3.182\)):
\[82.04 \pm 3.182 \times 2.13\sqrt{\frac{1}{5} + \frac{(6-5)^2}{26}} = 82.04 \pm 3.182 \times 1.04 = 82.04 \pm 3.31 = (78.7, \; 85.4)\]
We're 95% confident the average score for students who study 6 hours is between 78.7 and 85.4.
Prediction interval (same setup):
\[82.04 \pm 3.182 \times 2.13\sqrt{1 + \frac{1}{5} + \frac{(6-5)^2}{26}} = 82.04 \pm 3.182 \times 2.37 = 82.04 \pm 7.54 = (74.5, \; 89.6)\]
An individual student who studies 6 hours would probably score between 74.5 and 89.6 — a much wider range, because people vary.
Compare at \(x_h = 2\) (near the edge)¶
Point estimate: \(\hat{Y}_h = 57.8 + 4.04 \times 2 = 65.88\).
Confidence SE: \(s\sqrt{\frac{1}{5} + \frac{(2-5)^2}{26}} = 2.13\sqrt{0.2 + 0.346} = 2.13 \times 0.739 = 1.57\)
CI: \(65.88 \pm 3.182 \times 1.57 = 65.88 \pm 5.00 = (60.9, \; 70.9)\)
Wider than at \(x_h = 6\), because we're farther from \(\bar{x} = 5\). The bowtie opens up.
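The widening is easy to tabulate. A sketch using the worked example's quantities, with a deliberately reckless extrapolation point far outside the 2–8 hour data range added for comparison:

```python
import math

n, xbar, Sxx, s, t_crit = 5, 5, 26, 2.13, 3.182  # worked-example quantities

def ci_half_width(x_h):
    """Half-width of the 95% confidence interval for the mean response."""
    return t_crit * s * math.sqrt(1 / n + (x_h - xbar) ** 2 / Sxx)

for x_h in (5, 6, 2, 20):
    print(f"x_h = {x_h:>2}: +/- {ci_half_width(x_h):.2f}")
```

At \(x_h = 5\) (the centroid) the half-width is smallest, at \(x_h = 6\) it is about 3.31, at \(x_h = 2\) about 5.01, and at \(x_h = 20\) it balloons past 20 points — the formula happily returns a number, but the band is screaming that it knows nothing out there.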
Common Mistakes¶
- Using a confidence interval when you need a prediction interval: If someone asks "what score will this student get?", you need a prediction interval. If they ask "what's the average score for students who study 6 hours?", you need a confidence interval. The first is about an individual; the second is about a population average.
- Extrapolating confidently: The bands widen for a reason. Predicting at \(x_h = 20\) hours with data from 2–8 hours is reckless. The formula will give you a number, but the model has no idea if the linear relationship holds that far out.
- Forgetting the prediction interval has a floor: No amount of data makes the prediction interval vanish. There's always \(\sigma^2\) of irreducible uncertainty. If \(\sigma\) is large, your model might be "correct" but still not very useful for individual predictions.
- Thinking narrow confidence bands mean good predictions: You can have razor-thin confidence bands (you know the mean precisely) and still have wide prediction bands (individuals scatter a lot). These are separate questions.
Related Formulas¶
- SLR: Deriving the OLS Estimators — the starting point
- SLR: Properties of the Slope Estimator — \(\text{Var}(\hat{\beta}_1)\) feeds into this derivation
- SLR: Properties of the Intercept Estimator — \(\text{Var}(\hat{\beta}_0)\)
- Standard Error — the general concept of estimator uncertainty
- Confidence Interval — the broader framework
History¶
- 1805–1809 — Legendre and Gauss develop least squares, but the focus is on point estimates — how to find the "best" line, not how uncertain it is
- 1821–1823 — Gauss derives the variance formulas for the estimators, giving us the tools to quantify uncertainty in regression
- 1908 — Gosset's t-distribution makes small-sample inference possible — crucial because early regression applications often had tiny datasets
- 1929 — Working and Hotelling derive confidence bands for the entire regression line — the mathematical basis for the bowtie shape. They showed that a simultaneous band for the whole line requires a wider critical value (based on the \(F\)-distribution), not just pointwise \(t\)-intervals
References¶
- Kutner, M. H., et al. (2005). Applied Linear Statistical Models, 5th ed.
- Weisberg, S. (2014). Applied Linear Regression, 4th ed.
- Working, H. & Hotelling, H. (1929). "Applications of the Theory of Error to the Interpretation of Trends." J. Amer. Statist. Assoc.