Student's t-Test

The Question

You have two groups of data (say, the heights of plants treated with fertilizer A vs. fertilizer B). The average of group A is higher. But is it really higher, or is that just random luck?

If you had thousands of plants, you could just trust the average (thanks to the Central Limit Theorem). But you don't. You have 10 plants.

How do you make a decision when your sample size is small and you don't know the true variability of the world?

The Formula

For a one-sample t-test (comparing a sample mean \(\bar{x}\) to a target value \(\mu_0\)):

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

For an independent two-sample t-test (comparing two groups):

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

Once you calculate \(t\), you look it up in a t-table (or ask a computer) to find the p-value.
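As a concrete sketch, here is how both statistics could be computed by hand (the plant-height numbers are made up for illustration; in practice `scipy.stats.ttest_1samp` and `ttest_ind` do this and also return the p-value). Note that the two-sample formula above is the unpooled (Welch) form:

```python
from math import sqrt
from statistics import mean, stdev  # stdev uses the n-1 denominator

def one_sample_t(xs, mu0):
    """t = (xbar - mu0) / (s / sqrt(n))"""
    n = len(xs)
    return (mean(xs) - mu0) / (stdev(xs) / sqrt(n))

def two_sample_t(xs, ys):
    """Unpooled (Welch) form: t = (xbar1 - xbar2) / sqrt(s1^2/n1 + s2^2/n2)"""
    n1, n2 = len(xs), len(ys)
    se = sqrt(stdev(xs) ** 2 / n1 + stdev(ys) ** 2 / n2)
    return (mean(xs) - mean(ys)) / se

a = [24.1, 25.3, 23.8, 26.0, 24.7]   # fertilizer A heights (hypothetical)
b = [22.9, 23.5, 22.1, 23.0, 24.2]   # fertilizer B heights (hypothetical)
print(one_sample_t(a, 24.0))   # ≈ 1.95
print(two_sample_t(a, b))      # ≈ 3.10
```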

The Story: Beer and Barley

This test exists because of beer. Specifically, Guinness stout.

In the early 1900s, William Sealy Gosset was a chemist at Guinness in Dublin. His job was to test batches of barley to ensure the quality of the beer. He couldn't test every single grain (that would take forever and destroy the supply), so he had to rely on small samples.

The standard statistics of the time (using the normal distribution / Z-test) assumed you had enough data to know the true standard deviation (\(\sigma\)) perfectly. Gosset didn't. He used the sample standard deviation (\(s\)), which is itself an estimate and thus prone to error.

He noticed that using the standard Z-test on small samples led to too many "false positives" — he was rejecting good batches or accepting bad ones more often than the math predicted. The normal curve was too skinny; it didn't account for the extra uncertainty of not knowing \(\sigma\).

Since Guinness prohibited employees from publishing trade secrets, Gosset published his work under the pseudonym "Student". Hence, "Student's t-test".

The Intuition

Signal vs. Noise

The t-statistic is just a Signal-to-Noise Ratio.

\[ t = \frac{\text{Signal}}{\text{Noise}} \]
  • Signal: The difference between your sample mean and the null hypothesis (\(\bar{x} - \mu_0\)). How far "off" is your result?
  • Noise: The Standard Error (\(s / \sqrt{n}\)). How much would you expect the mean to bounce around just by chance?

If \(t = 3\), it means your result is 3 times larger than the expected random noise. That's strong evidence. If \(t = 0.5\), your signal is buried in the noise.
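As a quick worked example (with made-up numbers): suppose \(\bar{x} = 25\), \(\mu_0 = 24\), \(s = 2\), and \(n = 16\). Then

\[ t = \frac{25 - 24}{2 / \sqrt{16}} = \frac{1}{0.5} = 2 \]

The signal (a difference of 1 unit) is twice the expected noise (a standard error of 0.5 units).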

Why Not Just Use Z?

The Z-statistic looks identical:

\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

The difference is in the denominator:

  • Z-test: uses \(\sigma\) (the Truth). It assumes you know the population's spread exactly.
  • t-test: uses \(s\) (an Estimate) computed from the sample itself.

When \(n\) is small, \(s\) might be very different from \(\sigma\). It might be way too small just by luck. If \(s\) is too small, the denominator is too small, making \(t\) huge. This would make you think you found a significant result when you really didn't.

Gosset's t-distribution corrects for this. It is "fatter" in the tails than the normal distribution. This means extreme values are more common by chance, so you need a larger t-value to be impressed. It sets the bar higher to account for your ignorance about \(\sigma\).
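To see how much higher the bar is set, one could compare the two-sided 5% critical values of the t-distribution against the normal's familiar 1.96 (a sketch assuming `scipy` is available):

```python
# Compare 5% two-sided critical values: t vs. normal.
from scipy.stats import t, norm

z_crit = norm.ppf(0.975)  # ≈ 1.96 for the normal distribution
for df in [2, 5, 10, 30, 1000]:
    t_crit = t.ppf(0.975, df)
    print(f"df={df:>4}: t critical = {t_crit:.3f}  (z = {z_crit:.3f})")
# Small df demands a much larger t; by df ≈ 1000 the two nearly coincide.
```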

Derivation: The Logic

We don't need to derive the full PDF here (see the t-Distribution page for that), but we can derive the structure of the test statistic.

  1. Standardize the Mean: We know that the sample mean \(\bar{x}\) is approximately normal (CLT):
\[ \bar{x} \sim N \left( \mu, \frac{\sigma^2}{n} \right) \]
So we can convert it to a standard normal variable \(Z\):
\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
  2. The Problem: We don't know \(\sigma\). We substitute \(s\):
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
  3. Analyze the Components: This \(t\) is actually a ratio of two random variables:

    • Numerator: a standard normal variable (\(Z\)).
    • Denominator: a quantity related to the Chi-squared distribution (since \(s^2\) is a sum of squared errors).

    Specifically, Gosset showed that:

\[ t = \frac{Z}{\sqrt{V/k}} \]
Where \(Z\) is standard normal, \(V\) is Chi-squared with \(k\) degrees of freedom, and they are independent.
  4. The Result: The ratio of a standard Normal to the square root of a Chi-squared divided by its degrees of freedom follows the Student's t-distribution. This distribution depends on \(k\) (the sample size minus 1). As \(k \to \infty\), the uncertainty about \(s\) vanishes and the t-distribution becomes the Normal distribution.
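A quick simulation (standard library only; the data are synthetic) illustrates both this derivation and Gosset's original complaint: with tiny samples, judging \(t\) against the normal cutoff of 1.96 produces far more than 5% false positives, while the proper t cutoff restores the advertised rate:

```python
# Simulation sketch: n = 5 draws from a normal with known true mean mu.
# t = (xbar - mu) / (s / sqrt(n)) exceeds the *normal* cutoff |z| > 1.96
# far more often than 5%; the t cutoff for k = 4 df (≈ 2.776) fixes it.
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)
n, trials = 5, 50_000
mu, sigma = 0.0, 1.0
hits_z = hits_t = 0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    t_stat = (mean(xs) - mu) / (stdev(xs) / sqrt(n))
    hits_z += abs(t_stat) > 1.960   # normal critical value
    hits_t += abs(t_stat) > 2.776   # t critical value, 4 df
print(f"false-positive rate with z cutoff: {hits_z / trials:.3f}")  # ~0.12
print(f"false-positive rate with t cutoff: {hits_t / trials:.3f}")  # ~0.05
```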

Common Mistakes

  • Thinking "t" stands for "Test": It doesn't. Gosset originally called it \(z\), then changed it to \(t\). It's just a variable name.
  • Ignoring assumptions: The t-test assumes your data is normally distributed (or your sample is large enough for CLT to kick in). If your data is heavily skewed and \(n=5\), the t-test gives junk results.
  • The "p > 0.05" Trap: A high p-value doesn't prove the groups are the same. It just means you didn't have enough data to prove they are different.