Chi-Squared Test¶
The Question¶
You flip a coin 100 times. You get 60 heads and 40 tails. Is the coin biased? Or is that just normal fluctuation?
Or, you want to know if "User Preference" (Android vs iOS) depends on "Age Group" (Young vs Old). You have a grid of counts. How do you know if there's a pattern, or if the numbers just fell that way randomly?
The Chi-Squared (\(\chi^2\)) Test answers these questions by comparing what you observed vs. what you expected.
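As a concrete sketch of the second question, here is how the "User Preference" grid could be tested with SciPy's `chi2_contingency` (the counts below are invented for illustration):

```python
from scipy.stats import chi2_contingency

# Hypothetical observed counts: rows = age group, columns = preference.
#            Android  iOS
# Young         30     70
# Old           60     40
observed = [[30, 70],
            [60, 40]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
# A small p-value suggests preference and age group are not independent.
```

`chi2_contingency` works out the expected counts from the row and column totals for you, which is exactly the "what should have happened if everything was random" baseline.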
The Formula¶
\[
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
\]

Where:

- \(O_i\) = Observed count (what actually happened)
- \(E_i\) = Expected count (what should have happened if everything was boring/random)
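Plugging the 60/40 coin flip from above into the formula, a minimal sketch that does the arithmetic by hand and cross-checks it with SciPy's `chisquare`:

```python
from scipy.stats import chisquare

observed = [60, 40]   # heads, tails
expected = [50, 50]   # a fair coin over 100 flips

# By hand: (60-50)^2/50 + (40-50)^2/50 = 2 + 2 = 4.0
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 4.0

stat, p = chisquare(observed, expected)
print(f"chi2 = {stat}, p = {p:.4f}")  # p is roughly 0.0455: borderline evidence of bias
```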
The Story: Pearson's Standard¶
In 1900, Karl Pearson (the godfather of statistics) wanted a universal way to test if data fit a curve. Before him, people just looked at graphs and said, "Eh, looks close enough."
Pearson wanted a number. He realized that simply summing the errors \((O - E)\) wouldn't work because positives and negatives would cancel out. He tried squaring them, \((O - E)^2\), but that gave too much weight to big numbers: an error of 5 is huge if you expected 10, but tiny if you expected 1,000,000.
So he divided by the expected value to normalize it. This created a metric that worked for any kind of categorical data. It was the first true "Goodness of Fit" test.
The Intuition¶
The Normalized Error¶
Think of the formula like this:
- Find the error: \((O - E)\).
- Square it: Make it positive and punish big deviations more.
- Scale it: Divide by \(E\) to put it in perspective.
If \(\chi^2\) is small, your observed data is very close to what you expected. The coin is fair. The variables are independent. If \(\chi^2\) is large, your observed data is wildly different from the expectation. The coin is rigged. The variables are linked.
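To see small vs. large in action, a quick sketch comparing a near-fair and a lopsided 100-flip run against the same expectation:

```python
def chi2_stat(observed, expected):
    """Sum of normalized squared errors: (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

fair_expected = [50, 50]
print(chi2_stat([52, 48], fair_expected))  # 0.16 -> very close to expectation
print(chi2_stat([80, 20], fair_expected))  # 36.0 -> wildly different
```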
Why "Chi-Squared"?¶
Because if you assume the errors are normally distributed (which they are, roughly, for large counts), then squaring a normal variable gives you a... Chi-squared variable. The sum of squared normal variables is literally the definition of the Chi-squared distribution.
Derivation: Summing Standard Normals¶
- Binomial Approximation: For a single cell (like "Heads"), the count \(O\) follows a Binomial distribution. For large \(n\), Binomial is approximately Normal:

\[
O \approx \mathcal{N}(E, E)
\]

(The variance of a Poisson/Binomial count is roughly equal to the mean \(E\).)

- Z-Score: If we want to standardize this error:

\[
Z = \frac{O - E}{\sqrt{E}}
\]

- Square it:

\[
Z^2 = \frac{(O - E)^2}{E}
\]

- Sum them up: If we sum these \(Z^2\) values across all categories, we get the Chi-Squared statistic:

\[
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
\]

The degrees of freedom depend on how many categories you have minus the constraints (usually \(k-1\)).
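The derivation can be checked empirically: simulate counts for a fair six-sided die many times, compute the statistic each time, and the average should land near the degrees of freedom, \(k - 1 = 5\). A Monte Carlo sketch with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, trials = 6, 600, 20_000
expected = n / k  # 100 per face for a fair die

# Draw many tables of observed counts and compute the statistic for each.
counts = rng.multinomial(n, [1 / k] * k, size=trials)
stats = ((counts - expected) ** 2 / expected).sum(axis=1)

# The mean of a chi-squared distribution equals its degrees of freedom.
print(stats.mean())  # close to 5, i.e. k - 1: the counts must sum to n,
                     # which eats one degree of freedom
```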
Common Mistakes¶
- Using Percentages: The Chi-Squared test only works on Counts (integers). Never plug in percentages or averages. If you have 50%, you need to know if that's 1 out of 2 (meaningless) or 500 out of 1000 (meaningful).
- Small Sample Size: If your "Expected" count for any category is less than 5, the approximation breaks down. The math assumes a normal curve, but with counts like 1 or 2, that doesn't hold.
- Causation: Like all correlations, a significant Chi-Squared test doesn't mean A causes B. It just means they aren't independent.
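One way to guard against the small-sample pitfall is to inspect the expected counts before trusting the result, and fall back to Fisher's exact test for a small 2×2 table. A sketch (the counts are invented):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

table = [[1, 9],
         [8, 2]]

chi2, p, dof, expected = chi2_contingency(table)
if (np.asarray(expected) < 5).any():
    # The normal approximation is shaky here: use an exact test instead.
    odds, p = fisher_exact(table)
    print(f"Fisher exact p = {p:.4f}")
else:
    print(f"chi-squared p = {p:.4f}")
```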
Related Concepts¶
- Chi-Squared Distribution — The curve we look up.
- p-value — The probability of seeing this \(\chi^2\) by chance.
- Degrees of Freedom — Crucial for reading the table.