# Covariance

## The Formula

For a sample:

\[\text{Cov}(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\]

For a population:

\[\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)\]
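As a minimal sketch, the two formulas can be implemented directly in plain Python (no external libraries; the function names are our own):

```python
def sample_covariance(x, y):
    """Sample covariance: divide by n - 1 (Bessel's correction)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def population_covariance(x, y):
    """Population covariance: divide by N."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    return sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n
```

The only difference between the two is the divisor, which is why sample covariance is slightly larger than the population version computed on the same data.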
## What It Means
Covariance measures the direction of the linear relationship between two variables.
- Positive Covariance: When \(x\) is high, \(y\) tends to be high. When \(x\) is low, \(y\) tends to be low. They move together.
- Negative Covariance: When \(x\) is high, \(y\) tends to be low. They move in opposite directions.
- Zero Covariance: There is no linear pattern connecting the movements of \(x\) and \(y\).
Unlike correlation, covariance is unscaled. If you multiply the values of \(x\) by 100 (e.g., measure in centimeters instead of meters), the covariance is multiplied by 100 as well, even though the strength of the relationship is unchanged. This makes it hard to compare covariances across datasets with different units.
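A quick sketch of this unit-dependence (the `cov` and `corr` helpers and the data are ours, made up for illustration): rescaling \(x\) rescales the covariance by the same factor, while the correlation is unaffected.

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.1, 4.0]

def cov(a, b):
    """Sample covariance (n - 1 divisor)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def corr(a, b):
    """Pearson's r: covariance normalized by both standard deviations."""
    return cov(a, b) / (statistics.stdev(a) * statistics.stdev(b))

x_cm = [100 * v for v in x]  # meters -> centimeters

print(cov(x, y), cov(x_cm, y))    # second value is 100x the first
print(corr(x, y), corr(x_cm, y))  # identical: correlation is unit-free
```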
## Why It Works — The Intuition
Imagine drawing a crosshair at the mean of your data \((\bar{x}, \bar{y})\). This divides your scatter plot into four quadrants.
- Top-Right (\(x > \bar{x}, y > \bar{y}\)): Both deviations are positive. Product is \((+) \cdot (+) = +\).
- Bottom-Left (\(x < \bar{x}, y < \bar{y}\)): Both deviations are negative. Product is \((-) \cdot (-) = +\).
- Top-Left (\(x < \bar{x}, y > \bar{y}\)): \(x\) deviation is \((-)\), \(y\) is \((+)\). Product is \(-\).
- Bottom-Right (\(x > \bar{x}, y < \bar{y}\)): \(x\) deviation is \((+)\), \(y\) is \((-)\). Product is \(-\).
Covariance is just the average of these products.

- If most points are in the Top-Right and Bottom-Left, the positive products dominate \(\to\) Positive Covariance.
- If most points are in the Top-Left and Bottom-Right, the negative products dominate \(\to\) Negative Covariance.
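The quadrant picture can be sketched in a few lines (the data here is invented for illustration): compute each point's deviation product, whose sign tells you which pair of quadrants the point falls in, then average.

```python
x = [1, 2, 4, 5]
y = [2, 1, 5, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n  # the "crosshair" at (x-bar, y-bar)

products = []
for xi, yi in zip(x, y):
    dx, dy = xi - x_bar, yi - y_bar
    # + for top-right/bottom-left points, - for top-left/bottom-right
    products.append(dx * dy)

# Sample covariance is the (Bessel-corrected) average of these products
sample_cov = sum(products) / (n - 1)
print(products, sample_cov)
```

Here every point sits in the top-right or bottom-left quadrant, so every product is positive and the covariance comes out positive.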
## Derivation
Covariance comes from the definition of Expected Value (\(E\)). The variance of a single variable \(X\) is the expected squared deviation from the mean:

\[\text{Var}(X) = E[(X - \mu_x)^2]\]

Covariance simply generalizes this to two variables:

\[\text{Cov}(X,Y) = E[(X - \mu_x)(Y - \mu_y)]\]

Expanding this expectation:

1. Expand terms:
\[E[(X - \mu_x)(Y - \mu_y)] = E[XY - \mu_y X - \mu_x Y + \mu_x \mu_y]\]
2. Use linearity of expectation (\(E[A + B] = E[A] + E[B]\) and \(E[cX] = cE[X]\)):
\[= E[XY] - \mu_y E[X] - \mu_x E[Y] + \mu_x \mu_y\]
3. Substitute \(E[X] = \mu_x\) and \(E[Y] = \mu_y\):
\[= E[XY] - \mu_y \mu_x - \mu_x \mu_y + \mu_x \mu_y\]
4. Simplify:
\[\text{Cov}(X,Y) = E[XY] - \mu_x \mu_y\]
This alternate formula is computationally useful: "Mean of the product minus product of the means."
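The identity is easy to check numerically. A sketch, using made-up data and the population (divide-by-\(N\)) convention, since the expectations above are population quantities:

```python
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)

mu_x = sum(x) / n
mu_y = sum(y) / n

# Definition: E[(X - mu_x)(Y - mu_y)]
direct = sum((xi - mu_x) * (yi - mu_y) for xi, yi in zip(x, y)) / n

# Shortcut: "mean of the product minus product of the means"
mean_of_product = sum(xi * yi for xi, yi in zip(x, y)) / n
shortcut = mean_of_product - mu_x * mu_y

assert abs(direct - shortcut) < 1e-9
```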
## Variables Explained
| Symbol | Name | Description |
|---|---|---|
| \(\text{Cov}(x,y)\) | Sample Covariance | The measure of joint variability from a sample |
| \(\sigma_{xy}\) | Population Covariance | The theoretical measure for the whole population |
| \(x_i, y_i\) | Data Points | Individual observations |
| \(\bar{x}, \bar{y}\) | Sample Means | Average of \(x\) and \(y\) samples |
| \(\mu_x, \mu_y\) | Population Means | Theoretical average of \(X\) and \(Y\) |
| \(n\) | Sample Size | Number of data pairs |
| \(E[\cdot]\) | Expected Value | The probability-weighted average |
## Worked Example
Data: \(x = [1, 2, 3]\), \(y = [2, 4, 6]\). Means: \(\bar{x} = 2\), \(\bar{y} = 4\).
1. Calculate Deviations:
    - \((1-2, 2-4) = (-1, -2)\)
    - \((2-2, 4-4) = (0, 0)\)
    - \((3-2, 6-4) = (1, 2)\)
2. Multiply Deviations:
    - \((-1)(-2) = 2\)
    - \((0)(0) = 0\)
    - \((1)(2) = 2\)
3. Sum and Divide:
    - Sum = \(2 + 0 + 2 = 4\)
    - \(\text{Cov}(x,y) = \frac{4}{3-1} = \frac{4}{2} = 2\)
The covariance is 2. The positive sign tells us they move together. The magnitude (2) depends on the units (e.g., if \(y\) was doubled to \([4, 8, 12]\), covariance would become 4).
## Common Mistakes
- Interpreting the Magnitude: A covariance of 500 isn't necessarily "stronger" than a covariance of 0.5. It depends on the units. Always normalize to Correlation (\(r\)) to judge strength.
- Confusing \(n\) and \(n-1\): For samples, divide by \(n-1\) (Bessel's correction) to get an unbiased estimator. For populations, divide by \(N\).
## Related Formulas
- Pearson's Correlation — The normalized version of covariance (\(\frac{\text{Cov}}{s_x s_y}\)).
- Variance — The covariance of a variable with itself (\(\text{Cov}(X,X)\)).