Moments

The Formula

The \(k\)-th raw moment (about the origin):

\[ \mu'_k = E[X^k] = \int_{-\infty}^{\infty} x^k f(x) \, dx \]

The \(k\)-th central moment (about the mean):

\[ \mu_k = E[(X - \mu)^k] = \int_{-\infty}^{\infty} (x - \mu)^k f(x) \, dx \]
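These integrals can be evaluated numerically for any concrete density. A minimal sketch, using the standard normal PDF as an assumed example of \(f(x)\):

```python
# Sketch: evaluating the raw and central moment integrals numerically.
# The standard normal PDF is an illustrative choice, not from the text.
import numpy as np
from scipy.integrate import quad

def f(x):
    # Standard normal density: exp(-x^2/2) / sqrt(2*pi)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def raw(k):
    # k-th raw moment: integral of x^k * f(x) over the real line
    return quad(lambda x: x**k * f(x), -np.inf, np.inf)[0]

mu = raw(1)  # the mean (first raw moment), ~0 here

def central(k):
    # k-th central moment: integral of (x - mu)^k * f(x)
    return quad(lambda x: (x - mu)**k * f(x), -np.inf, np.inf)[0]

print(raw(2), central(2))  # both ~1 for the standard normal (mu ~ 0)
```

For the standard normal, the mean is 0, so the second raw and central moments coincide at 1.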

What It Means

Moments are a set of "shape descriptors" for a probability distribution. Just as "width", "height", and "depth" describe a physical box, moments describe the shape of a data cloud.

  • 1st Moment (Raw): The Mean (\(\mu\)). Where is the center of mass?
  • 2nd Moment (Central): The Variance (\(\sigma^2\)). How wide is the spread?
  • 3rd Moment (Standardized): Skewness. Is it lopsided to the left or right?
  • 4th Moment (Standardized): Kurtosis. How fat are the tails (how likely are extreme outliers)?

Think of a distribution as a physical object. The moments tell you its mechanical properties.
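The four descriptors above can be computed directly from a sample. A minimal sketch, assuming a synthetic normal sample (the seed, location, and scale are illustrative):

```python
# Sketch: the first four moments of a data sample, computed from their
# definitions. The sample itself is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # hypothetical data

mu = x.mean()                              # 1st raw moment: location
var = np.mean((x - mu) ** 2)               # 2nd central moment: spread
sigma = np.sqrt(var)
skew = np.mean((x - mu) ** 3) / sigma**3   # 3rd standardized moment
kurt = np.mean((x - mu) ** 4) / sigma**4   # 4th standardized moment

print(f"mean={mu:.2f} sd={sigma:.2f} skew={skew:.2f} kurt={kurt:.2f}")
```

For this sample the mean lands near 5, the standard deviation near 2, skewness near 0, and kurtosis near 3, matching the normal distribution it was drawn from.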

Why It Works — The Intuition

The term "moment" comes from physics.

  • The 0th moment is the total mass (which is 1 for a probability distribution).
  • The 1st moment is the torque (force \(\times\) distance). Balancing the torques gives you the Center of Mass (the Mean).
  • The 2nd moment is the Moment of Inertia (mass \(\times\) distance\(^2\)). It tells you how hard it is to spin the object. This is exactly what Variance is: resistance to being "centered."

By calculating higher powers (\(x^3, x^4\)), we amplify the effect of points further from the center, allowing us to detect subtle asymmetries (skew) or extreme outliers (kurtosis).
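This amplification is easy to see numerically. A minimal sketch, assuming a synthetic sample: one far-out point barely moves the mean, but it dominates the fourth moment and sends kurtosis soaring.

```python
# Sketch: higher powers amplify distant points. Adding one outlier to a
# normal sample barely changes the mean but multiplies the kurtosis.
# The sample and outlier positions are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=1000)  # well-behaved sample, kurtosis ~3

results = {}
for outlier in (0.0, 10.0):
    x = np.append(base, outlier)
    mu = x.mean()
    sigma = x.std()
    kurt = np.mean((x - mu) ** 4) / sigma**4
    results[outlier] = (mu, kurt)
    print(f"outlier at {outlier}: mean={mu:+.3f}, kurtosis={kurt:.1f}")
```

Because the deviation enters the fourth moment as \((x-\mu)^4\), the single point at 10 contributes on the order of \(10^4\) to the sum, while shifting the mean by only about \(10/1001\).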

Key Moments Explained

| Order (\(k\)) | Name | Formula | Meaning |
|---|---|---|---|
| 1 | Mean | \(\mu = E[X]\) | Location. The average value. |
| 2 | Variance | \(\sigma^2 = E[(X-\mu)^2]\) | Spread. The average squared distance. |
| 3 | Skewness | \(\gamma_1 = \frac{E[(X-\mu)^3]}{\sigma^3}\) | Symmetry. 0 = symmetric; + = long right (positive) tail; − = long left tail. |
| 4 | Kurtosis | \(\kappa = \frac{E[(X-\mu)^4]}{\sigma^4}\) | Tails. 3 = normal distribution; >3 = fat tails (outlier-prone); <3 = thin tails. |
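The sign conventions in the table can be spot-checked against SciPy's analytic distribution moments. A minimal sketch, assuming `scipy` is available (the normal distribution is symmetric; the exponential has a long right tail):

```python
# Sketch: checking skewness sign conventions with SciPy's analytic moments.
# moments='s' asks for the standardized 3rd moment (skewness).
from scipy import stats

norm_skew = float(stats.norm.stats(moments='s'))    # symmetric -> 0
expon_skew = float(stats.expon.stats(moments='s'))  # right tail -> positive

print(norm_skew, expon_skew)  # 0.0 and 2.0
```

The exponential's skewness is exactly 2 analytically, consistent with "+ = long right tail" in the table.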

Derivation (Moment Generating Function)

Moments can be derived from the Moment Generating Function (MGF):

\[ M_X(t) = E[e^{tX}] \]

The Taylor series expansion of \(e^{tX}\) is:

\[ e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots \]

Taking the expected value:

\[ M_X(t) = 1 + t E[X] + \frac{t^2}{2!} E[X^2] + \frac{t^3}{3!} E[X^3] + \dots \]

Therefore, the \(k\)-th raw moment is simply the \(k\)-th derivative of the MGF evaluated at \(t=0\):

\[ E[X^k] = M_X^{(k)}(0) \]

This is why it's called the "Moment Generating" function — it's a machine that spits out moments when you differentiate it.
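The derivative-at-zero recipe can be carried out symbolically. A minimal sketch with SymPy, using the exponential distribution's MGF \(M(t) = \lambda/(\lambda - t)\) as an assumed example:

```python
# Sketch: generating raw moments by differentiating an MGF at t = 0.
# The Exponential(lam) MGF, M(t) = lam / (lam - t), is the chosen example.
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)  # MGF of the Exponential(lam) distribution

# k-th raw moment = k-th derivative of M, evaluated at t = 0
moments = [sp.simplify(sp.diff(M, t, k).subs(t, 0)) for k in range(1, 4)]
print(moments)  # E[X] = 1/lam, E[X^2] = 2/lam^2, E[X^3] = 6/lam^3
```

Each differentiation "pulls down" one more factor from the series, which is why the \(k\)-th derivative isolates \(E[X^k]\) (the lower terms vanish after differentiating, the higher terms vanish at \(t=0\)).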

Variables Explained

| Symbol | Name | Description |
|---|---|---|
| \(\mu'_k\) | Raw moment | Expected value of \(X^k\) |
| \(\mu_k\) | Central moment | Expected value of \((X-\mu)^k\) |
| \(E[\cdot]\) | Expected value | Probability-weighted average |
| \(f(x)\) | PDF | Probability density function |

Common Mistakes

  • Confusing Raw and Central Moments:
    • Raw moments are typically used for derivation and solving equations.
    • Central moments are used for describing the shape (variance, skewness).
    • \(\text{Variance} = \text{Raw Moment}_2 - (\text{Raw Moment}_1)^2\).
  • Excess Kurtosis: Standard kurtosis for a normal distribution is 3. Often, software reports Excess Kurtosis (Kurtosis - 3) so that a normal distribution is 0. Always check which one is being used.
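Both pitfalls can be verified in a few lines. A minimal sketch, assuming `scipy` is available: the raw/central identity \(\sigma^2 = E[X^2] - (E[X])^2\) holds on any sample, and SciPy's `kurtosis()` defaults to excess kurtosis (Fisher's definition), so a normal sample reads near 0, not 3.

```python
# Sketch: checking the variance identity and the kurtosis convention.
# The sample is an assumption; scipy.stats.kurtosis defaults to
# excess kurtosis (fisher=True), i.e. kurtosis minus 3.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
x = rng.normal(size=200_000)

m1 = np.mean(x)       # 1st raw moment
m2 = np.mean(x ** 2)  # 2nd raw moment
print(np.isclose(m2 - m1**2, np.var(x)))  # True: variance identity

print(kurtosis(x))                 # excess kurtosis, close to 0
print(kurtosis(x, fisher=False))   # plain kurtosis, close to 3
```

If a library's documentation does not say which convention it uses, feeding it a large normal sample like this is a quick way to find out.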