Moments

The Formula

The \(k\)-th raw moment (about the origin):

\[ \mu'_k = E[X^k] = \int_{-\infty}^{\infty} x^k f(x) \, dx \]

The \(k\)-th central moment (about the mean):

\[ \mu_k = E[(X - \mu)^k] = \int_{-\infty}^{\infty} (x - \mu)^k f(x) \, dx \]
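These integrals can be evaluated numerically for any concrete density. A minimal sketch, using the standard normal PDF as an assumed example of \(f(x)\):

```python
# Sketch: evaluating the raw and central moment integrals numerically.
# The standard normal PDF is an illustrative choice, not from the text.
import numpy as np
from scipy.integrate import quad

def f(x):
    # Standard normal density: exp(-x^2/2) / sqrt(2*pi)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def raw(k):
    # k-th raw moment: integral of x^k * f(x) over the real line
    return quad(lambda x: x**k * f(x), -np.inf, np.inf)[0]

mu = raw(1)  # the mean (first raw moment), ~0 here

def central(k):
    # k-th central moment: integral of (x - mu)^k * f(x)
    return quad(lambda x: (x - mu)**k * f(x), -np.inf, np.inf)[0]

print(raw(2), central(2))  # both ~1 for the standard normal (mu ~ 0)
```

For the standard normal, the mean is 0, so the second raw and central moments coincide at 1.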

What It Means

Moments are a set of "shape descriptors" for a probability distribution. Just as "width", "height", and "depth" describe a physical box, moments describe the shape of a data cloud.

  • 1st Moment (Raw): The Mean (\(\mu\)). Where is the center of mass?
  • 2nd Moment (Central): The Variance (\(\sigma^2\)). How wide is the spread?
  • 3rd Moment (Standardized): Skewness. Is it lopsided to the left or right?
  • 4th Moment (Standardized): Kurtosis. How fat are the tails (how likely are extreme outliers)?

Think of a distribution as a physical object. The moments tell you its mechanical properties.
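The four descriptors above can be computed directly from a sample. A minimal sketch, assuming a synthetic normal sample (the seed, location, and scale are illustrative):

```python
# Sketch: the first four moments of a data sample, computed from their
# definitions. The sample itself is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)  # hypothetical data

mu = x.mean()                              # 1st raw moment: location
var = np.mean((x - mu) ** 2)               # 2nd central moment: spread
sigma = np.sqrt(var)
skew = np.mean((x - mu) ** 3) / sigma**3   # 3rd standardized moment
kurt = np.mean((x - mu) ** 4) / sigma**4   # 4th standardized moment

print(f"mean={mu:.2f} sd={sigma:.2f} skew={skew:.2f} kurt={kurt:.2f}")
```

For this sample the mean lands near 5, the standard deviation near 2, skewness near 0, and kurtosis near 3, matching the normal distribution it was drawn from.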

Why It Works — The Intuition

The term "moment" comes from physics.

  • The 0th moment is the total mass (which is 1 for a probability distribution).
  • The 1st moment is the torque (force \(\times\) distance). Balancing the torques gives you the Center of Mass (the Mean).
  • The 2nd moment is the Moment of Inertia (mass \(\times\) distance\(^2\)). It tells you how hard it is to spin the object. This is exactly what Variance is: resistance to being "centered."

By calculating higher powers (\(x^3, x^4\)), we amplify the effect of points further from the center, allowing us to detect subtle asymmetries (skew) or extreme outliers (kurtosis).
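This amplification is easy to see numerically. A minimal sketch, assuming a synthetic sample: one far-out point barely moves the mean, but it dominates the fourth moment and sends kurtosis soaring.

```python
# Sketch: higher powers amplify distant points. Adding one outlier to a
# normal sample barely changes the mean but multiplies the kurtosis.
# The sample and outlier positions are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=1000)  # well-behaved sample, kurtosis ~3

results = {}
for outlier in (0.0, 10.0):
    x = np.append(base, outlier)
    mu = x.mean()
    sigma = x.std()
    kurt = np.mean((x - mu) ** 4) / sigma**4
    results[outlier] = (mu, kurt)
    print(f"outlier at {outlier}: mean={mu:+.3f}, kurtosis={kurt:.1f}")
```

Because the deviation enters the fourth moment as \((x-\mu)^4\), the single point at 10 contributes on the order of \(10^4\) to the sum, while shifting the mean by only about \(10/1001\).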

Key Moments Explained

| Order (\(k\)) | Name | Formula | Meaning |
|---|---|---|---|
| 1 | Mean | \(\mu = E[X]\) | Location. The average value. |
| 2 | Variance | \(\sigma^2 = E[(X-\mu)^2]\) | Spread. The average squared distance. |
| 3 | Skewness | \(\gamma_1 = \frac{E[(X-\mu)^3]}{\sigma^3}\) | Symmetry. 0 = symmetric; + = long right (positive) tail; − = long left tail. |
| 4 | Kurtosis | \(\kappa = \frac{E[(X-\mu)^4]}{\sigma^4}\) | Tails. 3 = normal distribution; >3 = fat tails (outlier-prone); <3 = thin tails. |
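The sign conventions in the table can be spot-checked against SciPy's analytic distribution moments. A minimal sketch, assuming `scipy` is available (the normal distribution is symmetric; the exponential has a long right tail):

```python
# Sketch: checking skewness sign conventions with SciPy's analytic moments.
# moments='s' asks for the standardized 3rd moment (skewness).
from scipy import stats

norm_skew = float(stats.norm.stats(moments='s'))    # symmetric -> 0
expon_skew = float(stats.expon.stats(moments='s'))  # right tail -> positive

print(norm_skew, expon_skew)  # 0.0 and 2.0
```

The exponential's skewness is exactly 2 analytically, consistent with "+ = long right tail" in the table.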

Derivation (Moment Generating Function)

Moments can be derived from the Moment Generating Function (MGF):

\[ M_X(t) = E[e^{tX}] \]

The Taylor series expansion of \(e^{tX}\) is:

\[ e^{tX} = 1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots \]

Taking the expected value:

\[ M_X(t) = 1 + t E[X] + \frac{t^2}{2!} E[X^2] + \frac{t^3}{3!} E[X^3] + \dots \]

Therefore, the \(k\)-th raw moment is simply the \(k\)-th derivative of the MGF evaluated at \(t=0\):

\[ E[X^k] = M_X^{(k)}(0) \]

This is why it's called the "Moment Generating" function — it's a machine that spits out moments when you differentiate it.
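The derivative-at-zero recipe can be carried out symbolically. A minimal sketch with SymPy, using the exponential distribution's MGF \(M(t) = \lambda/(\lambda - t)\) as an assumed example:

```python
# Sketch: generating raw moments by differentiating an MGF at t = 0.
# The Exponential(lam) MGF, M(t) = lam / (lam - t), is the chosen example.
import sympy as sp

t, lam = sp.symbols('t lam', positive=True)
M = lam / (lam - t)  # MGF of the Exponential(lam) distribution

# k-th raw moment = k-th derivative of M, evaluated at t = 0
moments = [sp.simplify(sp.diff(M, t, k).subs(t, 0)) for k in range(1, 4)]
print(moments)  # E[X] = 1/lam, E[X^2] = 2/lam^2, E[X^3] = 6/lam^3
```

Each differentiation "pulls down" one more factor from the series, which is why the \(k\)-th derivative isolates \(E[X^k]\) (the lower terms vanish after differentiating, the higher terms vanish at \(t=0\)).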

Variables Explained

| Symbol | Name | Description |
|---|---|---|
| \(\mu'_k\) | Raw moment | Expected value of \(X^k\) |
| \(\mu_k\) | Central moment | Expected value of \((X-\mu)^k\) |
| \(E[\cdot]\) | Expected value | Probability-weighted average |
| \(f(x)\) | PDF | Probability density function |

Common Mistakes

  • Confusing Raw and Central Moments:
    • Raw moments are typically used for derivation and solving equations.
    • Central moments are used for describing the shape (variance, skewness).
    • \(\text{Variance} = \text{Raw Moment}_2 - (\text{Raw Moment}_1)^2\).
  • Excess Kurtosis: Standard kurtosis for a normal distribution is 3. Often, software reports Excess Kurtosis (Kurtosis - 3) so that a normal distribution is 0. Always check which one is being used.
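Both pitfalls can be verified in a few lines. A minimal sketch, assuming `scipy` is available: the raw/central identity \(\sigma^2 = E[X^2] - (E[X])^2\) holds on any sample, and SciPy's `kurtosis()` defaults to excess kurtosis (Fisher's definition), so a normal sample reads near 0, not 3.

```python
# Sketch: checking the variance identity and the kurtosis convention.
# The sample is an assumption; scipy.stats.kurtosis defaults to
# excess kurtosis (fisher=True), i.e. kurtosis minus 3.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
x = rng.normal(size=200_000)

m1 = np.mean(x)       # 1st raw moment
m2 = np.mean(x ** 2)  # 2nd raw moment
print(np.isclose(m2 - m1**2, np.var(x)))  # True: variance identity

print(kurtosis(x))                 # excess kurtosis, close to 0
print(kurtosis(x, fisher=False))   # plain kurtosis, close to 3
```

If a library's documentation does not say which convention it uses, feeding it a large normal sample like this is a quick way to find out.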