Method of Moments

The Formula

The core idea is to equate the sample moments to the theoretical moments of the distribution.

  1. Calculate sample moments:
\[ m_k = \frac{1}{n} \sum_{i=1}^n x_i^k \]
  2. Calculate theoretical (population) moments as functions of the unknown parameters \(\theta\):
\[ \mu_k(\theta) = E[X^k] \]
  3. Set them equal and solve for \(\theta\):
\[ \mu_k(\theta) = m_k \quad \text{for } k = 1, \dots, p \]
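The first step can be sketched in a few lines of Python; the function name `sample_moments` and the example data are illustrative, not from the text:

```python
# Minimal sketch: compute the first p sample moments m_k = (1/n) * sum(x_i^k).

def sample_moments(data, p):
    """Return [m_1, ..., m_p] for the given sample."""
    n = len(data)
    return [sum(x**k for x in data) / n for k in range(1, p + 1)]

print(sample_moments([2.0, 4.0, 9.0], 2))  # m_1 = 5.0, m_2 = (4 + 16 + 81)/3
```

With these \(m_k\) in hand, the remaining work is solving the system \(\mu_k(\theta) = m_k\) by algebra, as the derivation below illustrates.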

What It Means

The Method of Moments (MoM) is one of the oldest ways to estimate the parameters of a probability distribution (like finding \(\mu\) and \(\sigma\) for a normal distribution, or \(\lambda\) for a Poisson distribution) based on observed data.

It follows a simple logic: "If the data comes from this distribution, the average of the data should match the theoretical average, and the variance of the data should match the theoretical variance."

It's like reverse-engineering: you look at the shape of the data (its moments) and adjust the knobs of your model until it matches that shape.

Why It Works — The Intuition

Suppose you have a mystery coin. You don't know the probability of heads (\(p\)). You flip it 100 times and get 60 heads.

  • The sample mean is \(0.60\).
  • The theoretical mean of a single flip (Bernoulli) is \(p\).
  • Method of Moments says: set \(p = 0.60\).

It seems obvious, right? But it works for complex distributions too. If a distribution has 2 unknown parameters (like mean and variance), you need 2 equations. You calculate the sample mean and sample variance, set them equal to the theoretical formulas, and solve the system of equations.
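The coin example is small enough to write out directly; the flip data below is a hypothetical sample matching the counts in the text:

```python
# MoM for a Bernoulli(p) coin: E[X] = p, so p_hat is just the sample mean.
flips = [1] * 60 + [0] * 40      # hypothetical data: 60 heads in 100 flips
m1 = sum(flips) / len(flips)     # first sample moment (sample mean)
p_hat = m1                       # set theoretical mean p equal to m1
print(p_hat)  # 0.6
```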

MoM estimators are usually consistent (they get the right answer with infinite data) but not always efficient (Maximum Likelihood Estimation, MLE, is often more precise). However, MoM is often much easier to calculate than MLE.

Derivation Example: Gamma Distribution

Let \(X \sim \text{Gamma}(\alpha, \beta)\). We want to estimate \(\alpha\) (shape) and \(\beta\) (scale).

  1. Theoretical Moments:

    • \(E[X] = \alpha \beta\)
    • \(\text{Var}(X) = \alpha \beta^2 \implies E[X^2] = \text{Var}(X) + (E[X])^2 = \alpha \beta^2 + \alpha^2 \beta^2\)
  2. Sample Moments:

    • \(m_1 = \bar{x}\) (Sample Mean)
    • \(m_2 = \frac{1}{n}\sum x_i^2\) (Sample Raw Moment)
    • (Alternatively, we can use sample variance \(s^2 = m_2 - m_1^2\)).
  3. Set Equal and Solve:

    • Equation 1: \(\bar{x} = \alpha \beta\)
    • Equation 2: \(s^2 = \alpha \beta^2\)

    From Eq 1: \(\beta = \bar{x} / \alpha\). Substitute into Eq 2:

\[ s^2 = \alpha \left(\frac{\bar{x}}{\alpha}\right)^2 = \frac{\bar{x}^2}{\alpha} \]
\[ \hat{\alpha} = \frac{\bar{x}^2}{s^2} \]
Now find \(\beta\):
\[ \hat{\beta} = \frac{\bar{x}}{\hat{\alpha}} = \frac{\bar{x}}{\bar{x}^2 / s^2} = \frac{s^2}{\bar{x}} \]

So, just by knowing the mean and variance of your data, you can estimate the Gamma parameters instantly!
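The derivation translates into a few lines of code. This is a minimal sketch; the function name `gamma_mom` and the example data are illustrative, not from the text:

```python
# MoM for Gamma(alpha, beta): alpha_hat = xbar^2 / s^2, beta_hat = s^2 / xbar,
# where s^2 is the biased (1/n) sample variance m_2 - m_1^2.

def gamma_mom(data):
    """Return MoM estimates (alpha_hat, beta_hat) for a Gamma(shape, scale)."""
    n = len(data)
    xbar = sum(data) / n                         # m_1
    s2 = sum((x - xbar) ** 2 for x in data) / n  # biased variance, = m_2 - m_1^2
    return xbar**2 / s2, s2 / xbar

alpha_hat, beta_hat = gamma_mom([2.0, 4.0, 9.0])  # hypothetical sample
print(alpha_hat, beta_hat)
```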

Variables Explained

| Symbol | Name | Description |
| --- | --- | --- |
| \(\theta\) | Parameters | The unknowns we want to find (e.g., \(\alpha, \beta, \lambda\)) |
| \(m_k\) | Sample Moment | Calculated from data: \(\frac{1}{n} \sum x_i^k\) |
| \(\mu_k(\theta)\) | Theoretical Moment | Mathematical expectation \(E[X^k]\) |
| \(p\) | Number of Parameters | How many moment equations we need |

Worked Example

Data: \([2, 4, 9]\). We assume an Exponential distribution (\(X \sim \text{Exp}(\lambda)\)). Theoretical mean: \(E[X] = \frac{1}{\lambda}\).

  1. Calculate Sample Mean (\(m_1\)):
\[ \bar{x} = \frac{2+4+9}{3} = 5 \]
  2. Set Equal to Theoretical Mean:
\[ \frac{1}{\lambda} = 5 \]
  3. Solve for \(\lambda\):
\[ \hat{\lambda} = \frac{1}{5} = 0.2 \]
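The three steps above can be checked in code, assuming \(X \sim \text{Exp}(\lambda)\) with \(E[X] = 1/\lambda\):

```python
# MoM for the exponential worked example: 1/lambda = xbar => lambda_hat = 1/xbar.
data = [2, 4, 9]
xbar = sum(data) / len(data)  # step 1: sample mean = 5.0
lam_hat = 1 / xbar            # steps 2-3: solve 1/lambda = xbar
print(lam_hat)  # 0.2
```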

Common Mistakes

  • Using too many moments: If you have 2 parameters, only use the first 2 moments. Using the 3rd adds unnecessary complexity and variance (higher moments are harder to estimate accurately).
  • Forgetting Sample Variance definition: When using MoM, it is standard to use the biased sample variance (\(\frac{1}{n}\)) to match the raw moments definition, though using \(\frac{1}{n-1}\) is acceptable for large \(n\).
  • Impossible Estimates: Sometimes MoM can give values outside the allowed range (e.g., a probability \(> 1\) or negative variance). MLE handles this better.
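The variance-definition pitfall from the second bullet can be demonstrated directly; `data` here is a small hypothetical sample where the two definitions disagree noticeably:

```python
import statistics

data = [2, 4, 9]  # hypothetical sample
n = len(data)
m1 = sum(data) / n
m2 = sum(x**2 for x in data) / n
biased = m2 - m1**2                   # 1/n definition, consistent with raw moments
unbiased = statistics.variance(data)  # 1/(n-1) definition
print(biased, unbiased)
```

For large \(n\) the two values converge, which is why the distinction only matters for small samples.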