Method of Moments

The Formula

The core idea is to equate the sample moments to the theoretical moments of the distribution.

  1. Calculate sample moments:
\[ m_k = \frac{1}{n} \sum_{i=1}^n x_i^k \]
  2. Calculate theoretical (population) moments as functions of the unknown parameters \(\theta\):
\[ \mu_k(\theta) = E[X^k] \]
  3. Set them equal and solve for \(\theta\):
\[ \mu_k(\theta) = m_k \quad \text{for } k = 1, \dots, p \]
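The first step can be sketched in a few lines of Python; the function name `sample_moments` and the example data are illustrative, not from the text:

```python
# Minimal sketch: compute the first p sample moments m_k = (1/n) * sum(x_i^k).

def sample_moments(data, p):
    """Return [m_1, ..., m_p] for the given sample."""
    n = len(data)
    return [sum(x**k for x in data) / n for k in range(1, p + 1)]

print(sample_moments([2.0, 4.0, 9.0], 2))  # m_1 = 5.0, m_2 = (4 + 16 + 81)/3
```

With these \(m_k\) in hand, the remaining work is solving the system \(\mu_k(\theta) = m_k\) by algebra, as the derivation below illustrates.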

What It Means

The Method of Moments (MoM) is one of the oldest ways to estimate the parameters of a probability distribution (like finding \(\mu\) and \(\sigma\) for a normal distribution, or \(\lambda\) for a Poisson distribution) based on observed data.

It follows a simple logic: "If the data comes from this distribution, the average of the data should match the theoretical average, and the variance of the data should match the theoretical variance."

It's like reverse-engineering: you look at the shape of the data (its moments) and adjust the knobs of your model until it matches that shape.

Why It Works — The Intuition

Suppose you have a mystery coin. You don't know the probability of heads (\(p\)). You flip it 100 times and get 60 heads.

  • The sample mean is \(0.60\).
  • The theoretical mean of a single flip (Bernoulli) is \(p\).
  • Method of Moments says: set \(p = 0.60\).

It seems obvious, right? But it works for complex distributions too. If a distribution has 2 unknown parameters (like mean and variance), you need 2 equations. You calculate the sample mean and sample variance, set them equal to the theoretical formulas, and solve the system of equations.
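The coin example is small enough to write out directly; the flip data below is a hypothetical sample matching the counts in the text:

```python
# MoM for a Bernoulli(p) coin: E[X] = p, so p_hat is just the sample mean.
flips = [1] * 60 + [0] * 40      # hypothetical data: 60 heads in 100 flips
m1 = sum(flips) / len(flips)     # first sample moment (sample mean)
p_hat = m1                       # set theoretical mean p equal to m1
print(p_hat)  # 0.6
```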

MoM estimators are usually consistent (they get the right answer with infinite data) but not always efficient (Maximum Likelihood Estimation, MLE, is often more precise). However, MoM is often much easier to calculate than MLE.

Derivation Example: Gamma Distribution

Let \(X \sim \text{Gamma}(\alpha, \beta)\). We want to estimate \(\alpha\) (shape) and \(\beta\) (scale).

  1. Theoretical Moments:

    • \(E[X] = \alpha \beta\)
    • \(\text{Var}(X) = \alpha \beta^2 \implies E[X^2] = \text{Var}(X) + (E[X])^2 = \alpha \beta^2 + \alpha^2 \beta^2\)
  2. Sample Moments:

    • \(m_1 = \bar{x}\) (Sample Mean)
    • \(m_2 = \frac{1}{n}\sum x_i^2\) (Sample Raw Moment)
    • (Alternatively, we can use sample variance \(s^2 = m_2 - m_1^2\)).
  3. Set Equal and Solve:

    • Equation 1: \(\bar{x} = \alpha \beta\)
    • Equation 2: \(s^2 = \alpha \beta^2\)

    From Eq 1: \(\beta = \bar{x} / \alpha\). Substitute into Eq 2:

\[ s^2 = \alpha \left(\frac{\bar{x}}{\alpha}\right)^2 = \frac{\bar{x}^2}{\alpha} \]
\[ \hat{\alpha} = \frac{\bar{x}^2}{s^2} \]
Now find \(\beta\):
\[ \hat{\beta} = \frac{\bar{x}}{\hat{\alpha}} = \frac{\bar{x}}{\bar{x}^2 / s^2} = \frac{s^2}{\bar{x}} \]

So, just by knowing the mean and variance of your data, you can estimate the Gamma parameters instantly!
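The derivation translates into a few lines of code. This is a minimal sketch; the function name `gamma_mom` and the example data are illustrative, not from the text:

```python
# MoM for Gamma(alpha, beta): alpha_hat = xbar^2 / s^2, beta_hat = s^2 / xbar,
# where s^2 is the biased (1/n) sample variance m_2 - m_1^2.

def gamma_mom(data):
    """Return MoM estimates (alpha_hat, beta_hat) for a Gamma(shape, scale)."""
    n = len(data)
    xbar = sum(data) / n                         # m_1
    s2 = sum((x - xbar) ** 2 for x in data) / n  # biased variance, = m_2 - m_1^2
    return xbar**2 / s2, s2 / xbar

alpha_hat, beta_hat = gamma_mom([2.0, 4.0, 9.0])  # hypothetical sample
print(alpha_hat, beta_hat)
```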

Variables Explained

| Symbol | Name | Description |
| --- | --- | --- |
| \(\theta\) | Parameters | The unknowns we want to find (e.g., \(\alpha, \beta, \lambda\)) |
| \(m_k\) | Sample Moment | Calculated from data: \(\frac{1}{n} \sum x_i^k\) |
| \(\mu_k(\theta)\) | Theoretical Moment | Mathematical expectation \(E[X^k]\) |
| \(p\) | Number of Parameters | How many moment equations we need |

Worked Example

Data: \([2, 4, 9]\). We assume an Exponential distribution (\(X \sim \text{Exp}(\lambda)\)). Theoretical mean: \(E[X] = \frac{1}{\lambda}\).

  1. Calculate Sample Mean (\(m_1\)):
\[ \bar{x} = \frac{2+4+9}{3} = 5 \]
  2. Set Equal to Theoretical Mean:
\[ \frac{1}{\lambda} = 5 \]
  3. Solve for \(\lambda\):
\[ \hat{\lambda} = \frac{1}{5} = 0.2 \]
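The three steps above can be checked in code, assuming \(X \sim \text{Exp}(\lambda)\) with \(E[X] = 1/\lambda\):

```python
# MoM for the exponential worked example: 1/lambda = xbar => lambda_hat = 1/xbar.
data = [2, 4, 9]
xbar = sum(data) / len(data)  # step 1: sample mean = 5.0
lam_hat = 1 / xbar            # steps 2-3: solve 1/lambda = xbar
print(lam_hat)  # 0.2
```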

Common Mistakes

  • Using too many moments: If you have 2 parameters, only use the first 2 moments. Using the 3rd adds unnecessary complexity and variance (higher moments are harder to estimate accurately).
  • Forgetting Sample Variance definition: When using MoM, it is standard to use the biased sample variance (\(\frac{1}{n}\)) to match the raw moments definition, though using \(\frac{1}{n-1}\) is acceptable for large \(n\).
  • Impossible Estimates: Sometimes MoM can give values outside the allowed range (e.g., a probability \(> 1\) or negative variance). MLE handles this better.
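The variance-definition pitfall from the second bullet can be demonstrated directly; `data` here is a small hypothetical sample where the two definitions disagree noticeably:

```python
import statistics

data = [2, 4, 9]  # hypothetical sample
n = len(data)
m1 = sum(data) / n
m2 = sum(x**2 for x in data) / n
biased = m2 - m1**2                   # 1/n definition, consistent with raw moments
unbiased = statistics.variance(data)  # 1/(n-1) definition
print(biased, unbiased)
```

For large \(n\) the two values converge, which is why the distinction only matters for small samples.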