
Farkas' Lemma

The Statement

Given a matrix \(A\) of dimensions \(m \times n\) and a vector \(b \in \mathbb{R}^m\), exactly one of the following two systems has a solution:

  1. Primal System: \(Ax = b\) with \(x \geq 0\)
  2. Alternative System: \(A^T y \geq 0\) with \(b^T y < 0\)

What It Means

Farkas' Lemma is a "theorem of the alternative". It tells you that in certain mathematical scenarios, either this happens or that happens, but never both and never neither. It's like a switch: on or off.

Geometrically, it's even more intuitive. Imagine that the columns of matrix \(A\) are vectors starting from the origin. All their possible positive linear combinations (\(x \geq 0\)) form a cone. System 1 asks: "Is vector \(b\) inside this cone?"

If the answer is YES, then there exists an \(x \geq 0\) such that \(Ax = b\). If the answer is NO, then Farkas' Lemma guarantees that there exists a hyperplane passing through the origin that separates \(b\) from the cone. This hyperplane is defined by the normal vector \(y\) from System 2. System 2 essentially asks: "Does there exist a direction \(y\) that makes a non-obtuse angle (acute or right) with every column of \(A\) (\(A^T y \geq 0\)), but an obtuse angle with \(b\) (\(b^T y < 0\))?"
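The dichotomy can be checked mechanically. Below is a minimal, dependency-free sketch; the helper names `solves_system_1` and `solves_system_2` are our own illustrative choices, not from any library:

```python
# A minimal checker for the two Farkas systems on small dense matrices.
# `A` is a list of rows; vectors are plain lists. Illustrative only.

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solves_system_1(A, b, x, tol=1e-9):
    """True iff x >= 0 and Ax = b, i.e. x solves the primal system."""
    if any(xj < -tol for xj in x):
        return False
    return all(abs(r - bi) <= tol for r, bi in zip(matvec(A, x), b))

def solves_system_2(A, b, y, tol=1e-9):
    """True iff A^T y >= 0 and b^T y < 0, i.e. y is a Farkas certificate."""
    if any(v < -tol for v in matvec(transpose(A), y)):
        return False
    return sum(bi * yi for bi, yi in zip(b, y)) < -tol

# b = (2, 3) lies in the cone of the identity's columns; b = (-1, -1) does not.
assert solves_system_1([[1, 0], [0, 1]], [2, 3], [2, 3])
assert solves_system_2([[1, 0], [0, 1]], [-1, -1], [1, 1])
```

By the lemma, for a fixed \(A\) and \(b\) at most one of the two checks can ever succeed, no matter which candidates \(x\) and \(y\) you try.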

Why It Works — The Underlying Geometry

The Power of Separation

Everything is based on the intuitive idea that if you have a convex object (like our cone) and a point outside it, you can always slip a sheet of paper (a hyperplane) between the two.

In 1902, Gyula Farkas, a Hungarian physicist and mathematician, formalized this intuition for systems of linear inequalities. At the time, "Linear Programming" wasn't discussed yet, but Farkas was studying analytical mechanics and system equilibrium, where constraints like "force cannot pull, only push" (non-negativity constraints) were common.

His insight was to understand that the impossibility of satisfying a set of linear constraints can always be "certified" by an explicit witness (the vector \(y\)).

Application: Duality in Linear Programming

This is the "killer" application of Farkas' Lemma. It's the pillar upon which the Strong Duality Theorem rests.

In Linear Programming (LP), we have a primal problem (minimize costs) and a dual problem (maximize resource profit). Strong duality states that if one has an optimal solution, the other does too, and the objective function values coincide.

How is Farkas used here?

Imagine you want to prove that no better solution exists than your current optimum \(x^*\). This is a problem of "non-existence" of a solution (\(c^T x < c^T x^*\), \(Ax=b\), \(x \geq 0\)). Farkas' Lemma transforms this "non-existence" problem into an "existence" problem for the alternative system. And guess what? The alternative system corresponds exactly to the constraints of the dual problem!

In plain words:

  1. Asking whether the primal can do better is equivalent to seeking a solution to a certain system.
  2. If that system has no solution (i.e., the optimum is reached), Farkas tells us that a solution to the alternative system must exist.
  3. That alternative solution is precisely the optimal solution to the Dual problem.

Without Farkas, duality would remain a beautiful observed symmetry, but not a rigorously proven theorem.
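To make the primal/dual symmetry concrete, here is a tiny instance small enough to solve by inspection, with no LP solver involved (the instance itself is our own illustrative choice):

```python
# A tiny primal/dual pair, solved by inspection -- no LP solver needed.
# Primal:  min c^T x   s.t.  Ax = b, x >= 0
# Dual:    max b^T y   s.t.  A^T y <= c

A = [[1.0, 1.0]]           # single constraint: x1 + x2 = 1
b = [1.0]
c = [1.0, 2.0]             # objective: x1 + 2*x2

# Primal: on the segment {x1 + x2 = 1, x >= 0} the minimum puts all
# weight on the cheaper variable, so x* = (1, 0).
x_star = [1.0, 0.0]
primal_value = sum(ci * xi for ci, xi in zip(c, x_star))

# Dual: maximize 1*y subject to y <= 1 and y <= 2, so y* = 1.
y_star = [1.0]
dual_value = sum(bi * yi for bi, yi in zip(b, y_star))

assert primal_value == dual_value == 1.0   # strong duality: both equal 1
```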

The Complete Proof

We will prove Farkas' Lemma in a completely self-contained manner, starting from basic principles. The proof relies on the Separating Hyperplane Theorem, so we first build that from scratch.

Step 1: Basic Concepts of Convex Geometry

Definition (Convex Set): A set \(S \subseteq \mathbb{R}^m\) is convex if for every pair of points \(x, y \in S\) and every \(\lambda \in [0,1]\), we have \(\lambda x + (1-\lambda)y \in S\).

Intuition: The line segment connecting any two points in the set remains completely inside the set.

Definition (Cone): A set \(C \subseteq \mathbb{R}^m\) is a cone if for every \(x \in C\) and every \(\alpha \geq 0\), we have \(\alpha x \in C\).

Intuition: If a point is in the cone, all its positive multiples are too.

Definition (Convex Cone): A set that is both a cone and a convex set.

In our case, the set \(C = \{ Ax \mid x \geq 0 \}\) is a convex cone because:

  • If \(Ax_1 \in C\) and \(Ax_2 \in C\) (with \(x_1, x_2 \geq 0\)), then \(\lambda(Ax_1) + (1-\lambda)(Ax_2) = A(\lambda x_1 + (1-\lambda)x_2) \in C\) for \(\lambda \in [0,1]\) (convexity).
  • If \(Ax \in C\) (with \(x \geq 0\)), then \(\alpha(Ax) = A(\alpha x) \in C\) for \(\alpha \geq 0\) (cone property).

Step 2: Projection onto a Closed Convex Set

Theorem (Existence and Uniqueness of Projection): Given a closed convex set \(S \subseteq \mathbb{R}^m\) and a point \(b \notin S\), there exists a unique point \(p \in S\) that minimizes the distance from \(b\), that is:

\[ p = \arg\min_{z \in S} \|z - b\| \]

Proof:

  1. Let \(d = \inf_{z \in S} \|z - b\|\). There exists a sequence \(\{z_k\} \subseteq S\) such that \(\|z_k - b\| \to d\).
  2. We show that \(\{z_k\}\) is a Cauchy sequence. By the parallelogram law:

\[ \|z_i + z_j - 2b\|^2 + \|z_i - z_j\|^2 = 2\|z_i - b\|^2 + 2\|z_j - b\|^2 \]

  3. By convexity, \((z_i + z_j)/2 \in S\), so \(\|(z_i + z_j)/2 - b\| \geq d\), i.e. \(\|z_i + z_j - 2b\|^2 \geq 4d^2\).
  4. Substituting: \(4d^2 + \|z_i - z_j\|^2 \leq 2\|z_i - b\|^2 + 2\|z_j - b\|^2\).
  5. As \(i, j \to \infty\), both terms on the right approach \(2d^2\), so \(\|z_i - z_j\| \to 0\).
  6. Since \(\mathbb{R}^m\) is complete and \(S\) is closed, \(z_k \to p \in S\), and by continuity of the norm \(\|p - b\| = d\).
  7. Uniqueness: if \(p_1, p_2 \in S\) both attain the distance \(d\), the same parallelogram argument applied to them gives \(4d^2 + \|p_1 - p_2\|^2 \leq 4d^2\), hence \(p_1 = p_2\).
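The theorem only asserts that the minimizer \(p\) exists; for the cone \(C = \{Ax \mid x \geq 0\}\) we can also compute it. Below is a minimal sketch using projected gradient descent on \(\|Ax - b\|^2\); the step size and iteration count are hand-picked for this small example, not a general-purpose solver:

```python
# Approximate the projection of b onto {Ax : x >= 0} by projected
# gradient descent: take a gradient step, then clip x back to x >= 0.
# The step size lr works here because A has small norm.

def project_onto_cone(A, b, steps=2000, lr=0.1):
    """Return (p, x) with p = Ax close to the projection of b.
    A is a list of rows; all data are plain Python lists."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(steps):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # Ax - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]         # A^T (Ax - b)
        x = [max(x[j] - lr * g[j], 0.0) for j in range(n)]  # step, then clip
    p = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return p, x

# For the first-quadrant cone (A = I) and b = (-1, 2), the projection
# simply clips the negative coordinate: p = (0, 2).
p, _ = project_onto_cone([[1.0, 0.0], [0.0, 1.0]], [-1.0, 2.0])
assert abs(p[0]) < 1e-6 and abs(p[1] - 2.0) < 1e-6
```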

Step 3: Characterization of Projection

Lemma (Optimality Condition): Let \(p\) be the projection of \(b\) onto the closed convex set \(S\). Then for every \(z \in S\):

\[ (b - p)^T(z - p) \leq 0 \]

Intuition: The vector from \(p\) to \(b\) forms an obtuse (or right) angle with every vector going from \(p\) toward any other point in \(S\).

Proof:

  1. By definition of \(p\), we have \(\|b - p\| \leq \|b - z\|\) for every \(z \in S\).
  2. By convexity, \(p + t(z - p) \in S\) for every \(t \in [0,1]\).
  3. Therefore \(\|b - p\|^2 \leq \|b - p - t(z - p)\|^2\) for every \(t \in [0,1]\).
  4. Expanding the right side:

\[ \|b - p\|^2 \leq \|b - p\|^2 - 2t(b - p)^T(z - p) + t^2\|z - p\|^2 \]

  5. Simplifying: \(0 \leq -2t(b - p)^T(z - p) + t^2\|z - p\|^2\).
  6. Dividing by \(t > 0\): \(0 \leq -2(b - p)^T(z - p) + t\|z - p\|^2\).
  7. Letting \(t \to 0^+\): \((b - p)^T(z - p) \leq 0\).
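The condition is easy to sanity-check numerically on a set whose projection is known in closed form. Taking \(S\) to be the closed unit disk, the projection of an outside point \(b\) is simply \(p = b/\|b\|\); the sampling check below is purely illustrative:

```python
import math
import random

# Sanity check of the optimality condition (b - p)^T (z - p) <= 0 on the
# closed unit disk, where the projection of an outside point is known in
# closed form: p = b / ||b||.

random.seed(0)
b = (3.0, 4.0)                       # ||b|| = 5, so b lies outside the disk
norm_b = math.hypot(b[0], b[1])
p = (b[0] / norm_b, b[1] / norm_b)   # projection of b onto the unit disk

max_inner = float("-inf")
for _ in range(1000):
    # draw a random point z in the unit disk by rejection sampling
    while True:
        z = (random.uniform(-1, 1), random.uniform(-1, 1))
        if z[0] ** 2 + z[1] ** 2 <= 1.0:
            break
    inner = (b[0] - p[0]) * (z[0] - p[0]) + (b[1] - p[1]) * (z[1] - p[1])
    max_inner = max(max_inner, inner)

# every sampled z gave an obtuse (or right) angle, as the lemma predicts
assert max_inner <= 1e-12
```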

Step 4: The Separating Hyperplane Theorem

Theorem: Let \(S \subseteq \mathbb{R}^m\) be a closed convex set and let \(b \notin S\). Then there exists a vector \(y \neq 0\) such that:

\[ y^T b < \inf_{z \in S} y^T z \]

Intuition: We can find a plane that "cuts" between \(b\) and \(S\), leaving \(b\) on one side and all of \(S\) on the other.

Proof:

  1. Let \(p\) be the projection of \(b\) onto \(S\) (it exists by the theorem in Step 2).
  2. Define \(y = b - p\). Note that \(y \neq 0\) because \(b \notin S\).
  3. From the Lemma in Step 3, for every \(z \in S\): \((b - p)^T(z - p) \leq 0\).
  4. Therefore \(y^T(z - p) \leq 0\), i.e., \(y^T z \leq y^T p\) for every \(z \in S\).
  5. Let us now compare \(y^T b\) with \(y^T p\).
  6. We compute: \(y^T b = (b - p)^T b = (b - p)^T (p + (b - p)) = (b - p)^T p + \|b - p\|^2\).
  7. And: \(y^T p = (b - p)^T p\).
  8. Therefore: \(y^T b = y^T p + \|b - p\|^2\).
  9. Since \(\|b - p\|^2 > 0\) (given that \(b \neq p\)), we have \(y^T b > y^T p \geq y^T z\) for every \(z \in S\).

Oops! We obtained \(y^T b > y^T z\), but we wanted \(y^T b < y^T z\). We simply take \(-y\) instead of \(y\):

With \(y' = -(b - p) = p - b\), we obtain:

\[ y'^T b < \inf_{z \in S} y'^T z \]
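The same disk example as before lets us trace the construction end to end: project \(b\), set \(y' = p - b\), and confirm the strict separation on sampled points of \(S\) (an illustrative sketch, not a general-purpose routine):

```python
import math
import random

# Build the separating vector exactly as in the proof: project b onto the
# closed unit disk (p = b / ||b||), set y' = p - b, and check the strict
# separation y'^T b < y'^T z on many sampled points z in the disk.

random.seed(1)
b = (3.0, 4.0)
norm_b = math.hypot(b[0], b[1])
p = (b[0] / norm_b, b[1] / norm_b)
y = (p[0] - b[0], p[1] - b[1])       # the separating normal y' = p - b

lhs = y[0] * b[0] + y[1] * b[1]      # y'^T b  (here -20)
min_rhs = float("inf")
for _ in range(1000):
    while True:
        z = (random.uniform(-1, 1), random.uniform(-1, 1))
        if z[0] ** 2 + z[1] ** 2 <= 1.0:
            break
    min_rhs = min(min_rhs, y[0] * z[0] + y[1] * z[1])

assert lhs < min_rhs                 # b is strictly on its own side
```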

Step 5: Proof of Farkas' Lemma

Now we have all the tools. Let's prove that exactly one of the two systems has a solution:

  1. System 1: \(Ax = b\) with \(x \geq 0\)
  2. System 2: \(A^T y \geq 0\) with \(b^T y < 0\)

Proof:

Part A: Both systems cannot have a solution simultaneously (mutual exclusivity)

Suppose by contradiction that both have solutions, \(x\) and \(y\) respectively.

  • From System 1: \(Ax = b\) with \(x \geq 0\)
  • From System 2: \(A^T y \geq 0\) with \(b^T y < 0\)

Multiply the inequality \(A^T y \geq 0\) from System 2 on the left by \(x^T\) (legitimate because \(x \geq 0\), so the inequality is preserved):

\[ x^T A^T y \geq 0 \]

But \(x^T A^T y = (Ax)^T y = b^T y < 0\) (using System 1).

Contradiction! Therefore the two systems are mutually exclusive.
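Part A can also be watched in action: for random \(x \geq 0\) set \(b = Ax\); whenever the sampled \(y\) happens to satisfy \(A^T y \geq 0\), the inequality \(b^T y \geq 0\) must follow. (The matrix \(A\) below is an arbitrary illustrative choice.)

```python
import random

# Numeric illustration of mutual exclusivity: whenever b = Ax with x >= 0
# and A^T y >= 0 holds, the chain x^T (A^T y) = (Ax)^T y = b^T y forces
# b^T y >= 0, so System 2 can never hold alongside System 1.

random.seed(2)
A = [[1.0, 2.0],
     [0.0, 1.0]]
checked = 0

for _ in range(1000):
    x = [random.uniform(0, 1), random.uniform(0, 1)]       # any x >= 0
    b = [A[0][0] * x[0] + A[0][1] * x[1],                  # b = Ax
         A[1][0] * x[0] + A[1][1] * x[1]]
    y = [random.uniform(-1, 1), random.uniform(-1, 1)]
    Aty = [A[0][0] * y[0] + A[1][0] * y[1],                # A^T y
           A[0][1] * y[0] + A[1][1] * y[1]]
    if Aty[0] >= 0 and Aty[1] >= 0:                        # when A^T y >= 0 ...
        assert b[0] * y[0] + b[1] * y[1] >= -1e-12         # ... b^T y >= 0
        checked += 1

assert checked > 0   # the non-vacuous case did occur in the sample
```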

Part B: At least one of the two systems has a solution

Define \(C = \{ Ax \mid x \geq 0 \}\) = the set of all non-negative linear combinations of the columns of \(A\).

\(C\) is a convex cone (proven in Step 1), and it is also closed. Closedness is not automatic for arbitrary convex sets, but it holds for every finitely generated cone (a standard fact, provable via Carathéodory's theorem for conic combinations); we take it as given here.

Case 1: If \(b \in C\), then there exists \(x \geq 0\) such that \(Ax = b\). System 1 has a solution. ✓

Case 2: If \(b \notin C\), we apply the Separating Hyperplane Theorem (Step 4).

There exists \(y \neq 0\) such that:

\[ y^T b < \inf_{z \in C} y^T z \]

Now we must show that:

  1. \(A^T y \geq 0\)
  2. \(b^T y < 0\)

Proof of (2): \(C\) is a cone containing the origin (\(0 \in C\) because \(A \cdot 0 = 0\)).

Therefore \(\inf_{z \in C} y^T z \leq y^T 0 = 0\).

Combined with \(y^T b < \inf_{z \in C} y^T z\), we obtain:

\[ y^T b < 0 \]

This is exactly condition (2)! ✓

For condition (1), observe that each column \(a_i\) of \(A\) belongs to \(C\) (by taking \(x\) with all components zero except the \(i\)-th equal to 1).

Therefore for each column \(a_i\): \(y^T a_i \geq \inf_{z \in C} y^T z\)

Moreover, since \(C\) is a cone, if \(z \in C\) then \(\alpha z \in C\) for every \(\alpha \geq 0\).

If there existed a column \(a_j\) with \(y^T a_j < 0\), then \(y^T(\alpha a_j) = \alpha(y^T a_j) \to -\infty\) as \(\alpha \to +\infty\), so \(\inf_{z \in C} y^T z = -\infty\), contradicting the strict inequality \(y^T b < \inf_{z \in C} y^T z\) (the left side is a finite number).

Therefore \(y^T a_i \geq 0\) for each column \(a_i\), i.e., \(A^T y \geq 0\). ✓
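Part B can be traced concretely on a small instance: take \(A = I\), so the cone \(C\) is the first quadrant, and pick \(b = (-1,-1)\) outside it. Projecting \(b\) onto \(C\) just clips each coordinate at zero, and \(y = p - b\) is exactly the Farkas certificate:

```python
# Tracing Part B: A = I (the cone is the first quadrant), b = (-1, -1).
# Projection onto {z : z >= 0} clips coordinates at zero; y = p - b is
# then the separating vector delivered by the construction.

b = [-1.0, -1.0]
p = [max(v, 0.0) for v in b]                  # projection of b onto the cone
y = [pi - bi for pi, bi in zip(p, b)]         # separating vector y = p - b

# Condition (1): A = I, so A^T y = y, and y = (1, 1) >= 0.
assert all(v >= 0 for v in y)
# Condition (2): b^T y = -2 < 0.
assert b[0] * y[0] + b[1] * y[1] < 0
```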

Conclusion: We have proven that:

  • The two systems are mutually exclusive (Part A)
  • At least one of the two always has a solution (Part B)

Therefore exactly one of the two systems has a solution. ∎

Variables Explained

| Symbol | Description |
| --- | --- |
| \(A\) | \(m \times n\) matrix of constraint coefficients |
| \(x\) | \(n \times 1\) column vector of variables (unknowns) of the first system |
| \(b\) | \(m \times 1\) column vector of constant terms |
| \(y\) | \(m \times 1\) column vector of variables of the alternative system (often the "multipliers") |
| \(A^T\) | The transpose of matrix \(A\) |

Worked Example

Imagine you want to express the vector \(b = \begin{pmatrix} -1 \\ -1 \end{pmatrix}\) as a non-negative linear combination of the vectors \(a_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\) and \(a_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\).

The cone generated by \(a_1\) and \(a_2\) is the entire first quadrant (all vectors with non-negative components). Clearly \(b\), which lies in the third quadrant, is not in it. Therefore System 1 (\(Ax=b\), \(x \geq 0\)) has no solution.

By Farkas, there must exist a \(y\) for System 2 (\(A^T y \geq 0\), \(b^T y < 0\)). Let's look for a vector \(y\) that makes a non-obtuse angle with \(a_1\) and \(a_2\), but an obtuse angle with \(b\).

Take \(y = \begin{pmatrix} 1 \\ 1 \end{pmatrix}\). Let's verify:

  1. \(A^T y = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \geq 0\). (Satisfied!)
  2. \(b^T y = \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = -2 < 0\). (Satisfied!)

There we go. The existence of \(y=(1,1)\) is the "certificate" that \(b=(-1,-1)\) cannot be positively generated from \((1,0)\) and \((0,1)\).
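The two verifications above can be replayed in a few lines of plain Python:

```python
# Verifying the worked example: A is the 2x2 identity (columns (1,0) and
# (0,1)), b = (-1, -1), candidate certificate y = (1, 1).

A = [[1, 0], [0, 1]]
b = [-1, -1]
y = [1, 1]

Aty = [A[0][0] * y[0] + A[1][0] * y[1],   # A^T y, first component
       A[0][1] * y[0] + A[1][1] * y[1]]   # A^T y, second component
bty = b[0] * y[0] + b[1] * y[1]           # b^T y

assert Aty[0] >= 0 and Aty[1] >= 0        # condition 1: A^T y >= 0
assert bty < 0                            # condition 2: b^T y < 0
print(bty)  # → -2
```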

History

  • 1902 — Gyula Farkas publishes the lemma. He was motivated by the study of mechanical equilibrium (principle of virtual work with unilateral constraints).
  • 1940s/50s — The lemma is rediscovered and becomes central with the advent of Linear Programming and the work of Dantzig and von Neumann. It's recognized as the fundamental tool for proving duality.

References

  • Farkas, G. (1902). Theorie der einfachen Ungleichungen. Journal für die reine und angewandte Mathematik, 124, 1–27.
  • Bertsimas, D., & Tsitsiklis, J. N. (1997). Introduction to Linear Optimization. Athena Scientific.
  • Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.