Reduction, Completeness, and the Cook-Levin Theorem
Introduction
This lecture delves into the fundamental concepts of reduction and completeness within the realm of complexity theory, with a particular focus on the complexity classes P and NP. We will investigate problems that are considered complete for these classes, beginning with a P-complete problem and then moving on to an NP-complete problem. To aid our exploration, we will introduce Boolean circuits as a powerful alternative for representing Boolean formulas.
Our journey will encompass the evaluation of Boolean formulas and circuits, setting the stage for introducing the table method. This ingenious method provides a mechanism for mapping a sequence of Turing machine configurations to a corresponding Boolean circuit. The table method will play a pivotal role in proving the celebrated Cook-Levin theorem, a cornerstone result that has profoundly shaped our understanding of computational complexity.
Key Objectives:
Grasp the core concepts of reduction and completeness in the context of complexity theory.
Familiarize ourselves with Boolean circuits and establish their equivalence to Boolean formulas and Boolean functions.
Explore the intricacies of the table method and its application in transforming Turing machine computations into Boolean circuits.
Understand the significance of the Cook-Levin theorem and its implications for P-completeness and NP-completeness.
Prove the Cook-Levin theorem, demonstrating the P-completeness of CircuitVal and the NP-completeness of CircuitSat.
Reduction is a technique for proving that one problem is at least as hard as another. In the context of complexity theory, a reduction from problem A to problem B shows that if we have an efficient algorithm for problem B, we can use it to create an efficient algorithm for problem A. Formally, a language \(L\) is reducible to a language \(L'\) if there exists a function \(f\) computable in logarithmic space such that for all inputs \(x\), \(x \in L\) if and only if \(f(x) \in L'\).
A problem is complete for a complexity class if it is a member of that class and every other problem in that class can be reduced to it. Intuitively, complete problems are the "hardest" problems in their respective complexity classes.
Outline:
Boolean Formulas and Circuits: We will begin by defining Boolean formulas and circuits, exploring their evaluation, and establishing their equivalence to Boolean functions.
The Table Method: We will introduce the table method, a technique for mapping Turing machine computations to Boolean circuits.
Cook-Levin Theorem: We will present and prove the Cook-Levin theorem, demonstrating the P-completeness of CircuitVal and the NP-completeness of CircuitSat.
Implications and Applications: We will discuss the implications of the Cook-Levin theorem and explore its use in proving the completeness of other problems.
Boolean Formulas and Boolean Circuits
This section introduces the fundamental concepts of Boolean formulas and Boolean circuits, highlighting their definitions, evaluation, and equivalence.
A Boolean formula is a formula composed of Boolean variables and Boolean operators such as negation (\(\neg\)), disjunction (\(\vee\)), and conjunction (\(\wedge\)). A Boolean variable can take a value of either 0 (false) or 1 (true).
A Boolean circuit is a directed acyclic graph (DAG) where each node (gate) has a type:
Variable nodes: Represent Boolean variables.
Constant nodes: Represent the constants 0 or 1.
Operator nodes: Represent Boolean operators (\(\neg\), \(\vee\), \(\wedge\)).
Input nodes (variable and constant nodes) have no incoming edges. Negation nodes have one incoming edge, while disjunction and conjunction nodes have two incoming edges. Typically, there is one output node with no outgoing edges, although this requirement can be relaxed.
Consider a Boolean circuit with variable nodes \(X_1\), \(X_2\), \(X_3\) and a constant node 1. The circuit can represent the formula: \[(\neg(X_1 \wedge X_2)) \wedge ((X_1 \wedge X_2) \vee X_3) \wedge ((X_1 \wedge X_2) \vee 1)\]
Remark. Remark 1. Boolean circuits can provide a compact representation of Boolean formulas, especially when patterns are repeated within the formula. As seen in Example [ex:boolean_circuit], the subexpression \((X_1 \wedge X_2)\) is represented only once in the circuit but used multiple times in the formula.
Evaluation of Boolean Circuits
Given an assignment \(\phi\) that maps each input node to either 0 or 1, we can evaluate a Boolean circuit \(C\) by propagating the truth values through the gates according to the defined operators. We denote the evaluation of circuit \(C\) under assignment \(\phi\) as \(C(\phi)\). For instance, if we assign \(X_1 = 0\), \(X_2 = 0\), and \(X_3 = 1\) in Example [ex:boolean_circuit], the circuit evaluates to 1.
Equivalence of Boolean Formulas, Circuits, and Functions
Boolean formulas, Boolean circuits, and Boolean functions are different but equivalent ways to represent the same underlying logical relationships. They all possess the same expressive power, meaning any logical relationship expressible in one form can also be expressed in the others.
Boolean formulas, Boolean circuits, and Boolean functions are equivalent in terms of their expressive power. This means that any logical relationship that can be expressed in one form can also be expressed in the others.
Proof. Proof. We will demonstrate the equivalence by showing how to convert between these representations:
Boolean formula to Boolean circuit: This conversion is straightforward. Each operator in the formula corresponds to a gate in the circuit, and each variable corresponds to an input node. Example [ex:boolean_circuit] demonstrates this conversion.
Boolean circuit to Boolean formula: We can traverse the circuit from the output node to the input nodes, constructing the formula recursively. Each gate’s output corresponds to a subformula.
Boolean function to Boolean formula: Given a Boolean function \(f: \{0, 1\}^n \to \{0, 1\}\), we can create a Boolean formula that is satisfied if and only if the function evaluates to 1. For each input tuple \((a_1, a_2, \dots, a_n)\) where \(f(a_1, a_2, \dots, a_n) = 1\), we create a conjunction term \(C_i = (L_{i,1} \wedge L_{i,2} \wedge \dots \wedge L_{i,n})\), where \(L_{i,j} = X_j\) if \(a_j = 1\) and \(L_{i,j} = \neg X_j\) if \(a_j = 0\). The final formula is the disjunction of all such conjunction terms: \(F = C_1 \vee C_2 \vee \dots \vee C_m\), where \(m\) is the number of input tuples that evaluate to 1.
Boolean formula to Boolean function: Given a Boolean formula, we can evaluate it for each possible input tuple to determine the corresponding output of the Boolean function.
◻
Given a set of \(n\) Boolean variables, a Boolean function \(f\) is a mapping \(f: \{0, 1\}^n \to \{0, 1\}\).
Consider a Boolean function \(f: \{0, 1\}^2 \to \{0, 1\}\) defined as:
\(f(0, 0) = 0\)
\(f(0, 1) = 1\)
\(f(1, 0) = 1\)
\(f(1, 1) = 0\)
This function can be represented by the Boolean formula: \[(\neg X_1 \wedge X_2) \vee (X_1 \wedge \neg X_2)\]
This formula is satisfied only when the input is (0, 1) or (1, 0), which corresponds exactly to the cases where the function \(f\) evaluates to 1. This demonstrates how a Boolean function can be converted into an equivalent Boolean formula.
The Table Method
The table method is a powerful technique for transforming a sequence of configurations of a Turing machine on a given input into a corresponding Boolean circuit. This transformation is crucial for proving the Cook-Levin theorem, as it allows us to reduce the problem of deciding a language in P or NP to the problem of evaluating or satisfying a Boolean circuit.
Constructing the Table
Let \(L\) be a language in P. By definition, there exists a deterministic Turing machine \(M\) that decides \(L\) in time \(n^h\) for some constant \(h\). Without loss of generality, we can assume \(M\) is a single-tape Turing machine. This is justified because any multi-tape Turing machine can be simulated by a single-tape Turing machine with at most a quadratic slowdown, which does not affect the polynomial time bound.
Given an input \(X\) of length \(n\), we construct a table \(T_{M,X}\) that represents the computation of \(M\) on \(X\). Each row of the table represents a configuration of the Turing machine at a specific time step, and each column represents a cell on the Turing machine’s tape.
Table Structure:
The table \(T_{M,X}\) has \(n^h\) rows, where each row corresponds to one time step of the computation.
Each row has \(n^h + 2\) cells. Since the Turing machine can move at most one cell per time step, \(n^h\) cells are sufficient to capture the portion of the tape that can be reached within \(n^h\) steps. The extra two cells represent the boundaries.
Each cell contains either a symbol from the tape alphabet \(\Sigma\) or a pair (symbol, state) indicating the head position and the current state of the Turing machine. We denote the content of cell \((i, j)\) as \(T_{M,X}[i, j]\).
Table Initialization and Termination:
The first row represents the initial configuration of the Turing machine with input \(X = X_1X_2...X_n\). The head is positioned at the first symbol, and the machine is in the starting state \(s\). The first row is thus initialized as \((X_1, s), X_2, \dots, X_n, \square, \dots, \square\), where \(\square\) represents a blank symbol.
The last row represents the final configuration of the Turing machine after \(n^h\) steps. We assume that the machine returns to the beginning of the tape before halting and enters either the accepting state "yes" or the rejecting state "no".
Observations about the Table
First Column: The first column (except for the last row) is known and fixed. Since the machine starts at the leftmost position and cannot move left from there, the first cell always contains either the initial symbol or the pair (initial symbol, state) for all rows except the last.
First Row: The first row is known and determined by the input \(X\) and the initial state of the Turing machine.
Last Column: The last column is known and always contains blank symbols, as the machine cannot reach beyond \(n^h\) cells in \(n^h\) steps.
Local Dependency: The content of cell \((i, j)\), denoted by \(T_{M,X}[i, j]\), can be computed from the contents of the three cells above it in the previous row: \(T_{M,X}[i-1, j-1]\), \(T_{M,X}[i-1, j]\), and \(T_{M,X}[i-1, j+1]\). This is because the content of a cell at time step \(i\) is determined solely by the contents of the cells in its immediate neighborhood at time step \(i-1\) and the transition function of the Turing machine.
Uniform Computation Rule: The rule for computing the content of cell \((i, j)\) is independent of the specific values of \(i\) and \(j\). This rule is determined by the transition function \(\delta\) of the Turing machine \(M\). In other words, the same rule applies to any cell in the table, regardless of its position.
The figure above illustrates the local dependency in the table. The content of cell \((i, j)\) (highlighted in yellow) depends on the contents of cells \((i-1, j-1)\), \((i-1, j)\), and \((i-1, j+1)\). In this example, if the cells \((i-1, j-1)\), \((i-1, j)\), and \((i-1, j+1)\) contain \(\sigma_1\), \((q, \sigma_2)\), and \(\sigma_3\) respectively, and the Turing machine transitions from state \(q\) to \(q'\) while moving right and writing \(\sigma_2\), then cell \((i, j)\) will contain \((q', \sigma_2)\). This demonstrates how the transition function of the Turing machine dictates the contents of each cell based on its neighborhood in the previous row.
Proof of the Cook-Levin Theorem
The Cook-Levin theorem is a landmark result in complexity theory, establishing the foundations for understanding NP-completeness. It states that the Circuit Value problem (CircuitVal) is P-complete and the Circuit Satisfiability problem (CircuitSat) is NP-complete.
The Circuit Value problem (CircuitVal) is the problem of determining the truth value of a Boolean circuit without variables. The language CircuitVal consists of all Boolean circuits without variables that evaluate to 1.
The Circuit Satisfiability problem (CircuitSat) is the problem of determining whether there exists an assignment to the variables of a Boolean circuit that makes the circuit evaluate to 1.
Proof Outline
The proof proceeds in two main parts:
We will demonstrate that any language \(L\) in P can be reduced to CircuitVal in logarithmic space. This establishes the P-completeness of CircuitVal.
We will then demonstrate that any language \(L\) in NP can be reduced to CircuitSat in logarithmic space. This establishes the NP-completeness of CircuitSat.
Reducing \(L \in \text{P}\) to CircuitVal
Let \(L\) be an arbitrary language in P. By the definition of P, there exists a deterministic Turing machine \(M\) that decides \(L\) in polynomial time \(n^h\) for some constant \(h\). Our goal is to construct a reduction from \(L\) to CircuitVal.
Table Construction: Given an input \(X\) of length \(n\), we construct the table \(T_{M,X}\) as described in Section 3. This table represents the computation of \(M\) on \(X\).
Binary Encoding: We encode each cell of the table \(T_{M,X}\) in binary using a fixed-length encoding. Since the tape alphabet \(\Sigma\) and the set of states \(Q\) are finite, we can represent each symbol or (symbol, state) pair using a fixed number of bits, say \(k\). Each cell in the table will then be represented by a binary string of length \(k\).
Local Circuit: The transition function \(\delta\) of \(M\) defines the rules for updating the contents of the tape. We can construct a Boolean circuit \(C_M\) that captures these rules. \(C_M\) takes as input the binary encodings of three consecutive cells in row \(i-1\) (3k bits) and outputs the binary encoding of the corresponding cell in row \(i\) (k bits). Importantly, \(C_M\) has a constant size that depends only on the transition function \(\delta\) and not on the input length \(n\).
- Example: If the transition function dictates that when the machine is in state \(q\) reading symbol \(\sigma_2\), it transitions to state \(q'\), writes \(\sigma_3\), and moves right, then the corresponding part of \(C_M\) will take the binary encodings of \(\sigma_1\), \((q, \sigma_2)\), and \(\sigma_3\) as input and produce the binary encoding of \((q', \sigma_3)\) as output.
Global Circuit: The table \(T_{M,X}\) can be viewed as a large Boolean circuit \(C_{M,X}\) composed of multiple copies of \(C_M\). Each cell in the table corresponds to one instance of \(C_M\), with its inputs connected to the outputs of the corresponding instances in the previous row. Since the table has \(n^h\) rows and \(n^h + 2\) columns, \(C_{M,X}\) consists of approximately \(n^{2h}\) copies of \(C_M\).
No Variables: \(C_{M,X}\) is a circuit without variables. The first row is fixed by the input \(X\), the first column is fixed by the initial state and the fact that the machine cannot move left from the starting position, and the last column is fixed and always contains blank symbols. All other cells are computed deterministically using the copies of \(C_M\).
Acceptance Condition: \(X \in L\) if and only if \(M\) accepts \(X\) within \(n^h\) steps. This happens if and only if the first bit of the cell in the last row and first column of \(T_{M,X}\) is 1 (assuming the accepting configuration is encoded such that the first bit of the "yes" state is 1).
Equivalence to CircuitVal: The condition that the first bit of the cell in the last row and first column of \(T_{M,X}\) is 1 is equivalent to saying that \(C_{M,X}\) evaluates to 1. The output of \(C_{M,X}\) can be taken as the output of the \(C_M\) instance corresponding to the cell in the last row and first column, and we can consider only the first bit of this output.
Reduction: Therefore, we have a reduction \(f(X) = C_{M,X}\) from \(L\) to CircuitVal. Given \(X\), we can construct \(C_{M,X}\) and then \(X \in L\) if and only if \(C_{M,X} \in\) CircuitVal.
Logarithmic Space: The reduction can be computed in logarithmic space. To construct \(C_{M,X}\), we only need to iterate through the cells of the table and output the corresponding \(C_M\) instances with appropriate connections. We only need to store the indices of the current cell being computed, which requires \(O(\log n)\) space since each index ranges from 1 to \(n^h\). We also need to remember the transition function \(\delta\) of \(M\), but this has constant size.
The Circuit Value problem (CircuitVal) is P-complete.
Proof. Proof. We have shown that any language \(L\) in P can be reduced to CircuitVal in logarithmic space. To complete the proof of P-completeness, we need to show that CircuitVal is in P. This is straightforward: given a Boolean circuit without variables, we can evaluate it in polynomial time by propagating the truth values from the inputs to the output. ◻
Reducing \(L \in \text{NP}\) to CircuitSat
Let \(L\) be an arbitrary language in NP. By the definition of NP, there exists a non-deterministic Turing machine \(N\) that decides \(L\) in polynomial time \(n^h\) for some constant \(h\).
Table Construction: The construction of the table \(T_{N,X}\) is similar to the deterministic case. Each row represents a configuration of \(N\) on input \(X\) at a specific time step.
Non-deterministic Choices: However, the value of a cell in row \(i\) now depends not only on the three cells above it in row \(i-1\) but also on the non-deterministic choice made by \(N\) at step \(i-1\).
Choice Variables: We introduce a Boolean variable \(D_{i-1}\) for each non-deterministic choice made at step \(i-1\). \(D_{i-1}\) ranges from 1 to \(d\), where \(d\) is the degree of non-determinism of \(N\) (the maximum number of choices \(N\) can make in any given state).
Binary Encoding: If \(N\) has a degree of non-determinism of 2 (as is often assumed for simplicity), each \(D_i\) is a single bit (0 or 1). In general, we can encode each \(D_i\) using \(\lceil \log d \rceil\) bits.
Circuit with Variables: The resulting circuit \(C_{N,X}\) now has variables \(D_0, D_1, \dots, D_{n^h-1}\). These variables represent the non-deterministic choices made by \(N\) during the computation.
- Example: If at step \(i-1\) the machine can either transition to state \(q_1\) or \(q_2\), we introduce a variable \(D_{i-1}\). If \(D_{i-1} = 0\), the circuit simulates the transition to \(q_1\); if \(D_{i-1} = 1\), it simulates the transition to \(q_2\).
Acceptance Condition: \(X \in L\) if and only if there exists at least one accepting computation of \(N\) on \(X\) within \(n^h\) steps.
Equivalence to CircuitSat: This is equivalent to the existence of an assignment to the variables \(D_i\) that makes \(C_{N,X}\) evaluate to 1. The output of \(C_{N,X}\) is again taken as the first bit of the output of the \(C_M\) instance corresponding to the cell in the last row and first column.
Reduction: Therefore, we have a reduction \(f(X) = C_{N,X}\) from \(L\) to CircuitSat. Given \(X\), we can construct \(C_{N,X}\), and then \(X \in L\) if and only if \(C_{N,X} \in\) CircuitSat.
Logarithmic Space: The reduction can be computed in logarithmic space, similar to the deterministic case. We only need to store the indices of the current cell and the current non-deterministic choice variable, which requires \(O(\log n)\) space.
The Circuit Satisfiability problem (CircuitSat) is NP-complete.
Proof. Proof. We have shown that any language \(L\) in NP can be reduced to CircuitSat in logarithmic space. To complete the proof of NP-completeness, we need to show that CircuitSat is in NP. This is also straightforward: given a Boolean circuit and an assignment to its variables, we can evaluate the circuit in polynomial time and check if it outputs 1. A non-deterministic Turing machine can guess an assignment and then verify it in polynomial time. ◻
Conclusion
In this lecture, we delved into the fundamental concepts of reduction and completeness, with a particular focus on the complexity classes P and NP. We introduced Boolean circuits as an alternative and often more compact representation of Boolean formulas, and we established their equivalence to both Boolean formulas and Boolean functions (Proposition [prop:equivalence]). We then presented the table method, a powerful technique for transforming Turing machine computations into Boolean circuits. Leveraging this method, we proved the Cook-Levin theorem (Theorems [thm:cook_levin_p] and [thm:cook_levin_np]), a cornerstone result that establishes the P-completeness of the Circuit Value problem (CircuitVal) and the NP-completeness of the Circuit Satisfiability problem (CircuitSat).
The Cook-Levin theorem is of paramount importance in complexity theory. It provides a bedrock for proving the completeness of other problems in P and NP. By demonstrating a reduction from CircuitVal or CircuitSat to a given problem, we can establish its completeness within the respective complexity class. This significantly simplifies the process of establishing completeness, a concept we will explore further in subsequent lectures.
How can we use the Cook-Levin theorem as a tool to prove that other problems are NP-complete? (Hint: Consider the concept of reduction.)
What are the implications of the P vs. NP problem in the context of the Cook-Levin theorem? (Hint: What would it mean if \(P = NP\), given that \(\text{CircuitSAT}\) is NP-complete?)
Can we extend the table method to other complexity classes beyond P and NP? (Hint: Think about how the table is constructed and what properties of the complexity class are crucial.)
What is the significance of the logarithmic space requirement for the reduction in the proof of the Cook-Levin theorem? (Hint: Consider the implications for the complexity of the reduction itself and how it relates to the complexity of the original problem.)
Reduction: A reduction from problem (A) to problem (B) is a transformation that allows us to solve problem (A) using an algorithm for problem (B). In the context of complexity theory, a log-space reduction from language (L) to language (L’) is a function (f) computable in logarithmic space such that for all inputs (x), (x \(\in\) L) if and only if (f(x) \(\in\) L’). This means that if we can efficiently solve (L’), we can also efficiently solve (L).
The Cook-Levin theorem serves as a cornerstone of complexity theory, establishing a crucial link between the seemingly disparate worlds of Turing machines and Boolean circuits.
The table method exemplifies a powerful technique for transforming computations from one computational model (Turing machines) to another (Boolean circuits), highlighting the interconnectedness of different computational paradigms.
The concepts of reduction and completeness are indispensable tools for classifying the complexity of computational problems, enabling us to organize problems into meaningful hierarchies based on their inherent difficulty.
Next Steps:
In the next lecture, we will harness the power of the Cook-Levin theorem to prove the NP-completeness of other problems by demonstrating reductions from CircuitSat to them. We will also delve deeper into the implications of these results for understanding the profound relationship between P and NP, one of the most important open problems in computer science.