Complete Problems and Reductions in Complexity Theory
Introduction
This lecture delves into the concept of complete problems within complexity classes and the crucial notion of reductions between problems. Our primary goal is to identify the "most difficult" problems within a given complexity class, such as \(\mathbf{L}\) (Logarithmic Space), \(\mathbf{NL}\) (Nondeterministic Logarithmic Space), \(\mathbf{P}\) (Polynomial Time), \(\mathbf{NP}\) (Nondeterministic Polynomial Time), and \(\mathbf{coNP}\) (Complement of \(\mathbf{NP}\)).
Understanding complete problems allows us to analyze the relationships between complexity classes more effectively. For instance, if we aim to determine whether two complexity classes are equivalent (e.g., \(\mathbf{P}\) vs. \(\mathbf{NP}\)), we can focus on their respective complete problems rather than analyzing the entire, potentially infinite, set of problems within each class.
To achieve this, we will introduce the concept of logarithmic space reductions. These reductions are fundamental for studying complete problems within the complexity classes we are interested in. They provide a way to compare the difficulty of problems by transforming instances of one problem into instances of another while preserving the "yes" or "no" answer. This transformation must be computationally efficient, specifically requiring only logarithmic space.
Reductions: A reduction from problem \(A\) to problem \(B\) shows that if we can solve \(B\), we can also solve \(A\). The reduction itself must be computationally "easy" compared to the complexity of solving \(B\).
Completeness: A problem is complete for a class if it is among the hardest problems in that class. Every other problem in the class can be reduced to it.
In the following sections, we will formally define logarithmic space reductions, explore their properties, and use them to define the notion of completeness. We will then examine how these concepts help us understand the structure and relationships between various complexity classes. We will also see that once we have identified one complete problem for a class, we can use it as a stepping stone to discover other complete problems within the same class, leveraging the transitivity of reductions. Finally, we will see an example of a reduction from the 3-Coloring problem to SAT.
Reductions
Definition 1 (Logarithmic Space Reduction). Let \(L_1\) be a language over the alphabet \(\Sigma_1\) and \(L_2\) be a language over the alphabet \(\Sigma_2\). A function \(R: \Sigma_1^* \to \Sigma_2^*\) is a *logarithmic space reduction* (or simply reduction) from \(L_1\) to \(L_2\) if the following conditions hold:
\(x \in L_1 \iff R(x) \in L_2\) for all \(x \in \Sigma_1^*\).
\(R\) is computable in logarithmic space.
A reduction \(R\) from \(L_1\) to \(L_2\) can be formally described as a function computed by a Turing machine with the following properties:
The machine takes an input string \(x \in \Sigma_1^*\) on its input tape.
It produces an output string \(R(x) \in \Sigma_2^*\) on its output tape.
The machine uses only logarithmic space on its work tape(s) with respect to the length of the input, i.e., \(O(\log |x|)\).
The machine must be an output machine: the input tape is read-only and the output tape is write-only, with its head moving only to the right. In other words, the machine cannot modify the input and cannot reread what it has already written, so only the work tape counts toward the space bound.
The computation satisfies the correctness condition: \(x \in L_1 \iff R(x) \in L_2\).
In this case, we say that \(L_1\) is log-space reducible to \(L_2\), denoted as \(L_1 \leq_\mathbf{L}L_2\).
Intuitively, a reduction from a problem \(P_1\) to a problem \(P_2\) means that an instance \(x\) of \(P_1\) can be transformed into an instance \(R(x)\) of \(P_2\) such that the answer to \(x\) in \(P_1\) is "yes" if and only if the answer to \(R(x)\) in \(P_2\) is "yes". The transformation \(R\) must be efficiently computable, specifically in logarithmic space. This ensures that the reduction process does not add significant computational overhead.
Remark 1. The notion of reduction used here is a logarithmic space reduction, which is more restrictive than the general definitions found in other contexts. We focus on logarithmic space reductions because \(\mathbf{L}\) is the smallest complexity class we consider in this course. This choice allows us to study complete problems within classes like \(\mathbf{L}\) and \(\mathbf{NL}\).
Remark 2. Many textbooks, such as "Computers and Intractability: A Guide to the Theory of NP-Completeness" by Garey and Johnson, define reductions using polynomial time computability. While polynomial time reductions are useful for studying classes like \(\mathbf{P}\) and \(\mathbf{NP}\), they are not suitable for reasoning about classes like \(\mathbf{L}\) and \(\mathbf{NL}\). Since \(\mathbf{L}\subseteq \mathbf{P}\), any logarithmic space reduction is also a polynomial time reduction. However, the converse is not necessarily true.
Properties of Reductions
Proposition 1. The reducibility relation \(\leq_\mathbf{L}\) is reflexive and transitive.
Proof.
Reflexivity: \(L_1 \leq_\mathbf{L}L_1\) via the identity function \(R(x) = x\). The identity function is computable in logarithmic space, as it simply copies the input to the output without requiring additional memory.
Transitivity: If \(L_1 \leq_\mathbf{L}L_2\) and \(L_2 \leq_\mathbf{L}L_3\), then there exist logarithmic space reductions \(R_1\) from \(L_1\) to \(L_2\) and \(R_2\) from \(L_2\) to \(L_3\). The composition \(R_2 \circ R_1\) is a reduction from \(L_1\) to \(L_3\).
To show that \(R_2 \circ R_1\) is computable in logarithmic space, consider the following on-the-fly simulation:
Let \(x\) be an input string in \(\Sigma_1^*\). We cannot simply compute \(R_1(x)\) and store the result, as it might require polynomial space (since the output of a log-space reduction can be polynomially larger than the input). Instead, we simulate the computation of \(R_2\) on the output of \(R_1(x)\) on-the-fly.
Input: \(x \in \Sigma_1^*\). Output: \(R_2(R_1(x))\).
Simulate \(R_2\); whenever it requests the \(i\)-th symbol of its input \(R_1(x)\):
run \(R_1\) on input \(x\) from the beginning until it produces the \(i\)-th symbol of its output, and feed this symbol to \(R_2\);
continue the simulation of \(R_2\) in this way until it produces its final output.
Whenever \(R_2\) needs a symbol of \(R_1(x)\), we compute that symbol by rerunning \(R_1\), without ever storing the entire intermediate result (see the algorithm above). This is possible because \(R_1\) is computable in logarithmic space, so any symbol of its output can be recomputed in logarithmic space; the only extra memory needed is a counter for the current position, which takes \(O(\log |x|)\) bits since \(|R_1(x)|\) is at most polynomial in \(|x|\). Thus, the composition \(R_2 \circ R_1\) is also computable in logarithmic space.
◻
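To make the on-the-fly composition concrete, here is a minimal Python sketch. A reduction is modeled as a function that streams its output symbols one at a time (a generator), and the composed reduction gives \(R_2\) indexed access to \(R_1(x)\) by rerunning \(R_1\) on demand, never storing the intermediate string. The names and the toy reductions are purely illustrative and are not the Turing machines of the proof.

```python
def compose(R1, R2, x):
    """Yield the symbols of R2(R1(x)) without ever storing R1(x)."""

    class Recomputed:
        """Indexed access to R1(x): every access reruns R1 on x from scratch,
        mirroring the recomputation trick used in the proof."""
        def __getitem__(self, i):
            for j, symbol in enumerate(R1(x)):   # rerun R1; keep no output
                if j == i:
                    return symbol
            raise IndexError(i)

    yield from R2(Recomputed())

# Toy reductions over strings of '0'/'1' (purely illustrative).
def R1(x):                      # duplicates every symbol of its input
    for c in x:
        yield c
        yield c

def R2(y):                      # copies the first four symbols of its input
    for i in range(4):
        yield y[i]

print("".join(compose(R1, R2, "01")))   # prints "0011"
```

Each access to a symbol of \(R_1(x)\) costs a full rerun of \(R_1\); the composition trades time for space, exactly as in the proof.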
Remark 3. If \(L_1 \leq_\mathbf{L}L_2\) and \(L_2 \leq_\mathbf{L}L_1\), it does not necessarily imply that \(L_1 = L_2\). This relationship indicates that \(L_1\) and \(L_2\) have the same complexity in terms of logarithmic space reducibility, but they are not necessarily the same language.
Complete Problems
Definition 2 (\(C\)-Completeness). Let \(C\) be a complexity class and \(L\) be a language. Then \(L\) is *\(C\)-complete* if:
\(L \in C\), and
for all \(L' \in C\), \(L' \leq_\mathbf{L}L\).
A problem is complete for a class if it satisfies two conditions:
Membership: It belongs to the class itself.
Hardness: Every other problem in the class can be reduced to it via a logarithmic space reduction.
This means that complete problems are the "most difficult" problems in the class, in the sense that if we can solve a complete problem efficiently, we can solve any other problem in the class efficiently (by reducing it to the complete problem).
Remark 4. It is crucial to prove that a problem belongs to the class \(C\), not only that every problem in \(C\) reduces to it. Membership is a necessary condition for \(C\)-completeness that is often overlooked.
Remark 5. Finding the first complete problem for a class usually requires a significant amount of work, since the hardness condition involves a universal quantifier over all problems in the class. However, once we have one complete problem, we can use it to find others more easily.
Lemma 2. Let \(C\) be a complexity class. If \(L\) is \(C\)-complete, \(L' \in C\), and \(L \leq_\mathbf{L}L'\), then \(L'\) is also \(C\)-complete.
Proof. Since \(L\) is \(C\)-complete, for all \(L'' \in C\), we have \(L'' \leq_\mathbf{L}L\). By transitivity of \(\leq_\mathbf{L}\) (Proposition 1) and the fact that \(L \leq_\mathbf{L}L'\), we have \(L'' \leq_\mathbf{L}L'\) for all \(L'' \in C\). Since \(L' \in C\) by assumption, \(L'\) is \(C\)-complete. ◻
This lemma provides a powerful tool for proving the completeness of other problems. Once we have established one complete problem for a class, we can prove the completeness of another problem by simply showing that:
The new problem is in the class.
The first complete problem can be reduced to the new problem.
Remark 6. The set of \(C\)-complete problems forms a subset of \(C\). This subset represents the hardest problems within the class.
Classes Closed Under Reduction
Definition 3 (Closed Under Reduction). A complexity class \(C\) is *closed under logarithmic space reduction* (or simply closed under reduction) if for any language \(L \in C\) and any language \(L'\) such that \(L' \leq_\mathbf{L}L\), it holds that \(L' \in C\).
If a complexity class is closed under reduction, it means that if a problem \(L\) is in the class, and another problem \(L'\) can be reduced to \(L\) using a logarithmic space reduction, then \(L'\) is also guaranteed to be in the same complexity class. This property is crucial for understanding the relationships between different complexity classes and for identifying complete problems.
Proposition 3. The classes \(\mathbf{P}\), \(\mathbf{L}\), \(\mathbf{NL}\), \(\mathbf{PSPACE}\), and \(\mathbf{EXP}\) are closed under logarithmic space reductions.
Proof. We will demonstrate the proof for \(\mathbf{P}\) and \(\mathbf{L}\); the proofs for \(\mathbf{NL}\), \(\mathbf{PSPACE}\), and \(\mathbf{EXP}\) follow a similar argument.
\(\mathbf{P}\):
Suppose \(L \in \mathbf{P}\) and \(L' \leq_\mathbf{L}L\).
Then there exists a polynomial-time algorithm \(A\) for \(L\) and a logarithmic-space reduction \(R\) from \(L'\) to \(L\).
We can construct an algorithm \(A'\) for \(L'\) as follows: on input \(x\), compute \(R(x)\) and then run algorithm \(A\) on \(R(x)\).
Since \(R\) is a logarithmic-space reduction, it runs in polynomial time: a machine using \(O(\log n)\) work space has at most \(2^{O(\log n)} = n^{O(1)}\) distinct configurations, so it halts within polynomially many steps, and in particular \(|R(x)|\) is polynomial in \(|x|\).
Therefore, the composition of \(R\) and \(A\) also runs in polynomial time.
Thus, \(A'\) is a polynomial-time algorithm for \(L'\), so \(L' \in \mathbf{P}\).
\(\mathbf{L}\):
Suppose \(L \in \mathbf{L}\) and \(L' \leq_\mathbf{L}L\).
Then there exists a logarithmic-space algorithm \(A\) for \(L\) and a logarithmic-space reduction \(R\) from \(L'\) to \(L\).
As shown in the transitivity property of reductions (Proposition 1), the composition of two logarithmic-space computations results in a logarithmic-space computation.
Therefore, we can construct a logarithmic-space algorithm for \(L'\) by composing \(R\) and \(A\), so \(L' \in \mathbf{L}\).
◻
Remark 7. For a given \(k\), the class \(\mathbf{TIME}(n^k)\) is not closed under reduction.
Try to prove that for any fixed \(k\), the class \(\mathbf{TIME}(n^k)\) is not closed under logarithmic space reductions. You may need to use the Time Hierarchy Theorem.
For example, if \(L \in \mathbf{TIME}(n^2)\) and \(L' \leq_\mathbf{L}L\), it is possible that \(L' \in \mathbf{TIME}(n^3)\) but \(L' \notin \mathbf{TIME}(n^2)\). This is because the output \(R(x)\) of a logarithmic space reduction can be polynomially longer than \(x\), so composing the reduction with an \(n^2\)-time algorithm can yield an algorithm that takes time \(n^3\) or more. This observation highlights why we often focus on classes like \(\mathbf{P}\) rather than specific polynomial time classes like \(\mathbf{TIME}(n^k)\).
Remark 8. Since \(\mathbf{P}\) is closed under reductions, it is easier to reason about it. The sub-classes \(\mathbf{TIME}(n^k)\) inside \(\mathbf{P}\) are not closed under reductions.
Using Completeness to Relate Complexity Classes
Theorem 4. Let \(C\) and \(C'\) be complexity classes closed under reductions, with \(C' \subseteq C\). If \(L\) is a \(C\)-complete problem and \(L \in C'\), then \(C = C'\).
Proof. Since \(C' \subseteq C\), we only need to show that \(C \subseteq C'\). Let \(L'\) be an arbitrary language in \(C\). Since \(L\) is \(C\)-complete, by definition, \(L' \leq_\mathbf{L}L\). We are given that \(L \in C'\), and since \(C'\) is closed under reduction, it follows that \(L' \in C'\). Thus, we have shown that any language \(L'\) in \(C\) is also in \(C'\), which means \(C \subseteq C'\). Therefore, since we have both \(C' \subseteq C\) and \(C \subseteq C'\), we conclude that \(C = C'\). ◻
This theorem provides a powerful tool for relating complexity classes. If we can find a \(C\)-complete problem that belongs to a smaller class \(C'\), then we can conclude that the two classes are equal. This is particularly useful for resolving open questions in complexity theory. The theorem allows us to reduce a question about the equality of two classes to a question about the membership of a single complete problem in a smaller class.
Example 1. Consider the classes \(\mathbf{NP}\) and \(\mathbf{P}\). We know that \(\mathbf{P}\subseteq \mathbf{NP}\), and both classes are closed under logarithmic space reductions.
If we can show that an \(\mathbf{NP}\)-complete problem, such as SAT (Satisfiability), is in \(\mathbf{P}\), then by Theorem 4, we can conclude that \(\mathbf{P}= \mathbf{NP}\).
Conversely, if we can prove that an \(\mathbf{NP}\)-complete problem like SAT is *not* in \(\mathbf{P}\), then we can conclude that \(\mathbf{P}\neq \mathbf{NP}\).
This demonstrates how focusing on a single complete problem can help us resolve the famous \(\mathbf{P}\) vs. \(\mathbf{NP}\) problem. The open question of whether \(\mathbf{P}= \mathbf{NP}\) can be answered by either finding a polynomial-time algorithm for an \(\mathbf{NP}\)-complete problem or proving that no such algorithm exists.
Remark 9. So far, we have neither a polynomial time algorithm for SAT nor a formal proof that no polynomial time algorithm for SAT exists.
Example Reduction: 3-Coloring to SAT
In this section, we will demonstrate a concrete example of a logarithmic space reduction by showing how to reduce the 3-coloring problem to the satisfiability problem (SAT). This example will illustrate how a problem about graphs can be transformed into a problem about Boolean formulas.
SAT (Satisfiability)
Definition 4 (SAT). Given a Boolean formula \(\phi\) over variables \(X_1, \dots, X_m\), the Satisfiability problem (SAT) asks to decide whether there exists an assignment of truth values (true or false) to the variables that makes the formula \(\phi\) true.
A Boolean formula is an expression involving Boolean variables and logical connectives such as AND (\(\wedge\)), OR (\(\vee\)), and NOT (\(\neg\)).
A formula is in Conjunctive Normal Form (CNF) if it is a conjunction (AND) of clauses, where each clause is a disjunction (OR) of literals.
A literal is a variable or its negation.
A formula is in Disjunctive Normal Form (DNF) if it is a disjunction (OR) of terms, where each term is a conjunction (AND) of literals.
A clause is a disjunction of literals.
Example 2. \(\phi = (\neg X_1 \vee X_2) \wedge (X_1 \vee X_3) \wedge (\neg X_2)\) is a Boolean formula in CNF, where \((\neg X_1 \vee X_2)\), \((X_1 \vee X_3)\), and \((\neg X_2)\) are clauses, and \(\neg X_1\), \(X_2\), \(X_1\), \(X_3\), and \(\neg X_2\) are literals.
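As a small illustration of Definition 4, the following sketch decides satisfiability of the formula of Example 2 by brute force over all assignments. The encoding of a clause as a set of signed integers (\(+i\) for \(X_i\), \(-i\) for \(\neg X_i\)) is only an illustrative convention, not part of the lecture.

```python
from itertools import product

# phi = (~X1 v X2) ^ (X1 v X3) ^ (~X2), encoded as a list of clauses,
# each clause a set of signed variable indices (+i for X_i, -i for ~X_i).
phi = [{-1, 2}, {1, 3}, {-2}]
variables = [1, 2, 3]

def satisfiable(cnf, variables):
    """Try all 2^m truth assignments; return a satisfying one, or None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # the CNF is true iff every clause contains at least one true literal
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf):
            return assignment
    return None

print(satisfiable(phi, variables))   # {1: False, 2: False, 3: True}
```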
Definition 5 (3-SAT). 3-SAT is the restriction of the SAT problem to Boolean formulas in CNF where each clause has at most three literals.
Remark 10. SAT and 3-SAT are both \(\mathbf{NP}\)-complete problems.
Remark 11. If a formula is in DNF, checking satisfiability is immediate: the formula is satisfiable if and only if at least one of its terms does not contain both a variable and its negation.
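A minimal sketch of this observation, with the same signed-integer encoding as above: a DNF formula is satisfiable exactly when some term contains no contradictory pair of literals, which can be checked in a single pass.

```python
def dnf_satisfiable(dnf):
    """dnf: list of terms, each term a set of literals (+i for X_i, -i for ~X_i)."""
    # a term is satisfiable iff it never contains both a literal and its negation
    return any(all(-lit not in term for lit in term) for term in dnf)

# (X1 ^ ~X2) v (X3 ^ ~X3): the first term is contradiction-free, hence satisfiable.
print(dnf_satisfiable([{1, -2}, {3, -3}]))   # True
```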
Graph Coloring
Definition 6 (Graph Coloring). Given a graph \(G = (V, E)\) and an integer \(k\), the \(k\)-coloring problem asks to decide whether it is possible to assign each node in \(V\) a color from the set \(\{1, 2, \dots, k\}\) such that if \((u, v) \in E\) (i.e., there is an edge between nodes \(u\) and \(v\)), then the color of \(u\) is different from the color of \(v\).
Remark 12. The 2-colorability problem is equivalent to determining whether a graph is bipartite: a graph is bipartite if and only if it can be colored with two colors.
Remark 13. 3-coloring is an \(\mathbf{NP}\)-complete problem.
Reduction from 3-Coloring to SAT
We will now show a reduction from 3-coloring to SAT. Given a graph \(G = (V, E)\), we will construct a Boolean formula \(\phi_G\) such that \(G\) is 3-colorable if and only if \(\phi_G\) is satisfiable.
For each node \(u \in V\), we introduce three Boolean variables: \(u_R\), \(u_Y\), and \(u_B\), representing the colors red, yellow, and blue, respectively. The formula \(\phi_G\) is constructed as follows:
Each node must have a color: For each \(u \in V\), we add the clause \((u_R \vee u_Y \vee u_B)\). This ensures that each node is assigned at least one color.
Each node has exactly one color: For each \(u \in V\), we add the clauses \((\neg u_R \vee \neg u_Y)\), \((\neg u_Y \vee \neg u_B)\), and \((\neg u_B \vee \neg u_R)\). These clauses ensure that each node is assigned at most one color. Note: we can rewrite \((u_R \implies (\neg u_Y \wedge \neg u_B))\) as \((\neg u_R \vee (\neg u_Y \wedge \neg u_B))\), which is equivalent to \((\neg u_R \vee \neg u_Y) \wedge (\neg u_R \vee \neg u_B)\).
Adjacent nodes have different colors: For each edge \((u, v) \in E\), we add the clauses \((\neg u_R \vee \neg v_R)\), \((\neg u_Y \vee \neg v_Y)\), and \((\neg u_B \vee \neg v_B)\). These clauses ensure that adjacent nodes have different colors.
The formula \(\phi_G\) is the conjunction of all these clauses.
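The construction can be written down directly. Below is a short sketch, again using the signed-integer clause encoding; the variable numbering \(3u + c + 1\) for node \(u\) and color \(c \in \{0, 1, 2\}\) (red, yellow, blue) is an illustrative choice, not fixed by the lecture.

```python
def color_var(u, c):
    """Index of the Boolean variable u_c, with c in {0: R, 1: Y, 2: B}."""
    return 3 * u + c + 1

def phi_of_graph(nodes, edges):
    """Build the clauses of phi_G for a graph given as node and edge lists."""
    clauses = []
    for u in nodes:
        r, y, b = (color_var(u, c) for c in range(3))
        clauses.append({r, y, b})                    # at least one color
        clauses += [{-r, -y}, {-y, -b}, {-b, -r}]    # at most one color
    for (u, v) in edges:
        for c in range(3):                           # endpoints get different colors
            clauses.append({-color_var(u, c), -color_var(v, c)})
    return clauses

# A triangle: 3 nodes with 4 clauses each, plus 3 edges with 3 clauses each = 21 clauses.
triangle_nodes, triangle_edges = [0, 1, 2], [(0, 1), (1, 2), (2, 0)]
print(len(phi_of_graph(triangle_nodes, triangle_edges)))   # 21
```

The resulting clause list can be handed to the brute-force satisfiability checker sketched earlier; for the triangle it is satisfiable, matching the fact that a triangle is 3-colorable.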
Proposition 5. A graph \(G\) is 3-colorable if and only if the formula \(\phi_G\) is satisfiable.
Proof. (\(\Rightarrow\)): Suppose \(G\) is 3-colorable. Then there exists a valid coloring of the nodes with red, yellow, and blue. For each node \(u\), we assign the corresponding Boolean variable (\(u_R\), \(u_Y\), or \(u_B\)) to true if the node is colored with that color, and the others to false. This assignment will satisfy all the clauses in \(\phi_G\).
(\(\Leftarrow\)): Suppose \(\phi_G\) is satisfiable. Then there exists a truth assignment to the Boolean variables that makes \(\phi_G\) true. For each node \(u\), at least one of the variables \(u_R\), \(u_Y\), \(u_B\) is true (by the clauses of the first kind) and at most one is true (by the clauses of the second kind), so exactly one is true, and it determines the color assigned to \(u\). The clauses of the third kind ensure that adjacent nodes receive different colors. Therefore, this truth assignment corresponds to a valid 3-coloring of \(G\). ◻
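The (\(\Leftarrow\)) direction amounts to reading a coloring off a satisfying assignment. A small sketch, under the same hypothetical variable numbering \(3u + c + 1\) as above:

```python
def coloring_from_assignment(nodes, assignment):
    """assignment: dict variable index -> bool, a satisfying assignment of phi_G."""
    colors = {}
    for u in nodes:
        true_colors = [c for c in range(3) if assignment[3 * u + c + 1]]
        # the "at least one" and "at most one" clauses force exactly one true variable
        assert len(true_colors) == 1
        colors[u] = "RYB"[true_colors[0]]
    return colors

# One satisfying assignment for the triangle: node 0 red, node 1 yellow, node 2 blue.
assignment = {v: False for v in range(1, 10)}
assignment[1], assignment[5], assignment[9] = True, True, True   # 0_R, 1_Y, 2_B
print(coloring_from_assignment([0, 1, 2], assignment))   # {0: 'R', 1: 'Y', 2: 'B'}
```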
Remark 14. The reduction from 3-coloring to SAT can be computed in logarithmic space.
To generate the clauses, we only need to iterate through the nodes and edges of the graph.
For each node, we generate clauses based on its color variables.
For each edge, we generate clauses based on the color variables of the connected nodes.
We only need to store a constant number of node and edge indices at a time, each of \(O(\log |V|)\) bits, to generate the clauses, and the number of variables and clauses is polynomial in the size of the graph. Thus, the space required for the computation is logarithmic with respect to the input size (the size of the graph).
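As a sketch of this point, the clauses of \(\phi_G\) can be emitted one at a time while keeping only a constant number of node and edge indices in memory; the generator below (same hypothetical variable numbering as before) never materializes the whole formula.

```python
def clauses_of_graph(nodes, edges):
    """Yield the clauses of phi_G one by one, never storing the whole formula."""
    for u in nodes:
        r, y, b = 3 * u + 1, 3 * u + 2, 3 * u + 3
        yield {r, y, b}                                   # at least one color
        yield {-r, -y}; yield {-y, -b}; yield {-b, -r}    # at most one color
    for (u, v) in edges:
        for c in range(3):                                # endpoints differ
            yield {-(3 * u + c + 1), -(3 * v + c + 1)}

for clause in clauses_of_graph([0, 1, 2], [(0, 1), (1, 2), (2, 0)]):
    print(clause)   # the 21 clauses of the triangle, streamed one at a time
```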
Remark 15. In this reduction, the number of 3-colorings of \(G\) and the number of satisfying assignments of \(\phi_G\) are exactly the same (such solution-preserving reductions are called parsimonious). This is not always the case for reductions.
Conclusion
In this lecture, we have introduced the fundamental concepts of logarithmic space reductions, complete problems, and classes closed under reductions. We have demonstrated how these concepts can be used to understand the relationships between different complexity classes and to identify the most difficult problems within a class. Additionally, we have provided a concrete example of a reduction from the 3-coloring problem to the satisfiability problem (SAT), illustrating how problems from different domains can be related through reductions.
Reductions: A logarithmic space reduction from \(L_1\) to \(L_2\) (\(L_1 \leq_\mathbf{L}L_2\)) shows that \(L_1\) is "no harder than" \(L_2\) in terms of computational complexity. If we can solve \(L_2\) efficiently, we can also solve \(L_1\) efficiently.
Complete Problems: A \(C\)-complete problem is among the "hardest" problems in the complexity class \(C\). Every problem in \(C\) can be reduced to a \(C\)-complete problem.
Classes Closed Under Reduction: If a complexity class \(C\) is closed under reduction, and a problem \(L\) is in \(C\), then any problem \(L'\) that reduces to \(L\) is also in \(C\).
Relating Complexity Classes: If a \(C\)-complete problem is found to be in a subclass \(C'\) of \(C\) (where \(C'\) is also closed under reduction), then \(C = C'\). This provides a powerful method for proving the equality of complexity classes.
The concepts introduced in this lecture are foundational for understanding the structure of complexity classes and the relationships between them. In the coming lectures, we will build upon these ideas to explore specific complete problems for various complexity classes and delve deeper into the implications of these findings. As a first result, we will see that SAT is \(\mathbf{NP}\)-complete; from this we will obtain that graph coloring, Hamiltonian path, and many other problems are \(\mathbf{NP}\)-complete, without going back to the definition of completeness, by relying on Lemma 2.
For the next lecture, consider the following questions, which will be explored further:
Can you think of other examples of reductions between problems in different complexity classes? For example, consider reductions between graph problems (e.g., Hamiltonian cycle, vertex cover), logic problems (e.g., different variants of SAT), or numerical problems (e.g., subset sum, knapsack).
How might we prove that a problem is complete for a given class? What techniques or strategies can be used to establish completeness? This often involves showing that a known complete problem can be reduced to the problem in question.
What are the implications of finding a polynomial-time algorithm for an \(\mathbf{NP}\)-complete problem? How would this discovery impact our understanding of the \(\mathbf{P}\) vs. \(\mathbf{NP}\) problem and the field of computer science as a whole?
These topics will be explored further in the upcoming lectures as we continue our journey into the fascinating world of computational complexity.