NP-Completeness and Reductions

Author

Your Name

Published

January 28, 2025

Introduction

This lecture delves into the concept of NP-completeness, presenting an alternative definition of NP that utilizes the notion of certificates. We will investigate the relationship between problems in the complexity classes P and NP. A significant portion of the lecture will be dedicated to demonstrating the NP-completeness of 3SAT by establishing a reduction from the SAT problem. The primary objective is to gain a comprehensive understanding of how certificates can be employed to efficiently verify solutions in polynomial time. Furthermore, we will learn how to perform reductions between problems, a crucial technique for proving NP-completeness. Additionally, we will briefly explore the implications of these concepts for other related problems, such as 2SAT, and their intriguing connection to graph reachability problems.

Alternative Definition of NP with Certificates

Definition 1 (Certificate). A certificate for a decision problem is a witness that provides evidence for a "yes" answer. Formally, for a language \(L\), if \(x \in L\), a certificate \(y\) for \(x\) is a string that helps verify the membership of \(x\) in \(L\).

Intuitively, for a problem in NP, it is hard to find a solution. However, given a potential solution (certificate), it is easy to verify its correctness in polynomial time. "Easy" in this context means verifiable by a deterministic Turing machine in polynomial time.

Example 1 (SAT and Certificates). The Satisfiability problem (SAT) is defined as the language containing all satisfiable Boolean formulas. A certificate for a formula \(\phi\) is an evaluation (truth assignment) \(V\) from the variables of \(\phi\) to the set \(\{0, 1\}\) such that \(\phi\) evaluates to 1 (true) under \(V\).

For example, consider the formula \(\phi = (X_1 \lor X_2 \lor X_3) \land (\neg X_1 \lor \neg X_2 \lor \neg X_3)\). A certificate for \(\phi\) is the truth assignment \(V(X_1) = 1, V(X_2) = 0, V(X_3) = 1\). We can verify this in linear time by substituting the values into the formula and evaluating it.

Definition 2 (Polynomially Balanced Relation). A relation \(R\) over pairs of strings is polynomially balanced if there exists a constant \(K\) such that for all \((x, y) \in R\), the length of \(y\) is bounded by a polynomial in the length of \(x\). Formally, \(|y| \leq |x|^K\).

Definition 3 (Polynomially Decidable Relation). A relation \(R\) is polynomially decidable if there exists a deterministic Turing machine \(M\) that decides \(R\) in polynomial time. That is, there exists a constant \(H\) such that for all \((x,y) \in \Sigma^* \times \Sigma^*\), \(M\) decides whether \((x,y) \in R\) in time \(O((|x|+|y|)^H)\).

Theorem 1 (NP and Certificates). This theorem provides an alternative definition of the complexity class NP using the concept of certificates. It states that a language \(L\) is in NP if and only if there exists a polynomially balanced and polynomially decidable relation \(R\) such that \(L\) consists of all strings \(x\) for which there exists a certificate \(y\) such that \((x, y)\) is in \(R\).

Proof. Proof. \((\Rightarrow)\) Suppose \(L \in \text{NP}\). Then, by definition, there exists a non-deterministic Turing machine \(N\) that decides \(L\) in time \(n^H\) for some constant \(H\). We define the relation \(R\) as follows: \[R = \{(x, d_1 d_2 \dots d_m) \mid N \text{ on input } x \text{ with non-deterministic choices } d_1, d_2, \dots, d_m \text{ terminates with "yes"}\}\] Here, \(d_1 d_2 \dots d_m\) represents a sequence of non-deterministic choices made by \(N\) during its computation on input \(x\), and \(m \leq |x|^H\) since \(N\) runs in time \(n^H\).

The certificate for \(x\) is the sequence of non-deterministic choices \(d_1 d_2 \dots d_m\) that leads \(N\) to accept \(x\).

  • \(R\) is polynomially decidable: Given a pair \((x, d_1 d_2 \dots d_m)\), we can simulate \(N\) on input \(x\) using the provided sequence of choices. Since \(N\) runs in polynomial time and the sequence of choices is fixed, this simulation can be done deterministically in polynomial time.

  • \(R\) is polynomially balanced: The length of the certificate \(d_1 d_2 \dots d_m\) is at most \(|x|^H\), which is polynomial in the length of \(x\).

\((\Leftarrow)\) Suppose there exists a relation \(R\) that is polynomially balanced and polynomially decidable, and \(L = \{x \mid \exists y, (x, y) \in R\}\). Since \(R\) is polynomially decidable, there exists a deterministic Turing machine \(M\) that decides \(R\) in time \(O(n^H)\) for some constant \(H\). We construct a non-deterministic Turing machine \(N\) for \(L\) as follows:

  1. Given input \(x\), non-deterministically generate a string \(y\) of length at most \(|x|^K\) (where \(K\) is the constant from the polynomially balanced property of \(R\)).

  2. Run \(M\) on the pair \((x, y)\).

  3. If \(M\) accepts \((x, y)\), then \(N\) accepts \(x\); otherwise, \(N\) rejects \(x\).

This machine \(N\) runs in polynomial time. The non-deterministic generation of \(y\) takes \(O(|x|^K)\) time. Since \(M\) runs in polynomial time and \(R\) is polynomially balanced, the time taken to run \(M\) on \((x, y)\) is \(O((|x| + |y|)^H) = O((|x| + |x|^K)^H)\), which is polynomial in \(|x|\). Therefore, \(N\) is a non-deterministic Turing machine that decides \(L\) in polynomial time, and thus \(L \in \text{NP}\). ◻

NP-Complete Problems

Definition 4 (NP-Complete). A language \(L\) is NP-complete if it satisfies the following two conditions:

  1. \(L \in \text{NP}\)

  2. For every \(L' \in \text{NP}\), \(L'\) is reducible to \(L\) in logarithmic space (denoted as \(L' \leq_{\text{log}} L\)).

3SAT

Definition 5 (3SAT). 3SAT is the set of satisfiable Boolean formulas in conjunctive normal form (CNF) where each clause has exactly three literals.

Theorem 2 (3SAT is NP-Complete). This theorem states that the 3SAT problem, which involves determining the satisfiability of a Boolean formula in conjunctive normal form with exactly three literals per clause, is NP-complete.

Proof. Proof. To prove that 3SAT is NP-complete, we need to show two things:

  1. 3SAT is in NP.

  2. Every language in NP is log-space reducible to 3SAT (which we will show by reducing SAT to 3SAT, since SAT is known to be NP-complete by Cook-Levin theorem).

Part 1: 3SAT is in NP

We can show that 3SAT is in NP using the certificate definition of NP (1). Given a 3CNF formula \(\phi\) and a truth assignment \(V\) as a certificate, we can verify in polynomial time whether \(V\) satisfies \(\phi\). We simply evaluate each clause of \(\phi\) under the assignment \(V\). Since each clause has at most three literals, this evaluation takes constant time per clause. If \(\phi\) has \(m\) clauses, the verification takes \(O(m)\) time, which is linear in the size of the formula. The size of the certificate (the truth assignment) is also linear in the number of variables. Thus, 3SAT is in NP.

Part 2: Reducing SAT to 3SAT

We will now show that SAT \(\leq_{\text{log}}\) 3SAT. Let \(\phi\) be a CNF formula. We will transform \(\phi\) into a 3CNF formula \(R(\phi)\) such that \(\phi\) is satisfiable if and only if \(R(\phi)\) is satisfiable. The transformation will be computable in logarithmic space.

Consider a clause \(C_i\) in \(\phi\) with \(k\) literals: \(C_i = L_{i,1} \lor L_{i,2} \lor \dots \lor L_{i,k}\). We will replace \(C_i\) with a set of clauses with at most three literals each, introducing new variables as needed.

  • If \(k = 1\), \(C_i = L_{i,1}\). We replace \(C_i\) with \((L_{i,1} \lor L_{i,1} \lor L_{i,1})\).

  • If \(k = 2\), \(C_i = L_{i,1} \lor L_{i,2}\). We replace \(C_i\) with \((L_{i,1} \lor L_{i,2} \lor L_{i,2})\).

  • If \(k = 3\), \(C_i = L_{i,1} \lor L_{i,2} \lor L_{i,3}\). We leave \(C_i\) as it is.

  • If \(k > 3\), we introduce \(k-3\) new variables \(Z_{i,1}, Z_{i,2}, \dots, Z_{i,k-3}\) and replace \(C_i\) with the following clauses: $$

    \[\begin{aligned} &(L_{i,1} \lor L_{i,2} \lor Z_{i,1}) \\ &(\neg Z_{i,1} \lor L_{i,3} \lor Z_{i,2}) \\ &(\neg Z_{i,2} \lor L_{i,4} \lor Z_{i,3}) \\ &\dots \\ &(\neg Z_{i,k-4} \lor L_{i,k-2} \lor Z_{i,k-3}) \\ &(\neg Z_{i,k-3} \lor L_{i,k-1} \lor L_{i,k}) \end{aligned}\]

    $$

Input: A CNF formula \(\phi = C_1 \land C_2 \land \dots \land C_m\) Output: A 3CNF formula \(R(\phi)\) Initialize \(R(\phi) = \emptyset\) Add \((L_{i,1} \lor L_{i,1} \lor L_{i,1})\) to \(R(\phi)\) Add \((L_{i,1} \lor L_{i,2} \lor L_{i,2})\) to \(R(\phi)\) Add \(C_i\) to \(R(\phi)\) Introduce new variables \(Z_{i,1}, Z_{i,2}, \dots, Z_{i,k-3}\) Add \((L_{i,1} \lor L_{i,2} \lor Z_{i,1})\) to \(R(\phi)\) Add \((\neg Z_{i,j} \lor L_{i,j+2} \lor Z_{i,j+1})\) to \(R(\phi)\) Add \((\neg Z_{i,k-3} \lor L_{i,k-1} \lor L_{i,k})\) to \(R(\phi)\) \(R(\phi)\)

Correctness of the Reduction:

We need to show that \(\phi\) is satisfiable if and only if \(R(\phi)\) is satisfiable.

  • (\(\Rightarrow\)) If \(\phi\) is satisfiable, then there exists a truth assignment that satisfies all clauses in \(\phi\). We can extend this assignment to the variables in \(R(\phi)\) such that all clauses in \(R(\phi)\) are satisfied. For each clause \(C_i\) in \(\phi\), if \(L_{i,j}\) is the literal that satisfies \(C_i\), we set \(Z_{i,1}, \dots, Z_{i,j-2}\) to true and \(Z_{i,j-1}, \dots, Z_{i,k-3}\) to false. This ensures that all clauses in \(R(\phi)\) corresponding to \(C_i\) are satisfied.

  • (\(\Leftarrow\)) If \(R(\phi)\) is satisfiable, then there exists a truth assignment that satisfies all clauses in \(R(\phi)\). By considering the restriction of this assignment to the variables of \(\phi\), we obtain an assignment that satisfies \(\phi\). This is because if all clauses in \(R(\phi)\) are satisfied, then for each original clause \(C_i\) in \(\phi\), at least one of the literals \(L_{i,1}, \dots, L_{i,k}\) must be true.

Log-Space Computability:

The reduction can be computed in logarithmic space. We only need to keep track of the current clause \(C_i\) and the index \(j\) when creating new variables and clauses. The number of new variables and clauses introduced for each \(C_i\) is proportional to the number of literals in \(C_i\). Thus, the space required is logarithmic in the size of \(\phi\).

Complexity Analysis of the Reduction:

  • Time Complexity: The reduction algorithm iterates through each clause in the input formula \(\phi\). For each clause, it performs a constant amount of work or introduces a number of new clauses and variables that is linear in the size of the clause. Therefore, the time complexity of the reduction is \(O(n)\), where \(n\) is the size of the input formula \(\phi\).

  • Space Complexity: The reduction algorithm uses a constant amount of space to store counters and temporary variables, plus the space needed to write the output formula. The output formula \(R(\phi)\) has a size that is linear in the size of the input formula \(\phi\). However, since the algorithm can generate the output formula on-the-fly, it only needs to store a constant amount of information at any given time. Therefore, the space complexity of the reduction is \(O(\log n)\), where \(n\) is the size of the input formula \(\phi\).

Therefore, SAT \(\leq_{\text{log}}\) 3SAT, and since SAT is NP-complete, 3SAT is also NP-complete. ◻

2SAT

Remark. Remark 1. If we try to reduce 3SAT to 2SAT using a similar trick as before, we encounter a problem. For example, if we have a clause \(L_1 \lor L_2 \lor L_3\), replacing it with \((L_1 \lor Z) \land (\neg Z \lor L_2 \lor L_3)\) still leaves us with a 3-literal clause. Thus, the reduction technique used for 3SAT does not directly apply to 2SAT.

Proposition 3. This proposition states that the 2SAT problem, which involves determining the satisfiability of a Boolean formula in conjunctive normal form with at most two literals per clause, can be transformed into a reachability problem on a directed graph.

Proof. Proof. Consider a 2SAT formula \(\phi\) where each clause has at most two literals. We can represent each clause of the form \(L_1 \lor L_2\) as two implications: \(\neg L_1 \implies L_2\) and \(\neg L_2 \implies L_1\).

We construct a directed graph \(G_\phi = (V, E)\) as follows:

  • The set of vertices \(V\) consists of all literals and their negations present in \(\phi\).

  • For each clause \(L_1 \lor L_2\) in \(\phi\), we add two directed edges to \(E\): \((\neg L_1, L_2)\) and \((\neg L_2, L_1)\).

Example graph for a 2SAT formula containing (L_1 L_2)

The formula \(\phi\) is satisfiable if and only if there is no literal \(L\) such that there is a path from \(L\) to \(\neg L\) and a path from \(\neg L\) to \(L\) in the graph \(G_\phi\). This condition ensures that we cannot have both a literal and its negation being true simultaneously.

Input: A 2SAT formula \(\phi = C_1 \land C_2 \land \dots \land C_m\) Output: A directed graph \(G_\phi = (V, E)\) Initialize \(V = \emptyset\), \(E = \emptyset\) Add \(L\) to \(V\) Add \(\neg L\) to \(V\) Add the edge \((\neg L_{i,1}, L_{i,2})\) to \(E\) Add the edge \((\neg L_{i,2}, L_{i,1})\) to \(E\) \(G_\phi\)

Complexity Analysis of the Graph Construction:

  • Time Complexity: The algorithm iterates through each literal and each clause in the input formula \(\phi\). For each literal, it adds two vertices to the graph, and for each clause, it adds two edges. Therefore, the time complexity is \(O(n)\), where \(n\) is the size of the input formula \(\phi\).

  • Space Complexity: The algorithm stores each literal and its negation as vertices, and for each clause, it stores two edges. Thus, the space complexity is also \(O(n)\), where \(n\) is the size of the input formula \(\phi\).

Thus, we have reduced the 2SAT problem to a reachability problem on a directed graph. ◻

Conclusion

In this lecture, we explored the concept of NP-completeness and presented an alternative definition of NP based on the concept of certificates, as formally defined in 1. We demonstrated that 3SAT is NP-complete by providing a log-space reduction from SAT to 3SAT (2, [alg:sat_to_3sat]). This reduction exemplifies how we can transform one problem into another to prove NP-completeness.

We also discussed the nuances of 2SAT and its connection to graph reachability (3, 1, [alg:2sat_to_graph]). While a direct reduction from 3SAT to 2SAT using the same technique as SAT to 3SAT is not feasible (1), the relationship between 2SAT and graph reachability provides valuable insights into the structure of the problem.

The key takeaway from this lecture is the understanding of how certificates can be used to verify solutions in polynomial time, and how reductions, particularly log-space reductions, can be employed to prove NP-completeness. These concepts are fundamental in the study of computational complexity and provide a framework for analyzing the difficulty of various computational problems.

For the next lecture, I recommend that you delve deeper into formalizing the mapping of 2SAT to a reachability problem on a graph. Consider how the implications in 2SAT formulas can be represented as edges in a graph, and how this representation relates to the propagation of truth values. This will further enhance your understanding of the interplay between logic, graph theory, and computational complexity.