NP-Complete Problems and Related Concepts

Author

Your Name

Published

January 28, 2025

Introduction

This lecture focuses on several NP-complete problems, including Independent Set, Clique, Subgraph Isomorphism, and Integer Programming. We will explore the relationships between these problems and discuss their complexities. Additionally, we will touch upon the concept of Graph Isomorphism and its unknown complexity status, as well as the related notion of Bisimulation. Finally, we will examine the differences between strongly NP-complete and non-strongly NP-complete problems, using Knapsack and Integer Programming as examples. The main objectives are to understand the definitions of these problems, learn how to prove their NP-completeness, and grasp the implications of their complexity classifications.

Specifically, we will delve into the following key areas:

  • Independent Set: We will define the Independent Set problem and demonstrate its NP-completeness through a reduction from 3-SAT.

  • Clique: We will define the Clique problem and show its NP-completeness by reducing Independent Set to Clique.

  • Subgraph Isomorphism: We will define the Subgraph Isomorphism problem and establish its NP-completeness through a reduction from Clique.

  • Graph Isomorphism: We will discuss the Graph Isomorphism problem, which is known to be in NP but has an unresolved complexity status (neither known to be in P nor NP-complete).

  • Bisimulation: We will introduce the concept of Bisimulation and Subgraph Bisimulation, highlighting their significance in system verification and their respective complexities (P and NP-complete).

  • Integer Programming: We will define Integer Programming and prove its NP-completeness by reducing 3-SAT to it.

  • Knapsack: We will introduce the Knapsack problem, discuss its NP-completeness, and explain the existence of a pseudo-polynomial time algorithm.

  • Strongly NP-Complete Problems: We will define the concept of strongly NP-complete problems and illustrate it with examples, contrasting Integer Programming (strongly NP-complete) and Knapsack (not strongly NP-complete).

Understanding these problems and their complexities is crucial for both theoretical computer science and practical applications. For instance, the NP-completeness of these problems has implications for the design of efficient algorithms and the development of approximation algorithms. Moreover, concepts like Bisimulation play a vital role in formal verification, where they help determine the equivalence of systems.

The lecture will conclude with a discussion of open questions and future research directions, such as the search for NP-complete problems on specific graph classes like scale-free graphs.

Independent Set

Definition 1 (Independent Set). Given a graph \(G = (V, E)\) and an integer \(K\), the Independent Set problem asks whether there exists a set \(I \subseteq V\) such that \(|I| = K\) and for all \(u, v \in I\), the edge \((u, v) \notin E\).

Theorem 2 (Independent Set is NP-complete). The Independent Set problem is NP-complete.

Proof. We need to show two things:

  1. Independent Set is in NP.

  2. Independent Set is NP-hard.

1. Independent Set is in NP:

Given a graph \(G=(V,E)\), an integer \(K\), and a proposed independent set \(I \subseteq V\), we can verify in polynomial time whether \(I\) is indeed an independent set of size \(K\).

  • We can check if \(|I| = K\) in \(O(|I|)\) time.

  • We can check if for all \(u, v \in I\), \((u,v) \notin E\) in \(O(|I|^2)\) time by iterating through all pairs of vertices in \(I\) and checking the edge set \(E\).

Since \(|I| \leq |V|\), both steps take at most \(O(|V|^2)\) time, which is polynomial in the size of the input. The certificate is the set \(I\), and its size is polynomial in the size of the input (at most \(|V|\)).
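The verification procedure above can be sketched in a few lines of Python; the function name and the encoding of the graph as vertex and edge lists are our own illustration, not part of the lecture:

```python
from itertools import combinations

def is_independent_set(V, E, I, K):
    """Check the certificate: I has size K, I is a subset of V,
    and no edge of E joins two vertices of I (O(|I|^2) pairwise check)."""
    I = set(I)
    if len(I) != K or not I <= set(V):
        return False
    edges = {frozenset(e) for e in E}
    return all(frozenset((u, v)) not in edges for u, v in combinations(I, 2))

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_independent_set(V, E, {"a", "c"}, 2))  # True
print(is_independent_set(V, E, {"a", "b"}, 2))  # False: (a, b) is an edge
```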

2. Independent Set is NP-hard:

To show that Independent Set is NP-hard, we will reduce the 3-SAT problem to Independent Set. Let \(\phi\) be a 3-SAT formula with \(M\) clauses \(C_1, C_2, \dots, C_M\), where each clause \(C_i\) is of the form \(L_{i1} \lor L_{i2} \lor L_{i3}\), where each \(L_{ij}\) is a literal.

We construct a graph \(G = (V, E)\) as follows:

  • For each clause \(C_i\), create a triangle (3 vertices connected by edges) in \(G\). Label the vertices of this triangle with the literals \(L_{i1}, L_{i2}, L_{i3}\).

  • Connect any two vertices labeled with a literal and its negation with an edge. That is, if there is a vertex labeled \(x\) and a vertex labeled \(\neg x\), add an edge between them.

Set \(K = M\) (the number of clauses).

Input: A 3-SAT formula \(\phi\) with clauses \(C_1, \dots, C_M\)
Output: A graph \(G = (V, E)\) and an integer \(K\)

  1. \(V \gets \emptyset\), \(E \gets \emptyset\), \(K \gets M\).

  2. For each clause \(C_i = L_{i1} \lor L_{i2} \lor L_{i3}\): add vertices \(v_{i1}, v_{i2}, v_{i3}\) to \(V\), labelled \(L_{i1}, L_{i2}, L_{i3}\), and add the triangle edges \((v_{i1}, v_{i2}), (v_{i2}, v_{i3}), (v_{i3}, v_{i1})\) to \(E\).

  3. For every pair of vertices \(u, v\) labelled with a literal and its negation, add the edge \((u, v)\) to \(E\).

  4. Return \((G, K)\).
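A minimal Python sketch of this construction; the encoding of a literal as a pair (variable, is_positive) and the vertex naming are choices we make for illustration:

```python
def sat3_to_independent_set(clauses):
    """clauses: list of 3-element lists of literals, a literal being
    (variable, is_positive). Returns (V, E, K); vertices are
    (clause_index, position) pairs."""
    V, E = [], []
    lit = {}
    for i, clause in enumerate(clauses):
        nodes = [(i, j) for j in range(3)]
        V.extend(nodes)
        for j in range(3):
            lit[nodes[j]] = clause[j]
        # triangle on the three literal-vertices of this clause
        E += [(nodes[0], nodes[1]), (nodes[1], nodes[2]), (nodes[2], nodes[0])]
    # edge between every vertex labelled x and every vertex labelled ¬x
    for u in V:
        for v in V:
            if u < v and lit[u][0] == lit[v][0] and lit[u][1] != lit[v][1]:
                E.append((u, v))
    return V, E, len(clauses)

# φ = (x1 ∨ x2 ∨ ¬x3) ∧ (¬x1 ∨ x2 ∨ x3)
V, E, K = sat3_to_independent_set([
    [("x1", True), ("x2", True), ("x3", False)],
    [("x1", False), ("x2", True), ("x3", True)],
])
print(len(V), K)  # 6 vertices, K = 2
```

Here the graph has the 6 triangle edges plus 2 edges joining the opposite occurrences of \(x_1\) and \(x_3\).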

Complexity Analysis of the Reduction:

  • Creating the vertices for each clause takes \(O(M)\) time.

  • Creating the edges within each triangle takes \(O(M)\) time.

  • Creating the edges between opposite literals takes at most \(O((3M)^2) = O(M^2)\) time in the worst case.

Therefore, the reduction takes \(O(M^2)\) time, which is polynomial in the size of the 3-SAT formula.

Correctness of the Reduction: We need to show that \(\phi\) is satisfiable if and only if \(G\) has an independent set of size \(K=M\).

(\(\Rightarrow\)) Suppose \(\phi\) is satisfiable. Then there exists a truth assignment to the variables such that at least one literal in each clause is true. For each clause \(C_i\), select one vertex corresponding to a true literal. Since at least one literal is true in each clause, we can select \(M\) vertices. These vertices form an independent set because:

  • No two vertices from the same triangle are selected (since we select only one per clause).

  • No two vertices labeled with a variable and its negation are selected (since the truth assignment is consistent).

Thus, we have an independent set of size \(M\).

(\(\Leftarrow\)) Suppose \(G\) has an independent set \(I\) of size \(K=M\). Since there are \(M\) triangles and the vertices in each triangle are fully connected, \(I\) must contain exactly one vertex from each triangle. We can construct a truth assignment for \(\phi\) as follows:

  • If a vertex labeled with literal \(L\) is in \(I\), assign the value True to \(L\).

This assignment is consistent because if vertices labeled \(x\) and \(\neg x\) were both in \(I\), there would be an edge between them, contradicting the assumption that \(I\) is an independent set. Since we have one vertex from each triangle, each clause has at least one true literal. Thus, \(\phi\) is satisfiable.

Conclusion: Since we have shown a polynomial-time reduction from 3-SAT to Independent Set, and 3-SAT is NP-complete, it follows that Independent Set is NP-hard. Combined with the fact that Independent Set is in NP, we conclude that Independent Set is NP-complete. ◻

Example reduction from 3-SAT to Independent Set for \(\phi = (X_1 \lor X_2 \lor X_3) \land (X_1 \lor X_4 \lor X_5) \land (X_2 \lor X_4 \lor X_6)\). The independent set \(\{X_1, X_4, X_6\}\) corresponds to the satisfying assignment \(X_1 = \text{True}, X_4 = \text{True}, X_6 = \text{True}\).

Clique

Definition 3 (Clique). Given a graph \(G = (V, E)\) and an integer \(K\), the Clique problem asks whether there exists a set \(C \subseteq V\) such that \(|C| = K\) and for all \(u, v \in C\), the edge \((u, v) \in E\). In other words, a clique is a set of vertices that are all pairwise adjacent.

Theorem 4 (Clique is NP-complete). The Clique problem is NP-complete.

Proof. We will show that Clique is in NP and that it is NP-hard.

1. Clique is in NP:

Given a graph \(G = (V, E)\), an integer \(K\), and a proposed clique \(C \subseteq V\), we can verify in polynomial time whether \(C\) is indeed a clique of size \(K\).

  • We can check if \(|C| = K\) in \(O(|C|)\) time.

  • We can check if for all \(u, v \in C\), \((u, v) \in E\) in \(O(|C|^2)\) time by iterating through all pairs of vertices in \(C\) and checking the edge set \(E\).

Since \(|C| \leq |V|\), both steps take at most \(O(|V|^2)\) time, which is polynomial in the size of the input. The certificate is the set \(C\), and its size is polynomial in the size of the input (at most \(|V|\)).

2. Clique is NP-hard:

To show that Clique is NP-hard, we will reduce the Independent Set problem to the Clique problem.

Input: A graph \(G = (V, E)\) and an integer \(K\)
Output: A graph \(G' = (V', E')\) and an integer \(K'\)

  1. \(V' \gets V\).

  2. \(E' \gets \{(u, v) \mid u, v \in V, u \neq v, (u, v) \notin E\}\) (the complement edge set).

  3. \(K' \gets K\).

  4. Return \((G', K')\).
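The complement construction is short enough to write out in Python; the list-based graph encoding is our own illustration:

```python
from itertools import combinations

def independent_set_to_clique(V, E, K):
    """Build the complement graph G' = (V, E'): an independent set of
    size K in G is exactly a clique of size K in G'."""
    edges = {frozenset(e) for e in E}
    E_comp = [(u, v) for u, v in combinations(V, 2)
              if frozenset((u, v)) not in edges]
    return V, E_comp, K

V = ["A", "B", "C", "D"]
E = [("A", "B"), ("C", "D")]
_, E_comp, _ = independent_set_to_clique(V, E, 2)
print(sorted(E_comp))  # [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
```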

Complexity Analysis of the Reduction:

  • The construction of the complement graph \(G'\) takes \(O(|V|^2)\) time, as we need to iterate through all possible pairs of vertices.

Therefore, the reduction takes \(O(|V|^2)\) time, which is polynomial in the size of the input.

Correctness of the Reduction: Given a graph \(G = (V, E)\) and an integer \(K\), we construct the complement graph \(G' = (V, E')\), where \(E' = \{(u, v) \mid u, v \in V, u \neq v, (u, v) \notin E\}\). We set \(K' = K\).

We claim that \(G\) has an independent set of size \(K\) if and only if \(G'\) has a clique of size \(K\).

(\(\Rightarrow\)) Suppose \(G\) has an independent set \(I\) of size \(K\). By definition, for all \(u, v \in I\), \((u, v) \notin E\). Thus, by the construction of \(G'\), for all \(u, v \in I\), \((u, v) \in E'\). Therefore, \(I\) is a clique of size \(K\) in \(G'\).

(\(\Leftarrow\)) Suppose \(G'\) has a clique \(C\) of size \(K\). By definition, for all \(u, v \in C\), \((u, v) \in E'\). Thus, by the construction of \(G'\), for all \(u, v \in C\), \((u, v) \notin E\). Therefore, \(C\) is an independent set of size \(K\) in \(G\).

Conclusion: Since we have shown a polynomial-time reduction from Independent Set to Clique, and Independent Set is NP-complete (Theorem 2), it follows that Clique is NP-hard. Combined with the fact that Clique is in NP, we conclude that Clique is NP-complete. ◻

Example of a graph \(G\) and its complement \(G'\). The set \(\{A, C, E\}\) is an independent set in \(G\) and a clique in \(G'\).

Subgraph Isomorphism

Definition 5 (Subgraph Isomorphism). Given two graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\), the Subgraph Isomorphism problem asks whether there exists a subgraph of \(G_1\) that is isomorphic to \(G_2\). A subgraph of \(G_1\) is obtained by taking a subset of nodes \(V' \subseteq V_1\) and all edges induced by these nodes, i.e., \(E' = \{(u, v) \in E_1 \mid u, v \in V'\}\). Formally, we are looking for a subset \(V' \subseteq V_1\) and a bijection \(f: V' \rightarrow V_2\) such that for any \(u, v \in V'\), \((u, v) \in E_1\) if and only if \((f(u), f(v)) \in E_2\).

Theorem 6 (Subgraph Isomorphism is NP-complete). The Subgraph Isomorphism problem is NP-complete.

Proof. We will show that Subgraph Isomorphism is in NP and that it is NP-hard.

1. Subgraph Isomorphism is in NP:

Given two graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\), and a proposed subgraph isomorphism (a subset \(V' \subseteq V_1\) and a bijection \(f: V' \rightarrow V_2\)), we can verify in polynomial time whether this is indeed a subgraph isomorphism.

  • We can check if \(|V'| = |V_2|\) in \(O(|V'|)\) time.

  • We can check if \(f\) is a bijection in \(O(|V'|)\) time.

  • For each pair \(u, v \in V'\), we can check in \(O(|V'|^2)\) time whether \((u, v) \in E_1\) if and only if \((f(u), f(v)) \in E_2\).

Since \(|V'| \leq |V_1|\), all these steps take at most \(O(|V_1|^2)\) time, which is polynomial in the size of the input. The certificate is the pair \((V', f)\), and its size is polynomial in the size of the input (at most \(|V_1| + |V_1|\log|V_2|\)).

2. Subgraph Isomorphism is NP-hard:

To show that Subgraph Isomorphism is NP-hard, we will reduce the Clique problem to the Subgraph Isomorphism problem.

Input: A graph \(G = (V, E)\) and an integer \(K\)
Output: Graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\)

  1. \(G_1 \gets G\).

  2. \(G_2 \gets K_K\), the complete graph on \(K\) vertices.

  3. Return \((G_1, G_2)\).
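A Python sketch of the reduction, paired with a brute-force subgraph-isomorphism check (exponential time, usable only on tiny instances) so the equivalence can be tested end to end; all names and encodings here are our own illustration:

```python
from itertools import combinations, permutations

def clique_to_subgraph_iso(V, E, K):
    """The reduction: G1 = G unchanged, G2 = K_K on fresh vertices 0..K-1."""
    V2 = list(range(K))
    E2 = list(combinations(V2, 2))
    return (V, E), (V2, E2)

def has_subgraph_isomorphic(G1, G2):
    """Brute force: try every |V2|-subset of V1 and every bijection onto V2."""
    (V1, E1), (V2, E2) = G1, G2
    e1 = {frozenset(e) for e in E1}
    e2 = {frozenset(e) for e in E2}
    for Vp in combinations(V1, len(V2)):
        for img in permutations(V2):
            f = dict(zip(Vp, img))
            if all((frozenset((u, v)) in e1) == (frozenset((f[u], f[v])) in e2)
                   for u, v in combinations(Vp, 2)):
                return True
    return False

V = ["B", "D", "E", "F"]
E = [("B", "D"), ("D", "E"), ("B", "E"), ("E", "F")]
G1, G2 = clique_to_subgraph_iso(V, E, 3)
print(has_subgraph_isomorphic(G1, G2))  # True: {B, D, E} is a triangle
```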

Complexity Analysis of the Reduction:

  • Setting \(G_1 = G\) takes constant time.

  • Constructing \(K_K\) takes \(O(K^2)\) time, where \(K \leq |V|\).

Therefore, the reduction takes \(O(|V|^2)\) time, which is polynomial in the size of the input.

Correctness of the Reduction: Given a graph \(G = (V, E)\) and an integer \(K\), we construct \(G_1 = G\) and \(G_2 = K_K\), where \(K_K\) is the complete graph on \(K\) vertices.

We claim that \(G\) has a clique of size \(K\) if and only if \(G_1\) has a subgraph isomorphic to \(G_2\).

(\(\Rightarrow\)) Suppose \(G\) has a clique \(C\) of size \(K\). Then the subgraph of \(G\) induced by \(C\) is isomorphic to \(K_K\). Thus, \(G_1\) has a subgraph isomorphic to \(G_2\).

(\(\Leftarrow\)) Suppose \(G_1\) has a subgraph isomorphic to \(G_2 = K_K\). Let \(V' \subseteq V_1\) be the vertices of this subgraph, and let \(f: V' \rightarrow V_2\) be the isomorphism. Since \(G_2\) is a complete graph on \(K\) vertices, the subgraph induced by \(V'\) in \(G_1\) is also a complete graph on \(K\) vertices. Therefore, \(V'\) forms a clique of size \(K\) in \(G_1 = G\).

Conclusion: Since we have shown a polynomial-time reduction from Clique to Subgraph Isomorphism, and Clique is NP-complete (Theorem 4), it follows that Subgraph Isomorphism is NP-hard. Combined with the fact that Subgraph Isomorphism is in NP, we conclude that Subgraph Isomorphism is NP-complete. ◻

Example of reducing Clique to Subgraph Isomorphism. \(G_1\) has a clique of size 3 (vertices \(B, D, E\)), so it has a subgraph isomorphic to \(G_2 = K_3\).

Graph Isomorphism

Definition 7 (Graph Isomorphism). Given two graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\), the Graph Isomorphism problem asks whether \(G_1\) is isomorphic to \(G_2\). This means there exists a bijection \(f: V_1 \to V_2\) such that for all \(u, v \in V_1\), \((u, v) \in E_1\) if and only if \((f(u), f(v)) \in E_2\). In simpler terms, two graphs are isomorphic if they have the same structure, possibly with different labels on the vertices.

Remark 8. The Graph Isomorphism problem is in NP. Given a proposed isomorphism (a bijection \(f: V_1 \to V_2\)), we can verify in polynomial time whether it is indeed an isomorphism. We simply check for each pair of vertices \(u, v \in V_1\) whether \((u, v) \in E_1\) if and only if \((f(u), f(v)) \in E_2\). This takes \(O(|V_1|^2)\) time, which is polynomial in the size of the input.
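The certificate check in the remark can be sketched as follows (the encoding and function name are our own illustration):

```python
from itertools import combinations

def is_isomorphism(V1, E1, V2, E2, f):
    """Verify the certificate f: V1 -> V2 in O(|V1|^2) time."""
    # f must be a bijection from V1 onto V2
    if sorted(f) != sorted(V1) or sorted(f.values()) != sorted(V2):
        return False
    e1 = {frozenset(e) for e in E1}
    e2 = {frozenset(e) for e in E2}
    return all((frozenset((u, v)) in e1) == (frozenset((f[u], f[v])) in e2)
               for u, v in combinations(V1, 2))

V1, E1 = [1, 2, 3], [(1, 2), (2, 3)]
V2, E2 = ["A", "B", "C"], [("A", "B"), ("B", "C")]
print(is_isomorphism(V1, E1, V2, E2, {1: "A", 2: "B", 3: "C"}))  # True
```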

Complexity Status: The Graph Isomorphism problem has an intriguing complexity status. It is known to be in NP, but it is not known to be NP-complete or in P. This means:

  • We do not currently have a polynomial-time algorithm to solve it.

  • We do not have a proof that it is as hard as any of the known NP-complete problems.

  • There exists a quasi-polynomial time algorithm for the problem, which suggests that it might lie in a complexity class between P and NP-complete. Specifically, the algorithm runs in time \(2^{O((\log n)^c)}\) for some constant \(c\).

The unresolved status of Graph Isomorphism’s complexity makes it a fascinating problem in theoretical computer science.

  • If Graph Isomorphism is in P: It would have significant implications for algorithms involving graph manipulation and could lead to efficient solutions for many practical problems.

  • If Graph Isomorphism is NP-complete: It would imply that many other problems could be reduced to it, further solidifying its importance and suggesting that it is unlikely to have a polynomial-time algorithm.

  • Graph Isomorphism as an Intermediate Problem: The existence of a quasi-polynomial time algorithm, along with the lack of a polynomial-time algorithm or an NP-completeness proof, suggests that Graph Isomorphism might be an example of a problem that lies between P and NP-complete, assuming P \(\neq\) NP. This makes it a crucial problem for understanding the structure of the complexity landscape.

Example of two isomorphic graphs \(G_1\) and \(G_2\). The bijection \(f(1) = A, f(2) = B, f(3) = C\) is an isomorphism.

Bisimulation

Definition 9 (Bisimulation). Given two graphs (transition systems) \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\), a relation \(R \subseteq V_1 \times V_2\) is a bisimulation if for all \((u, v) \in R\):

  • For all \(u' \in V_1\) such that \((u, u') \in E_1\), there exists \(v' \in V_2\) such that \((v, v') \in E_2\) and \((u', v') \in R\). (Every transition from \(u\) in \(G_1\) can be matched by a transition from \(v\) in \(G_2\))

  • For all \(v' \in V_2\) such that \((v, v') \in E_2\), there exists \(u' \in V_1\) such that \((u, u') \in E_1\) and \((u', v') \in R\). (Every transition from \(v\) in \(G_2\) can be matched by a transition from \(u\) in \(G_1\))

\(G_1\) and \(G_2\) are bisimilar if there exists a bisimulation \(R\) such that:

  • For all \(u \in V_1\), there exists \(v \in V_2\) with \((u, v) \in R\). (Every state in \(G_1\) has a corresponding state in \(G_2\))

  • For all \(v \in V_2\), there exists \(u \in V_1\) with \((u, v) \in R\). (Every state in \(G_2\) has a corresponding state in \(G_1\))
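The polynomial-time solvability of Graph Bisimulation can be illustrated with a naive greatest-fixpoint computation: start from the full relation \(V_1 \times V_2\) and repeatedly delete pairs that violate one of the two transfer conditions above. This simple version runs in polynomial time, though faster algorithms (partition refinement in the style of Paige and Tarjan) exist; the encoding is our own sketch:

```python
def largest_bisimulation(V1, E1, V2, E2):
    """Greatest fixpoint: shrink R until both transfer conditions hold."""
    succ1 = {u: [b for a, b in E1 if a == u] for u in V1}
    succ2 = {v: [b for a, b in E2 if a == v] for v in V2}
    R = {(u, v) for u in V1 for v in V2}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(R):
            ok = (all(any((u2, v2) in R for v2 in succ2[v]) for u2 in succ1[u])
                  and
                  all(any((u2, v2) in R for u2 in succ1[u]) for v2 in succ2[v]))
            if not ok:
                R.discard((u, v))
                changed = True
    return R

def bisimilar(V1, E1, V2, E2):
    """G1 and G2 are bisimilar iff the largest bisimulation is total on both sides."""
    R = largest_bisimulation(V1, E1, V2, E2)
    return (all(any((u, v) in R for v in V2) for u in V1) and
            all(any((u, v) in R for u in V1) for v in V2))

# A -> B, A -> C versus X -> Y: bisimilar but not isomorphic
print(bisimilar(["A", "B", "C"], [("A", "B"), ("A", "C")],
                ["X", "Y"], [("X", "Y")]))  # True
```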

Definition 10 (Subgraph Bisimulation). Given two graphs \(G_1 = (V_1, E_1)\) and \(G_2 = (V_2, E_2)\), the Subgraph Bisimulation problem asks whether there exists a subgraph of \(G_1\) that is bisimilar to \(G_2\).

Theorem 11 (Complexity of Bisimulation Problems). Graph Bisimulation is in P, while Subgraph Bisimulation is NP-complete.

Bisimulation is a fundamental concept in the field of formal methods and system verification. It provides a notion of equivalence between systems based on their observable behavior. In many applications, particularly when dealing with the computation of machines or systems, bisimilarity is a more appropriate notion of equivalence than isomorphism. This is because bisimilar systems exhibit the same behavior from the perspective of an observer interacting with the system, even if their internal structures are different. Two systems that are isomorphic are also bisimilar, but the reverse is not always true.

Example: Consider two systems represented by graphs. One system might have a redundant state that is never reached, while the other does not. These systems are not isomorphic, but they might be bisimilar if the redundant state does not affect the observable behavior.

Further Remarks:

  • Temporal Logic: Bisimilar graphs cannot be distinguished by formulas of the modal \(\mu\)-calculus, and consequently by the many temporal logics that embed into it (such as CTL and LTL). This has significant implications for model checking and system verification, as it means that bisimilar systems satisfy the same temporal logic properties. A closely related result is van Benthem’s theorem, which characterizes modal logic as the bisimulation-invariant fragment of first-order logic.

  • Practical Applications: The concept of bisimulation has been used in various applications, including the analysis of structured data and database systems. However, the NP-completeness of subgraph bisimulation highlights the computational challenges that can arise in these contexts. For instance, some database systems used subgraph bisimulation for comparing structures, initially believing it to be efficiently testable like graph bisimulation. The discovery of its NP-completeness revealed the limitations of this approach.

  • Difference with Isomorphism: While isomorphism requires a strict one-to-one correspondence between the structures of two graphs, bisimulation focuses on the behavior. Isomorphism can be thought of as an external view of the graphs, while bisimulation represents an internal view, as if one is traversing the graph during a computation.

Example of two bisimilar graphs. The relation \(R = \{(A, X), (B, Y), (C, Y)\}\) is a bisimulation. Note that the graphs are not isomorphic.

Integer Programming

Definition 12 (Integer Programming). Given a system of linear inequalities with integer coefficients, the Integer Programming problem asks whether there exists an assignment of integer values to the variables that satisfies all the inequalities. Formally, given a matrix \(A \in \mathbb{Z}^{m \times n}\) and a vector \(b \in \mathbb{Z}^m\), the problem is to determine if there exists a vector \(x \in \mathbb{Z}^n\) such that \(Ax \geq b\).

Proposition 13 (Integer Programming is NP-complete). Integer Programming is NP-complete.

Proof. We will show that Integer Programming is in NP and that it is NP-hard.

1. Integer Programming is in NP:

Given a proposed solution \(x \in \mathbb{Z}^n\), we can verify in polynomial time whether it satisfies the system of inequalities \(Ax \geq b\).

  • For each inequality, we can compute the linear combination of the variables with the given coefficients in \(O(n)\) time.

  • We can compare the result with the corresponding value in \(b\) in \(O(1)\) time.

  • Since there are \(m\) inequalities, the total time is \(O(mn)\), which is polynomial in the size of the input.

The certificate is the solution \(x\), and its size is polynomial in the size of the input (each element of \(x\) can be represented with a number of bits logarithmic in the largest coefficient in \(A\) and \(b\)).
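The \(O(mn)\) verification step can be sketched in Python with plain lists (an illustrative encoding of \(A\), \(b\), and \(x\)):

```python
def satisfies(A, b, x):
    """Check Ax >= b row by row: O(m*n) arithmetic operations."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i
               for row, b_i in zip(A, b))

# System: x1 + x2 >= 1 and -x1 >= -1 (i.e. x1 <= 1)
A = [[1, 1], [-1, 0]]
b = [1, -1]
print(satisfies(A, b, [1, 0]))  # True
print(satisfies(A, b, [2, 0]))  # False: violates x1 <= 1
```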

2. Integer Programming is NP-hard:

To show that Integer Programming is NP-hard, we will reduce the 3-SAT problem to Integer Programming.

Input: A 3-SAT formula \(\phi\) with clauses \(C_1, \dots, C_M\) and variables \(X_1, \dots, X_N\)
Output: A system of linear inequalities \(Ax \geq b\)

  1. Initialize \(A\) as an empty matrix and \(b\) as an empty vector.

  2. For each clause \(C_i\), create a row of \(A\): set the entry for \(X_k\) to \(1\) for each positive literal \(X_k\) in \(C_i\), and to \(-1\) for each negative literal \(\neg X_k\). Set the corresponding entry of \(b\) to \(1 - p\), where \(p\) is the number of negative literals in \(C_i\) (each \(\neg X_k\) is rewritten as \(1 - X_k\), moving the constant to the right-hand side).

  3. For each variable \(X_k\), add the constraints \(0 \leq X_k \leq 1\) (two rows of \(A\) and two entries of \(b\)).

  4. Return \((A, b)\).
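A Python sketch of this reduction, using the common convention of encoding a literal as a signed integer (\(+k\) for \(X_k\), \(-k\) for \(\neg X_k\)); the encoding and names are our choice, not from the lecture:

```python
def sat3_to_ip(clauses, n_vars):
    """clauses: list of lists of nonzero ints; +k means X_k, -k means ¬X_k.
    ¬X_k contributes (1 - X_k), so clause C_i becomes
    (signed sum of its variables) >= 1 - (number of negated literals)."""
    A, b = [], []
    for clause in clauses:
        row = [0] * n_vars
        neg = 0
        for lit in clause:
            row[abs(lit) - 1] += 1 if lit > 0 else -1
            if lit < 0:
                neg += 1
        A.append(row)
        b.append(1 - neg)
    for k in range(n_vars):          # bounds 0 <= X_k <= 1
        lo = [0] * n_vars; lo[k] = 1     # X_k >= 0
        hi = [0] * n_vars; hi[k] = -1    # -X_k >= -1
        A += [lo, hi]
        b += [0, -1]
    return A, b

# (X1 ∨ ¬X2 ∨ X3) becomes X1 - X2 + X3 >= 0
A, b = sat3_to_ip([[1, -2, 3]], 3)
print(A[0], b[0])  # [1, -1, 1] 0
```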

Complexity Analysis of the Reduction:

  • For each clause, we create one inequality, which takes \(O(N)\) time.

  • For each variable, we create two inequalities, which takes \(O(1)\) time.

  • The total time is \(O(MN + N) = O(MN)\), which is polynomial in the size of the 3-SAT formula.

Therefore, the reduction takes polynomial time.

Correctness of the Reduction: Let \(\phi\) be a 3-SAT formula with clauses \(C_1, \dots, C_M\). For each clause \(C_i = L_{i1} \lor L_{i2} \lor L_{i3}\), we create the inequality \(L'_{i1} + L'_{i2} + L'_{i3} \geq 1\), where \(L'_{ij} = X_k\) if \(L_{ij} = X_k\) and \(L'_{ij} = 1 - X_k\) if \(L_{ij} = \neg X_k\). For each variable \(X_k\), we add the constraints \(0 \leq X_k \leq 1\).

We claim that \(\phi\) is satisfiable if and only if the corresponding system of inequalities has an integer solution.

(\(\Rightarrow\)) Suppose \(\phi\) is satisfiable. Then there exists a satisfying assignment. For each variable \(X_k\), if \(X_k\) is true, set \(X_k = 1\); if \(X_k\) is false, set \(X_k = 0\). This assignment satisfies all inequalities derived from the clauses. For example, if \(C_i = X_1 \lor \neg X_2 \lor X_3\) and \(X_1 = 1, X_2 = 0, X_3 = 1\), the corresponding inequality is \(1 + (1-0) + 1 \geq 1\), which is true.

(\(\Leftarrow\)) Suppose the system of inequalities has an integer solution. For each variable \(X_k\), if \(X_k = 1\), set \(X_k\) to true; if \(X_k = 0\), set \(X_k\) to false. Since each inequality corresponding to a clause is satisfied, and each \(X_k\) is either 0 or 1, this assignment satisfies \(\phi\). For example, if the inequality is \(X_1 + (1 - X_2) + X_3 \geq 1\) and \(X_1 = 1, X_2 = 0, X_3 = 0\), then the clause \(X_1 \lor \neg X_2 \lor X_3\) is satisfied.

Conclusion: Since we have shown a polynomial-time reduction from 3-SAT to Integer Programming, and 3-SAT is NP-complete, it follows that Integer Programming is NP-hard. Combined with the fact that Integer Programming is in NP, we conclude that Integer Programming is NP-complete. ◻

It is crucial to contrast Integer Programming with Linear Programming, where we seek a solution over the reals instead of the integers. While Integer Programming is NP-complete, Linear Programming is in P, meaning it can be solved in polynomial time. This highlights the significant jump in complexity that arises when solutions are restricted to integers. The simplex algorithm, although exponential in the worst case, is often efficient in practice for Linear Programming; provably polynomial-time algorithms, such as the ellipsoid method, also exist.

Further Remarks:

  • Diophantine Equations (Hilbert’s Tenth Problem): The problem of deciding whether a general system of polynomial equations has a solution over the integers is known as the Diophantine equation problem, or Hilbert’s tenth problem. This problem is undecidable, meaning there is no algorithm that solves it for all possible inputs. This was proven by Matiyasevich, building on work by Davis, Putnam, and Robinson.

  • Tarski’s Theorem: Tarski’s theorem provides an algorithm for solving systems of polynomial inequalities over the reals, even with quantifiers. This means that for any first-order formula involving algebraic operations over the real numbers, there exists an algorithm to decide its truth. However, this algorithm is doubly exponential in time complexity, making it impractical for large instances.

  • Implications: The NP-completeness of Integer Programming, contrasted with the decidability of its real counterpart and the undecidability of general Diophantine equations, underscores the intricate relationship between the domain of a problem (integers vs. reals) and its computational complexity. It also highlights the limitations of computation when dealing with seemingly simple mathematical problems.

Knapsack

Definition 14 (Knapsack). Given a set of \(n\) items, each with a weight \(w_i\) and a value \(v_i\), and a knapsack with maximum capacity \(W\), the Knapsack problem asks to find a subset of items whose total weight is at most \(W\) and whose total value is maximized. Formally, we want to find a subset \(S \subseteq \{1, \dots, n\}\) such that \(\sum_{i \in S} w_i \leq W\) and \(\sum_{i \in S} v_i\) is maximized. In the decision version, we are additionally given a target value \(V\) and ask whether there is a subset with total weight at most \(W\) and total value at least \(V\); it is this decision version that is NP-complete.

Theorem 15 (Knapsack is NP-complete). The Knapsack problem is NP-complete.

Proof. We only sketch the argument here; the result is classical and can be proven by reduction from the Subset Sum problem or the Partition problem. The proof involves showing that Knapsack is in NP and that it is NP-hard.

  • Knapsack is in NP: Given a subset of items, we can verify in polynomial time whether their total weight is at most \(W\) and compute their total value.

  • Knapsack is NP-hard: This can be shown by reducing a known NP-complete problem (e.g., Subset Sum) to Knapsack in polynomial time.

 ◻

Remark. Remark 16. There exists a pseudo-polynomial time algorithm for Knapsack with time complexity \(O(nW)\), where \(n\) is the number of items and \(W\) is the maximum capacity. This algorithm is pseudo-polynomial because \(W\) is encoded in binary in the input, so the input size is logarithmic in \(W\). Thus, the algorithm’s runtime is exponential in the size of the input if \(W\) is large.

The existence of a pseudo-polynomial time algorithm for Knapsack implies that the problem is not strongly NP-complete. If the weights and the capacity are bounded by a polynomial in the input size (i.e., \(W\) is polynomial in \(n\)), the algorithm runs in polynomial time. This has implications for the approximability of the problem.

Algorithm:

The pseudo-polynomial time algorithm is based on dynamic programming. We define a table \(A[i, w]\) to be the maximum value that can be obtained with a total weight of at most \(w\) using only the first \(i\) items.

Input: Items with weights \(w_1, \dots, w_n\) and values \(v_1, \dots, v_n\), and capacity \(W\)
Output: The maximum value that can be obtained

  1. Initialize \(A[0, w] = 0\) for all \(0 \leq w \leq W\).

  2. For \(i = 1, \dots, n\) and \(w = 0, \dots, W\): if \(w_i \leq w\), set \(A[i, w] = \max(A[i-1, w],\; v_i + A[i-1, w - w_i])\); otherwise, set \(A[i, w] = A[i-1, w]\).

  3. Return \(A[n, W]\).

Complexity Analysis:

  • Time Complexity: The algorithm fills a table of size \(n \times W\), and each entry takes constant time to compute. Therefore, the time complexity is \(O(nW)\).

  • Space Complexity: The algorithm uses a table of size \(O(nW)\).
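The dynamic program can be written out directly in Python, following the recurrence above (an illustrative sketch, with our own function name):

```python
def knapsack(weights, values, W):
    """Bottom-up dynamic program: A[i][w] is the best value achievable
    using the first i items with total weight at most w.
    Runs in O(n * W) time and space."""
    n = len(weights)
    A = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            A[i][w] = A[i - 1][w]                       # skip item i
            if weights[i - 1] <= w:                     # or take it
                A[i][w] = max(A[i][w],
                              values[i - 1] + A[i - 1][w - weights[i - 1]])
    return A[n][W]

# Items of the worked example: weights (2, 2, 3), values (6, 10, 12), W = 5
print(knapsack([2, 2, 3], [6, 10, 12], 5))  # 22 (items 2 and 3)
```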

Example:

Consider a knapsack with capacity \(W = 5\) and items with weights and values as follows:

| Item | Weight | Value |
|------|--------|-------|
| 1    | 2      | 6     |
| 2    | 2      | 10    |
| 3    | 3      | 12    |

The dynamic programming table would be filled as follows:

| i/w | 0 | 1 | 2  | 3  | 4  | 5  |
|-----|---|---|----|----|----|----|
| 0   | 0 | 0 | 0  | 0  | 0  | 0  |
| 1   | 0 | 0 | 6  | 6  | 6  | 6  |
| 2   | 0 | 0 | 10 | 10 | 16 | 16 |
| 3   | 0 | 0 | 10 | 12 | 16 | 22 |

The maximum value that can be obtained is \(A[3, 5] = 22\), achieved by items 2 and 3 (total weight \(2 + 3 = 5\), total value \(10 + 12 = 22\)).

Note: While the dynamic programming algorithm provides an optimal solution, it is only efficient when \(W\) is not too large. If \(W\) is exponentially large compared to \(n\), the algorithm becomes impractical. This is why Knapsack is considered a "weakly NP-complete" problem.

Strongly NP-Complete Problems

Definition 17 (Strongly NP-complete). A problem is strongly NP-complete if it remains NP-complete even when all numbers in the input are bounded by a polynomial in the input size. In other words, a problem is strongly NP-complete if it is NP-complete even when the input is represented in unary (where a number \(n\) is represented by a string of \(n\) ones).

Strong NP-completeness has significant implications for the existence of approximation algorithms. If a problem is strongly NP-complete, it cannot have a fully polynomial-time approximation scheme (FPTAS) unless P = NP. An FPTAS finds a solution within a factor \(1 + \varepsilon\) of the optimum in time polynomial in both the input size and \(1/\varepsilon\); for strongly NP-complete problems, no such scheme can exist unless P = NP.

Examples and Discussion:

  • Integer Programming: Integer Programming is strongly NP-complete. In the reduction from 3-SAT shown earlier (Algorithm [alg:3sat_to_integer_programming]), the integers used in the inequalities are only 0 and 1 (or sums of 1s). Thus, even if we restrict the input to instances where the numbers are bounded by a constant, the problem remains NP-complete.

  • Knapsack: Knapsack is not strongly NP-complete. As mentioned earlier (Remark 16), there exists a pseudo-polynomial time algorithm for Knapsack that runs in time \(O(nW)\), where \(n\) is the number of items and \(W\) is the maximum capacity. If \(W\) is bounded by a polynomial in the input size (i.e., \(W = O(n^k)\) for some constant \(k\)), then the algorithm runs in polynomial time.

  • 3-SAT: 3-SAT is strongly NP-complete as it only involves Boolean variables, which can be considered as integers 0 and 1. Therefore, restricting the integers to be bounded does not change the complexity of the problem.

  • Independent Set and Clique: These problems are also strongly NP-complete. The NP-completeness reductions involve constructing graphs where the size of the independent set or clique is related to the number of clauses in the 3-SAT formula. The numbers involved (i.e., the size of the independent set or clique) are not exponentially large compared to the input size.

  • Traveling Salesperson Problem (TSP): TSP is strongly NP-complete. Even if the distances between cities are restricted to be small integers (e.g., 1 or 2), the problem remains NP-complete.

  • Partition: The Partition problem, which asks if a set of integers can be partitioned into two subsets with equal sums, is NP-complete but not strongly NP-complete. It has a pseudo-polynomial time algorithm similar to Knapsack.

Problems like Knapsack and Partition, which are NP-complete but admit pseudo-polynomial time algorithms, are sometimes called weakly NP-complete. The existence of a pseudo-polynomial time algorithm means the problem becomes polynomial-time solvable when the numbers in the input are polynomially bounded in the input length. This is not the case for strongly NP-complete problems.
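The pseudo-polynomial Knapsack algorithm referred to above can be sketched as a standard dynamic program over capacities. This is a minimal illustration of the \(O(nW)\) idea, not the lecture's exact pseudocode:

```python
def knapsack_max_value(weights, values, W):
    """Pseudo-polynomial 0/1 knapsack: O(n * W) time, one table entry per
    capacity 0..W. The running time is polynomial in the *value* of W,
    not in its bit length -- hence 'pseudo-polynomial'."""
    best = [0] * (W + 1)  # best[c] = max value achievable with capacity c
    for w, v in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(W, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[W]

# Example: items (weight, value) = (3,4), (4,5), (2,3), capacity 6.
print(knapsack_max_value([3, 4, 2], [4, 5, 3], 6))  # -> 8 (weights 4 and 2)
```

When \(W\) is encoded in binary, the table size \(W + 1\) can be exponential in the input length, which is why this algorithm does not contradict Knapsack's NP-completeness.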

Summary:

The distinction between strongly and weakly NP-complete problems is crucial for understanding the complexity landscape and for designing algorithms, especially approximation algorithms. Strongly NP-complete problems are "hard" even when the numbers involved are small, while weakly NP-complete problems may become tractable if the numbers are bounded.

Conclusion

In this lecture, we covered a variety of NP-complete problems, including Independent Set, Clique, Subgraph Isomorphism, and Integer Programming. We demonstrated how to prove their NP-completeness through reductions, establishing a web of interconnected complexities. We also discussed Graph Isomorphism, a problem that intriguingly resides in NP but whose precise complexity status (whether in P or NP-complete) remains an open question.

Furthermore, we introduced the concept of Bisimulation, highlighting its significance in system verification. We learned that while Graph Bisimulation can be solved in polynomial time, Subgraph Bisimulation is NP-complete, illustrating how seemingly minor variations in problem definitions can lead to drastically different complexities.

We delved into the distinction between strongly and weakly NP-complete problems, using Knapsack and Integer Programming as contrasting examples. This distinction underscores the importance of understanding the numerical aspects of a problem and their impact on complexity.

**Key Takeaways:**

  • Independent Set, Clique, and Subgraph Isomorphism are all NP-complete, and their NP-completeness can be proven through reductions from 3-SAT or other known NP-complete problems.

  • Graph Isomorphism remains an open problem in terms of its complexity. Its resolution would have significant implications for graph algorithms and complexity theory.

  • Bisimulation is a crucial concept in system verification, providing a notion of equivalence based on observable behavior. Subgraph Bisimulation is NP-complete, unlike Graph Bisimulation, which is in P.

  • Integer Programming is strongly NP-complete, while Knapsack is not (it is weakly NP-complete). This difference stems from the existence of a pseudo-polynomial time algorithm for Knapsack.

  • Strongly NP-complete problems are unlikely to have a fully polynomial-time approximation scheme (FPTAS), unless P = NP. This has practical implications for designing approximation algorithms.

**Follow-up Questions:**

The lecture concludes with several thought-provoking questions that bridge the gap between theory and practice:

  1. Scale-Free Graphs: Can we find an NP-complete problem that remains NP-complete when restricted to the class of scale-free graphs? This question is motivated by practical applications in quantum architectures and network analysis, where scale-free graphs, characterized by a power-law degree distribution, are of particular interest.

  2. Approximability: What are the precise implications of a problem being strongly NP-complete versus not strongly NP-complete in terms of approximability? We touched upon this briefly, noting that strongly NP-complete problems cannot admit an FPTAS unless P = NP. A deeper understanding of this relationship, including the potential for other types of approximation schemes, is crucial for algorithm design.

  3. Bisimulation and Temporal Logic: How does the concept of bisimulation relate to the verification of systems using temporal logics? We mentioned that bisimilar systems cannot be distinguished by formulas of the modal \(\mu\)-calculus and many temporal logics derived from it, such as CTL and LTL. This connection between bisimulation and temporal logic is fundamental to formal verification. Exploring this connection further, including the van Benthem theorem, would be beneficial.

  4. Practical Strategies for NP-complete Problems: Given that many real-world problems are NP-complete, what are effective strategies for dealing with them in practice? This could involve exploring approximation algorithms, heuristics, identifying special cases that are tractable, or using techniques like parameterized complexity to gain a more nuanced understanding of the problem’s complexity.
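As one illustration of the heuristic route mentioned in the last question, here is a minimum-degree greedy heuristic for Independent Set. This specific rule is an assumption for demonstration, not something covered in the lecture: it runs in polynomial time but carries no general optimality guarantee.

```python
def greedy_independent_set(adj):
    """Minimum-degree greedy heuristic for Independent Set: repeatedly
    pick a remaining vertex of smallest degree, add it to the set, and
    delete it together with all of its neighbors.
    adj: dict mapping each vertex to the set of its neighbors."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    independent = set()
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        independent.add(v)
        removed = remaining.pop(v) | {v}
        for u in list(remaining):
            if u in removed:
                del remaining[u]
            else:
                remaining[u] -= removed
    return independent

# A 5-cycle: the heuristic returns an independent set of size 2,
# which happens to be optimal here.
cycle5 = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
s = greedy_independent_set(cycle5)
assert all(u not in cycle5[v] for u in s for v in s if u != v)
```

Heuristics like this trade the worst-case guarantees lost to NP-completeness for speed; pairing them with exact methods on tractable special cases is a common practical strategy.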

These topics will be further explored in the next lecture, where we will discuss L-complete and NL-complete problems, along with problems in the polynomial hierarchy. We will continue to investigate the fascinating landscape of computational complexity and its implications for solving real-world problems.