Complexity Classes and Relationships: Reachability Method and Savitch’s Theorem

Author

Your Name

Published

January 28, 2025

Introduction

In our previous discussion, we analyzed several relationships between complexity classes, and we successfully established the first five relationships. We now focus on proving the inclusion: \[NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).\] To achieve this, we will employ the reachability method, a technique that leverages graph representations of Turing machine computations. This method is quite general and has applications in various verification techniques, particularly in the field of dynamic verification of systems. The core idea is to represent all possible executions of a system as a finite graph and then analyze the properties of this graph. This approach is fundamental in areas such as model checking and abstract interpretation.

Reachability Problem: Determining if a path exists between two nodes in a graph.
Configuration of a Turing Machine: A snapshot of the Turing machine’s state, tape contents, and head positions at a given point in time.
Graph Representation of Computations: Representing all possible computations of a Turing machine as a graph, where nodes are configurations and edges are transitions between configurations.

We will explore how these concepts help us relate non-deterministic space complexity to deterministic time and space complexity, and how they connect to the hierarchy of complexity classes such as L, NL, P, NP, PSPACE, NPSPACE, EXP, and NEXP. We will also touch upon the state space explosion problem in model checking and the distinction between static and dynamic verification techniques. Furthermore, we will discuss Savitch’s theorem, which provides a deterministic space upper bound for the reachability problem, and its implications for the relationship between deterministic and non-deterministic space complexity. Finally, we will examine the open problem of whether L equals NL and the role of reachability in resolving this question.

Reachability Method

Our goal is to demonstrate that if a language $L$ belongs to the complexity class $NSPACE(f(n))$, where $f$ is a proper function, then $L$ also belongs to $\bigcup_{c>0} TIME(c^{f(n) + \log n})$. This implies the existence of a non-deterministic Turing machine $N$ that decides $L$ using space $f(n)$. When analyzing space complexity, we specifically consider input-output Turing machines.

An input-output Turing machine is a Turing machine with a designated read-only input tape, a designated write-only output tape, and one or more working tapes. The machine cannot modify the content of the input tape and it cannot reuse the content of the output tape.

Given an input string $X$, our objective is to determine whether $X$ is a member of the language $L$. The Turing machine $N$ begins its computation in an initial configuration determined by the input $X$. This initial configuration encompasses the contents of the input tape, the working tapes, and the output tape. Formally, the initial configuration can be represented as: \[(q_0, X\sqcup, \underbrace{\epsilon, \dots, \epsilon}_{k-2 \text{ times}}, \epsilon)\] where $q_0$ is the initial state, $X\sqcup$ is the input string followed by a blank symbol, $\epsilon$ represents an empty working tape, and the last $\epsilon$ represents the empty output tape.

Crucially, if $X$ is indeed in $L$, there exists at least one computation path of the non-deterministic machine $N$ that eventually reaches an accepting state, denoted as "yes". This accepting configuration can be represented as: \[(q_{\text{yes}}, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}, W)\] where $q_{\text{yes}}$ is the accepting state, $U_1$ and $V_1$ represent the split of the input tape content, $U_i$ and $V_i$ are the contents of the working tapes, and $W$ is the content of the output tape. Conversely, if $X$ is not in $L$, no computation path of $N$ will ever reach a configuration with the "yes" state.

To analyze this, we consider a graph $G_{N,X}$ that is specific to both the Turing machine $N$ and the input $X$. This graph, $G_{N,X}$, represents all possible configurations of $N$ when processing the input $X$, and the transitions between these configurations.

The configuration graph $G_{N,X}$ of a Turing machine $N$ on input $X$ is a directed graph where:

Nodes are the possible configurations of $N$ on input $X$.
There is a directed edge from configuration $C_1$ to $C_2$ if $N$ can transition from $C_1$ to $C_2$ in one step.

The reachability problem on this graph is to determine whether a path exists from the initial configuration to any accepting configuration.

Lemma 1. A string $X$ belongs to the language $L$ if and only if a configuration with state "yes" is reachable from the initial configuration in the graph $G_{N,X}$.

Proof. Proof. If $X \in L$, then there exists an accepting computation of $N$ on $X$. This computation corresponds to a path in $G_{N,X}$ from the initial configuration to an accepting configuration. Conversely, if an accepting configuration is reachable in $G_{N,X}$, then there exists an accepting computation of $N$ on $X$, implying $X \in L$. ◻

The graph $G_{N,X}$ is finite because the Turing machine $N$ works in space $f(n)$, and it is acyclic because our machines are always terminating. The maximum out-degree of each node is bounded by a constant that depends on the machine $N$ but not on the input $X$.

A simplified representation of the configuration graph (G_{N,X}). Nodes represent configurations, and directed edges represent transitions between configurations. Solid arrows represent transitions in one computation path, and dashed arrows represent transitions in another possible computation path. The double circle represents an accepting configuration.

Figure 1 shows a simplified representation of the configuration graph $G_{N,X}$.

Simplification of Configurations

To simplify the analysis, we can reduce the complexity of the configurations:

Ignoring the Output Tape: The content of the output tape is irrelevant for deciding whether $X \in L$, so we can ignore it.
Input Tape Index: Since the input tape content is fixed as $X\sqcup$, we can replace the content of the input tape ($U_1$ and $V_1$) with a single index $J$ that represents the split of $X\sqcup$ into $U_1$ and $V_1$, where $U_1V_1 = X\sqcup$. The index $J$ ranges from $0$ to $|X| + 1$.

Thus, a simplified configuration can be represented as $(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1})$.

Number of Possible Configurations

To determine the size of the graph $G_{N,X}$, we need to estimate the number of possible configurations.

The state $q$ can take $|Q| + 2$ values, where $|Q|$ is the number of states of the machine $N$, plus two additional states "yes" and "no".
The index $J$ can take $|X| + 2$ values.
Each working tape string $U_i$ or $V_i$ has a length of at most $f(|X|)$, since the machine $N$ works in space $f(n)$. The number of possible strings of length at most $f(|X|)$ is $|\Sigma|^{f(|X|)}$, where $\Sigma$ is the tape alphabet.
There are $k-2$ working tapes.

Therefore, the total number of possible configurations is bounded by: \[(|Q| + 2) \cdot (|X| + 2) \cdot (|\Sigma|^{f(|X|)})^{2(k-2)}\] where $\Sigma$ is the tape alphabet. This can be simplified to: \[(|Q| + 2) \cdot (|X| + 2) \cdot |\Sigma|^{2(k-2)f(|X|)} \leq \alpha^{f(|X|) + \log |X|}\] for some constant $\alpha$ that depends on the machine $N$. The constant $\alpha$ incorporates the constants $|Q|+2$, $2(k-2)$, and $|\Sigma|$.

The number of nodes in the configuration graph $G_{N,X}$ is bounded by $\alpha^{f(|X|) + \log |X|}$, where $\alpha$ is a constant depending on the machine $N$.

This bound on the number of configurations is crucial for establishing the time complexity of the deterministic machine that will decide the language $L$.

Simplifying the Graph

To facilitate our analysis, we aim to simplify the structure of the configuration graph $G_{N,X}$. Initially, the nodes of this graph represent configurations of the form: \[(q, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}, W),\] where:

$q$ denotes the current state of the Turing machine $N$.
$U_1$ and $V_1$ represent the content of the input tape, with $U_1$ being the portion of the tape to the left of the head and $V_1$ being the portion to the right, including the symbol under the head.
$U_i$ and $V_i$ represent the contents of the $i$-th working tape, with $U_i$ being the portion of the tape to the left of the head and $V_i$ being the portion to the right, including the symbol under the head.
$W$ represents the content of the output tape.

We can simplify these configurations based on the following observations:

Output Tape Irrelevance: The content of the output tape is not relevant for our decision procedure. Since we are deciding a language, the output tape is only used to write the final result, and its content during the computation does not affect the decision. Therefore, we can disregard the output tape’s content. This simplifies the configuration to: \[(q, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}).\]
Input Tape Immutability: In an input-output Turing machine, the input tape is read-only and cannot be modified. The content of the input tape remains constant throughout the computation. The input $X$ is split into $U_1$ and $V_1$, but their concatenation always forms the original input $X$ (with a possible blank at the end), i.e., $U_1V_1 = X\sqcup$. Instead of storing $U_1$ and $V_1$ explicitly, we can use an index $J$ to represent the position of the read head on the input tape. The index $J$ indicates how the input $X\sqcup$ is split into $U_1$ and $V_1$. The possible values for $J$ range from $0$ (when the head is at the beginning of $X$) to $|X|+1$ (when the head is at the blank symbol after $X$).

By applying these simplifications, we can represent a configuration in a more compact form: \[(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1}).\] This simplified representation retains all the necessary information to describe the state of the Turing machine during its computation, while reducing the complexity of the configuration representation. This simplification is crucial for analyzing the size of the configuration graph and, consequently, the time complexity of the deterministic machine that will decide the language $L$.

Simplification of a Turing Machine Configuration. The output tape is ignored, and the input tape content and head position are replaced by an index (J).

Figure 2 illustrates the simplification process of a Turing machine configuration.

Number of Possible Configurations

Now, we aim to determine the number of possible configurations, which corresponds to the number of nodes in the simplified graph $G_{N,X}$. This analysis is crucial for understanding the size of the graph and, consequently, the complexity of the reachability problem.

Let’s analyze the number of possible values for each component of a simplified configuration $(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1})$:

State $q$: The state $q$ can take on any of the states of the Turing machine $N$, plus two additional states: "yes" (accepting) and "no" (rejecting). Therefore, there are $|Q| + 2$ possible values for $q$, where $|Q|$ is the number of states in the machine $N$.
Index $J$: The index $J$ represents the position of the read head on the input tape. As discussed earlier, $J$ can range from $0$ to $|X| + 1$, where $|X|$ is the length of the input string $X$. Thus, there are $|X| + 2$ possible values for $J$.
Tape Contents $U_i$ and $V_i$: Each working tape has two parts, $U_i$ and $V_i$. The machine $N$ operates in space $f(n)$, where $n$ is the length of the input $X$. This means that the length of each $U_i$ and $V_i$ is at most $f(|X|)$. Since each tape symbol is drawn from the tape alphabet $\Sigma$, there are at most $|\Sigma|^{f(|X|)}$ possible choices for each $U_i$ and each $V_i$.
Number of Working Tapes: We have $k-2$ working tapes, each with two parts $U_i$ and $V_i$. Therefore, we have a total of $2(k-2)$ strings $U_i$ and $V_i$.

Combining these factors, the total number of possible configurations is at most: \[(|Q| + 2) \cdot (|X| + 2) \cdot (|\Sigma|^{f(|X|)}) ^{2(k-2)} = (|Q| + 2) \cdot (|X| + 2) \cdot |\Sigma|^{2(k-2)f(|X|)}.\] This expression can be simplified by noting that $|Q|$, $k$, and $|\Sigma|$ are constants that depend on the machine $N$ but not on the input $X$. We can rewrite the expression as: \[\alpha \cdot (|X| + 2) \cdot \beta^{f(|X|)},\] where $\alpha = |Q| + 2$ and $\beta = |\Sigma|^{2(k-2)}$ are constants.

Furthermore, we can express $|X|+2$ as $2^{\log(|X|+2)}$, and since $\log(|X|+2) \leq \log(2|X|) = \log 2 + \log |X| \leq 1 + \log |X|$, we can say that $|X|+2$ is less than or equal to $c \cdot 2^{\log |X|}$ for some constant $c$.

Thus, the total number of configurations can be bounded by: \[\alpha \cdot c \cdot 2^{\log |X|} \cdot \beta^{f(|X|)} \leq \gamma \cdot 2^{\log |X| + f(|X|)},\] for some constant $\gamma$.

Finally, we can rewrite this as: \[\gamma \cdot 2^{f(|X|) + \log |X|}\] for some constant $\gamma$. We can absorb the constant $\gamma$ into the base of the exponent, resulting in a bound of the form: \[\alpha^{f(|X|) + \log |X|},\] for some constant $\alpha$.

The number of possible configurations for a Turing machine $N$ on input $X$ working in space $f(|X|)$ is bounded by $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$ that depends on the machine $N$.

This result is crucial because it provides an upper bound on the size of the configuration graph $G_{N,X}$, which is essential for analyzing the time complexity of solving the reachability problem on this graph.

Deterministic Machine for Deciding L

We now construct a deterministic Turing machine $M$ that decides the language $L$. This machine will leverage the graph $G_{N,X}$ and the reachability method. The machine $M$ operates as follows:

Input: The machine $M$ receives the input string $X$.
Graph Construction: $M$ constructs the configuration graph $G_{N,X}$ on one of its working tapes. This graph represents all possible configurations of the non-deterministic machine $N$ on input $X$, along with the transitions between them. The construction of this graph is deterministic and can be done by simulating the transitions of $N$. The transition function of $N$ is encoded in the transition function of $M$.
Reachability Check: $M$ then solves the reachability problem on the graph $G_{N,X}$. Specifically, it checks whether there exists a path from the initial configuration of $N$ on input $X$ to any configuration where the state is "yes" (an accepting configuration). This can be done using a standard graph traversal algorithm, such as Breadth-First Search (BFS) or Depth-First Search (DFS).

Input string $X$ Construct the initial configuration $C_0$ of $N$ on input $X$ Initialize the graph $G_{N,X}$ with node $C_0$ and set of edges $E$ to $\emptyset$ $Q \gets \{C_0\}$ $C \gets$ Dequeue($Q$) Generate all possible next configurations $C'$ from $C$ using the transition function of $N$ Add $C'$ to $G_{N,X}$ Enqueue($Q$, $C'$) Add edge $(C, C')$ to $E$ Accept $X$ Reject $X$

Algorithm [alg:deterministic_machine] provides a pseudocode description of the deterministic Turing machine $M$.

The reachability problem on a graph can be solved in polynomial time with respect to the size of the graph. For instance, BFS has a time complexity that is linear in the number of nodes and edges of the graph in a pseudocode implementation. However, on a Turing machine, the time complexity might be higher, but still polynomial in the size of the graph. Let’s assume the time complexity of solving reachability on a Turing machine is $O(n^\beta)$ where $n$ is the size of the graph and $\beta$ is some constant.

The overall time complexity of the deterministic machine $M$ is determined by two main factors:

The time required to construct the graph $G_{N,X}$. This is linear in the size of the graph, which is $O(\alpha^{f(|X|) + \log |X|})$.
The time required to solve the reachability problem on $G_{N,X}$. This is polynomial in the size of the graph, i.e., $O((\alpha^{f(|X|) + \log |X|})^\beta)$.

Therefore, the total time complexity of $M$ is: \[O(\alpha^{f(|X|) + \log |X|}) + O((\alpha^{f(|X|) + \log |X|})^\beta)\] Since the second term dominates the first term, the overall time complexity is: \[O((\alpha^{f(|X|) + \log |X|})^\beta) = O(\alpha^{\beta(f(|X|) + \log |X|)})\] We can rewrite this as: \[TIME(c^{f(n) + \log n})\] for some constant $c = \alpha^\beta$.

This demonstrates that the language $L$, which is decided by the non-deterministic machine $N$ in space $f(n)$, can also be decided by a deterministic machine $M$ in time $c^{f(n) + \log n}$ for some constant $c$. This proves the inclusion: \[NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).\]

Note that the constant $c$ depends on the machine $N$ (through $\alpha$ and $\beta$). Therefore, we need to take the union over all possible constants to account for different machines.

Discussion on the Statement

The statement presented in the book, $NSPACE(f(n)) \subseteq TIME(c^{f(n) + \log n})$, while capturing the essence of the result, has a subtle imprecision. The constant $c$ in the time complexity bound is not universal; it depends on the specific non-deterministic Turing machine $N$ that decides the language $L$ in space $f(n)$.

To elaborate, when we analyze the size of the configuration graph $G_{N,X}$, we encounter constants that are specific to the machine $N$, such as the number of states $|Q|$, the size of the tape alphabet $|\Sigma|$, and the number of working tapes $k$. These constants influence the value of $c$ in the time complexity bound. Specifically, the constant $\alpha$ in the bound $\alpha^{f(|X|) + \log |X|}$ on the number of configurations depends on these machine-specific constants.

Therefore, when we consider different languages within the same space complexity class $NSPACE(f(n))$, each language might have a different associated constant $c$. This means that the inclusion is not for a fixed $c$, but rather for a $c$ that depends on the specific machine.

The constant $c$ in the time complexity $TIME(c^{f(n) + \log n})$ depends on the specific non-deterministic Turing machine $N$ used to decide the language in space $f(n)$.

A more accurate and precise statement of the inclusion would be: \[NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).\] This statement indicates that for any language in $NSPACE(f(n))$, there exists a constant $c$ (dependent on the machine) such that the language is in $TIME(c^{f(n) + \log n})$. The union over all possible constants $c$ reflects the fact that the constant is not fixed across all languages in $NSPACE(f(n))$.

This distinction is important because changing the base of the exponential function can significantly alter the asymptotic behavior of the time complexity. For example, $3^{f(n)}$ and $4^{f(n)}$ are not asymptotically equivalent. Therefore, the union over all possible constants $c$ is necessary to accurately represent the relationship between the complexity classes.

The correct inclusion is: \[NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).\] This reflects the fact that the constant $c$ depends on the specific Turing machine used.

This corrected statement provides a more accurate representation of the relationship between non-deterministic space complexity and deterministic time complexity.

Model Checking and Verification Techniques

The reachability method we have just discussed is closely related to techniques used in model checking and dynamic verification of systems. These techniques share a common underlying principle: they transform the execution of a system into a graph representation and then analyze the properties of this graph. This approach is fundamental in the field of formal verification, where the goal is to ensure that systems behave correctly according to their specifications.

In model checking and dynamic verification, the graph represents all possible executions or behaviors of a system. Each node in the graph corresponds to a state of the system, and the edges represent transitions between states. This graph is often referred to as the state space of the system. The system can be a hardware circuit, a software program, or any other kind of computational system.

Dynamic verification techniques analyze a system by constructing a graph representing all possible executions and then verifying properties of this graph.

The analysis of this graph can involve various types of properties, such as:

Reachability: Determining whether a specific state or configuration is reachable from the initial state. This is the same problem we addressed in the proof of the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$.
Safety Properties: Verifying that the system never enters an undesirable or unsafe state. For example, ensuring that a variable never takes a negative value or that a critical section is never accessed by two processes simultaneously.
Liveness Properties: Ensuring that the system eventually reaches a desired state or performs a specific action. For example, ensuring that a request is eventually granted or that a process eventually terminates.
Temporal Properties: Analyzing the behavior of the system over time, often using temporal logics. These logics allow us to express properties that involve sequences of states and transitions, such as "always," "eventually," or "until."

The key idea is to represent the semantics of the system as a graph and then use graph algorithms and analysis techniques to verify properties of the system. This approach is particularly useful for analyzing complex systems where it is difficult to reason about their behavior directly.

The reachability method we used to prove the complexity class inclusion is a specific instance of this general approach. We constructed the graph of all possible configurations of a Turing machine and then used reachability to determine whether the input was accepted by the machine. This demonstrates the broad applicability of the graph-based approach in computer science.

State Space Explosion Problem

A major challenge in model checking and dynamic verification is the state space explosion problem. As the complexity of the system increases, the size of the state space graph can grow exponentially. This can make it computationally infeasible to store and analyze the entire graph. Techniques such as symbolic model checking, abstraction, and on-the-fly verification are used to mitigate this problem. On-the-fly model checking is analogous to the way we handled the graph $G_{N,X}$ in the proof of Savitch’s theorem: we did not generate the whole graph, but only the parts that were needed during the computation.

Static vs. Dynamic Verification

It is important to distinguish between dynamic and static verification techniques:

Dynamic Verification: Constructs the graph of all possible executions (semantics) and analyzes its properties. Model checking is a typical example of dynamic verification.
Static Verification: Analyzes the system based on its syntax without constructing the execution graph. Type checking is a typical example of static verification.

Static verification techniques are often more efficient but may not be able to detect all types of errors, while dynamic verification techniques are more comprehensive but can be computationally expensive.

Comparison of Dynamic and Static Verification Techniques.

Figure 3 illustrates the difference between dynamic and static verificationverification techniques.

Abstract Interpretation

Abstract interpretation is a technique that combinesaspects of both static and dynamic verification. It involves creating an abstract representation of the system’s state space and analyzing this abstraction to verify properties of the system. This can help to reduce the complexity of the analysis while still providing useful information about the system’s behavior.

State Space Explosion Problem

A significant challenge in model checking and dynamic verification techniques is the state space explosion problem. This problem arises because the graph representing all possible executions of a system can grow exponentially with the size or complexity of the system. The number of states in the graph can become so large that it becomes impossible to store and analyze the entire graph, even with modern computing resources.

As we have seen, the number of configurations of a Turing machine can be very large, even for relatively simple machines and inputs. In the context of model checking, the state space of a system can become enormous, making it impossible to store the entire graph in memory. This is particularly true for systems with many concurrent processes or complex interactions, where the number of possible interleavings of actions can lead to an exponential increase in the number of states.

The state space explosion problem limits the applicability of these techniques to relatively small or simplified systems. When the graph becomes too large, it becomes computationally infeasible to analyze it using standard graph algorithms. This is because the memory requirements and the time complexity of the algorithms become prohibitive. For example, a simple system with a few variables, each with a few possible values, can easily have millions or billions of possible states.

To address the state space explosion problem, researchers have developed various techniques, including:

Symbolic Model Checking: Using symbolic representations of the state space, such as Binary Decision Diagrams (BDDs), to represent and manipulate large sets of states efficiently. Instead of explicitly enumerating each state, symbolic model checking represents sets of states using logical formulas, allowing for more compact representations.
Abstraction Techniques: Simplifying the system by abstracting away irrelevant details, reducing the size of the state space. Abstraction involves creating a simplified model of the system that preserves the properties of interest while reducing the number of states.
Partial Order Reduction: Exploiting the independence of concurrent actions to reduce the number of states that need to be explored. If two actions do not interfere with each other, their order of execution does not matter, and only one interleaving needs to be considered.
On-the-Fly Model Checking: Generating only the necessary parts of the state space during the verification process, rather than constructing the entire graph upfront. This approach avoids generating states that are not relevant to the property being verified. This is similar to how we handled the graph $G_{N,X}$ when proving Savitch’s theorem. We did not generate the entire graph but only the parts that were needed on demand.

Techniques like symbolic model checking, abstraction, partial order reduction, and on-the-fly model checking are used to mitigate the state space explosion problem.

These techniques aim to mitigate the impact of the state space explosion problem and make model checking and dynamic verification applicable to larger and more complex systems. However, the state space explosion problem remains a fundamental challenge in the field, and researchers continue to develop new techniques to address it.

The State Space Explosion Problem and Mitigation Techniques.

Figure 4 illustrates the state space explosion problem and the use of mitigation techniques.

Static Verification Techniques

In contrast to model checking and dynamic verification, which analyze the semantics of a system by constructing and exploring its state space graph, **static verification techniques** analyze the syntax of a program without executing it or building its full state space. These techniques operate directly on the source code or an intermediate representation of the program, such as an Abstract Syntax Tree (AST). Static verification aims to identify potential errors and verify properties of the program by examining its structure and code, rather than its runtime behavior.

Static verification techniques analyze the syntax of a program without execution, aiming to identify potential errors and verify program properties.

A prominent example of a static verification technique is type checking. Type checking involves verifying that the types of variables and expressions in a program are consistent with the rules of the programming language. For instance, a type checker would detect an error if an integer value were assigned to a boolean variable without an explicit type cast. Type checking ensures that operations are performed on compatible data types, preventing many common programming errors.

Other examples of static verification techniques include:

Dataflow Analysis: Analyzing how data flows through a program to detect potential errors, such as uninitialized variables or unused values. Dataflow analysis tracks the flow of data through the program’s control flow graph, identifying potential issues related to variable usage.
Abstract Interpretation: Approximating the behavior of a program to verify properties, such as the range of values that a variable can take. Abstract interpretation uses abstract domains to represent sets of concrete values, allowing for the analysis of program properties without explicitly executing the program.
Formal Methods: Using mathematical techniques to specify and verify program properties. This can involve creating formal specifications of program behavior and using theorem provers or other formal methods tools to verify that the program meets its specification.
Static Analysis Tools: Using automated tools to analyze the source code for potential bugs, security vulnerabilities, or coding style violations. These tools can identify issues such as buffer overflows, SQL injection vulnerabilities, and adherence to coding standards. Examples include tools like FindBugs, PMD, and Coverity.

Static verification techniques offer several advantages over dynamic verification:

Efficiency: Static analysis is generally more efficient than dynamic analysis because it does not require executing the program. This allows for faster analysis and feedback during the development process.
Early Bug Detection: Static analysis can detect errors early in the development process, before the program is executed. This can save time and resources by identifying issues before they become more difficult to fix.
Coverage: Static analysis can analyze all possible execution paths of a program, whereas dynamic analysis is limited to the specific execution paths that are tested. This allows for a more comprehensive analysis of the program’s behavior.

However, static verification techniques also have limitations:

Approximation: Static analysis often relies on approximations of the program’s behavior, which can lead to false positives (reporting errors that do not exist) or false negatives (missing errors that do exist). The approximations are necessary to make the analysis tractable but can result in inaccuracies.
Limited Scope: Some types of errors, such as division by zero or out-of-bounds array access, may not be detectable by static analysis alone. These errors often depend on runtime values and are difficult to predict without executing the program. However, combining static analysis with other techniques like symbolic execution can help detect such errors.
Complexity: Some static analysis techniques can be complex and difficult to implement. Developing accurate and efficient static analysis tools can be challenging.

Static and dynamic verification techniques are complementary approaches to ensuring the correctness and reliability of software systems. Static analysis is often used to detect common errors early in the development process, while dynamic analysis is used to verify more complex properties and to test the system under realistic conditions. Combining both approaches can lead to more robust and reliable software.

Static Verification Techniques and their application to source code.

Figure 5 illustrates the application of static verification techniques to source code.

Corollary and Hierarchy Theorems

Based on the relationships between complexity classes that we have established, we can construct a hierarchy of complexity classes, ordered by inclusion: \[L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP \subseteq \dots\] Here:

$L$ represents deterministic logarithmic space.
$NL$ represents non-deterministic logarithmic space.
$P$ represents deterministic polynomial time.
$NP$ represents non-deterministic polynomial time.
$PSPACE$ represents deterministic polynomial space.
$NPSPACE$ represents non-deterministic polynomial space.
$EXP$ represents deterministic exponential time, i.e., $\bigcup_{k \in \mathbb{N}} TIME(2^{n^k})$.
$NEXP$ represents non-deterministic exponential time.

This chain of inclusions shows how different complexity classes relate to each other in terms of their computational power. The inclusions are derived from the relationships we have proven so far, such as the inclusion of $NL$ in $P$ using the reachability method and the inclusion of $NSPACE(f(n))$ in $\bigcup_{c>0}TIME(c^{f(n) + \log n})$. Savitch’s theorem will further demonstrate that $NPSPACE = PSPACE$.

The hierarchy of complexity classes is: \[L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP \subseteq \dots\]

Furthermore, we can invoke the hierarchy theorems to establish strict separations between some of these classes:

Time Hierarchy Theorem: The time hierarchy theorem states that for any time-constructible function $t(n)$, there exists a language that can be decided in time $t(n)$ but not in time $o(\frac{t(n)}{\log t(n)})$. As a consequence, we know that $P \subsetneq EXP$. This means that there are problems that can be solved in exponential time but cannot be solved in polynomial time. The strict inclusion $P \subsetneq EXP$ implies that at least one of the inclusions in the hierarchy is strict.
Space Hierarchy Theorem: Similarly, the space hierarchy theorem states that for any space-constructible function $s(n)$, there exists a language that can be decided in space $s(n)$ but not in space $o(s(n))$. As a consequence, we know that $L \subsetneq PSPACE$. This means that there are problems that can be solved in polynomial space but cannot be solved in logarithmic space. The strict inclusion $L \subsetneq PSPACE$ implies that at least one of the inclusions in the hierarchy is strict.

These hierarchy theorems demonstrate that the inclusions in the hierarchy of complexity classes are not all equalities. At least some of the inclusions are strict, meaning that there are languages in the larger class that are not in the smaller class. For example, we know that $P$ is strictly contained in $EXP$, and $L$ is strictly contained in $PSPACE$.

Time Hierarchy Theorem: $P \subsetneq EXP$
Space Hierarchy Theorem: $L \subsetneq PSPACE$

However, many other questions about the relationships between these classes remain open. For example, it is not known whether $P = NP$ or whether $L = NL$. These are some of the most important open problems in theoretical computer science. The question of whether $P=NP$ is particularly famous, and its resolution would have profound implications for many areas of computer science and mathematics. Similarly, the question of whether $L=NL$ is a fundamental open problem in complexity theory. If $P$ were equal to $NP$, then at least one of the inclusions between $P$ and $EXP$ would have to be strict, and at least one of the inclusions between $L$ and $PSPACE$ would have to be strict.

Hierarchy of Complexity Classes. An arrow from class A to class B indicates that A is a subset of B. A double line indicates that the two classes are equal.

Figure 6 illustrates the hierarchy of complexity classes.

Savitch’s Theorem

Savitch’s theorem is a fundamental result in complexity theory that provides an upper bound on the space complexity of solving the reachability problem in directed graphs. Specifically, Savitch’s theorem states that reachability can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph.

To understand Savitch’s theorem, we first define a predicate that will be used in the proof:

Let $G = (V, E)$ be a directed graph. We define the predicate $Path(X, Y, I)$ to be true if and only if there exists a path from node $X$ to node $Y$ in $G$ with a length of at most $2^I$.

Using this predicate, we can express the reachability problem as follows:

Given two nodes $U$ and $V$ in the graph $G$, $U$ reaches $V$ if and only if $Path(U, V, \log |V|)$ is true. This is because if there is a path from $U$ to $V$, then there is a path of length at most $|V|$ (since a simple path cannot visit any node more than once), and since $2^{\log |V|} = |V|$, the predicate $Path(U, V, \log |V|)$ will be true.

Lemma 2. $U$ reaches $V$ in $G$ if and only if $Path(U, V, \log |V|)$ is true.

Proof. Proof. If there is a path from $U$ to $V$ in $G$, then there is a path of length at most $|V|$. Since $2^{\log |V|} = |V|$, the predicate $Path(U, V, \log |V|)$ is true. Conversely, if $Path(U, V, \log |V|)$ is true, then there is a path from $U$ to $V$ of length at most $2^{\log |V|} = |V|$, implying that $U$ reaches $V$. ◻

The key idea behind Savitch’s theorem is to define a recursive relationship for the $Path$ predicate:

Lemma 3. $Path(X, Y, I)$ is true if and only if there exists a node $Z$ such that $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true.

Proof. Proof. If $Path(X, Y, I)$ is true, then there is a path from $X$ to $Y$ of length at most $2^I$. We can split this path into two subpaths of length at most $2^{I-1}$. Let $Z$ be a node on the path such that the path from $X$ to $Z$ and the path from $Z$ to $Y$ have length at most $2^{I-1}$. Then $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true. Conversely, if there exists a node $Z$ such that $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true, then there is a path from $X$ to $Z$ of length at most $2^{I-1}$ and a path from $Z$ to $Y$ of length at most $2^{I-1}$. Concatenating these paths gives a path from $X$ to $Y$ of length at most $2^{I-1} + 2^{I-1} = 2^I$, so $Path(X, Y, I)$ is true. ◻

This recursive definition allows us to break down the problem of finding a path of length at most $2^I$ into two subproblems of finding paths of length at most $2^{I-1}$.

A deterministic Turing machine can decide the $Path(X, Y, I)$ predicate using a recursive algorithm that implements this definition. The space used by the Turing machine is determined by the depth of the recursion and the amount of information that needs to be stored at each level of the recursion. The depth of the recursion is at most $\log |V|$, since $I$ starts at $\log |V|$ and decreases by 1 at each recursive step until it reaches 0. At each level, we need to store the nodes $X$, $Y$, and $Z$, as well as the index $I$. The space required to store each of these is $O(\log |V|)$ (since we need to store node indices and the index $I$, which are all bounded by $|V|$). Therefore, the total space used by the Turing machine is $O(\log |V| \cdot \log |V|) = O(\log^2 |V|)$.

true false true false

Algorithm [alg:path_recursive] provides a pseudocode description of the recursive algorithm for the $Path(X, Y, I)$ predicate.

Theorem 4 (Savitch’s Theorem). The reachability problem can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph.

Proof. Proof. The algorithm for deciding reachability is to compute $Path(U, V, \log |V|)$ using the recursive algorithm described above (Algorithm [alg:path_recursive]). The space complexity of this algorithm is $O(\log^2 |V|)$, as explained in the text. The algorithm needs to store the current recursion depth, which is at most $\log |V|$, and for each level of recursion, it needs to store the indices of at most three nodes, $X$, $Y$, and $Z$, each requiring $O(\log |V|)$ space. Thus, the total space used is $O(\log |V| \cdot \log |V|) = O(\log^2 |V|)$. ◻

This demonstrates that the reachability problem can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph.

The reachability problem can be solved in deterministic space $O(\log^2 n)$ according to Savitch’s Theorem.

It is important to note that while this algorithm is space-efficient, it is not time-efficient. The recursive calls can lead to redundant computations, making the algorithm inefficient in terms of time complexity. The time complexity of Savitch’s algorithm is $n^{O(\log n)}$, which is quasi-polynomial, not polynomial.

Corollary of Savitch’s Theorem

A significant consequence of Savitch’s theorem is the following inclusion: \[NSPACE(f(n)) \subseteq SPACE(f(n)^2)\] This inclusion holds for any proper function $f(n)$ such that $f(n) \geq \log n$. This result demonstrates that non-deterministic space complexity is not significantly more powerful than deterministic space complexity, as the non-deterministic space can be simulated by a deterministic machine with only a quadratic increase in space usage.

To prove this corollary, we consider a language $L$ that is in $NSPACE(f(n))$. This means that there exists a non-deterministic Turing machine $N$ that decides $L$ using space $f(n)$. We can construct the configuration graph $G_{N,X}$ for $N$ on input $X$, as we did in the proof of the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$.

The size of this graph, as we have shown, is bounded by $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$. To decide whether $X$ is in $L$, we need to solve the reachability problem on this graph. According to Savitch’s theorem, reachability can be solved in deterministic space that is the square of the logarithm of the number of nodes in the graph.

The logarithm of the size of the graph is: \[\log(\alpha^{f(|X|) + \log |X|}) = (f(|X|) + \log |X|) \log \alpha = O(f(|X|) + \log |X|).\] Since $f(n) \geq \log n$, we have $f(|X|) + \log |X| = O(f(|X|))$. Therefore, the space required to solve reachability on this graph is: \[O((f(|X|) + \log |X|)^2) = O(f(|X|)^2).\] This means that we can decide the language $L$ using a deterministic Turing machine that uses space $O(f(n)^2)$.

However, there is a crucial detail to consider. In our previous discussion, we mentioned that the graph $G_{N,X}$ is not explicitly constructed and stored on a tape. Instead, the graph is generated on-the-fly, meaning that only the necessary parts of the graph are generated when needed during the reachability computation. This is important because if we were to store the entire graph, the space complexity would be much higher. The space used to store the graph could be exponential in $f(n)$, but by generating it on-the-fly, we avoid this issue.

The configuration graph $G_{N,X}$ is not explicitly constructed and stored on a tape. Instead, it is generated on-the-fly during the reachability computation. This means that only the parts of the graph that are needed for the computation are generated and stored at any given time.

By generating the graph on-the-fly, we can ensure that the space used by the deterministic machine is indeed $O(f(n)^2)$. This proves the inclusion: \[NSPACE(f(n)) \subseteq SPACE(f(n)^2).\] This result highlights that non-determinism in space complexity does not provide a significant advantage over determinism, as the non-deterministic computation can be simulated deterministically with only a quadratic increase in space.

Theorem 5. $NSPACE(f(n)) \subseteq SPACE(f(n)^2)$ for any proper function $f(n) \geq \log n$.

Proof. Proof. Let $L \in NSPACE(f(n))$. Then there is a non-deterministic Turing machine $N$ that decides $L$ in space $f(n)$. We can construct the configuration graph $G_{N,X}$ of $N$ on input $X$. The size of $G_{N,X}$ is at most $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$. Using Savitch’s theorem, we can solve the reachability problem on $G_{N,X}$ in space $O(\log^2 |G_{N,X}|) = O((f(|X|) + \log |X|)^2) = O(f^2(|X|))$ since $f(n) \geq \log n$. The graph is generated on-the-fly, so the space used is dominated by the space needed for the reachability algorithm, which is $O(\log^2 |G_{N,X}|)$. ◻

This result is a direct consequence of Savitch’s theorem and the on-the-fly generation of the configuration graph.

Corollary 6. $PSPACE = NPSPACE$

Proof. Proof. We know that $PSPACE \subseteq NPSPACE$ by definition. To show that $NPSPACE \subseteq PSPACE$, we use the fact that $NPSPACE = \bigcup_{k} NSPACE(n^k)$. By Theorem 5, $NSPACE(n^k) \subseteq SPACE(n^{2k})$. Since $n^{2k}$ is also a polynomial, we have $\bigcup_{k} NSPACE(n^k) \subseteq \bigcup_{k} SPACE(n^{2k}) \subseteq PSPACE$. Therefore, $NPSPACE \subseteq PSPACE$, and thus $PSPACE = NPSPACE$. ◻

Open Problem: L vs NL

We have established that the complexity classes $L$ (deterministic logarithmic space) and $NL$ (non-deterministic logarithmic space) are related by the following inclusions: \[L \subseteq NL \subseteq \text{SPACE}(\log^2 n)\] The inclusion $L \subseteq NL$ is straightforward, as any deterministic computation can be viewed as a special case of a non-deterministic computation. The inclusion $NL \subseteq \text{SPACE}(\log^2 n)$ follows directly from Savitch’s theorem, which states that reachability can be solved in deterministic space $O(\log^2 n)$.

We have: \[L \subseteq NL \subseteq \text{SPACE}(\log^2 n)\]

However, the question of whether $L$ is equal to $NL$ remains one of the most important open problems in complexity theory. Formally, this is expressed as: \[L \stackrel{?}{=} NL\] This question asks whether the power of non-determinism provides a significant advantage in the context of logarithmic space complexity. In other words, can every problem that can be solved by a non-deterministic Turing machine in logarithmic space also be solved by a deterministic Turing machine in logarithmic space?

A crucial connection between the $L$ vs $NL$ problem and the reachability problem is the equivalence: \[L = NL \iff \text{there exists a deterministic log-space algorithm for reachability.}\] This equivalence implies that finding such an algorithm would prove $L = NL$. Conversely, proving that no deterministic log-space algorithm exists would show $L \neq NL$.

The question of whether $L=NL$ is equivalent to the existence of a deterministic log-space algorithm for reachability.

The reachability problem is, therefore, a central problem for understanding the relationship between deterministic and non-deterministic logarithmic space. The fact that we do not yet know whether $L = NL$ highlights the limitations of our current understanding of the power of non-determinism in space complexity.

The $L$ vs $NL$ problem is a fundamental open question in complexity theory, and its resolution would have significant implications for our understanding of the limits of computation.

This open problem, though less famous than the $P$ vs $NP$ problem, is equally significant in the field of complexity theory. Resolving it would profoundly impact our understanding of the fundamental limits of computation.

Reachability in Undirected Graphs

An interesting fact related to this open problem is that reachability in undirected graphs can be solved in deterministic logarithmic space. However, the same is not known for directed graphs. This difference between undirected and directed graphs is a unique case in complexity theory, where the same problem has different known complexities depending on whether the graph is directed or undirected. This fact further emphasizes the complexity of the $L$ vs $NL$ problem.

Reachability in undirected graphs can be solved in deterministic logarithmic space, while the same is not known for directed graphs.

This difference highlights the subtle complexities involved in understanding the relationship between deterministic and non-deterministic logarithmic space. It is known that if one could provide an algorithm to solve directed graph reachability in deterministic logarithmic space, then $L=NL$. On the other hand, if one could prove that no such algorithm exists, then $L \neq NL$.

Reachability on Undirected Graphs

An interesting fact related to the $L$ vs $NL$ problem is that the reachability problem on undirected graphs can be solved in deterministic logarithmic space. This is in contrast to the general reachability problem on directed graphs, for which the best known deterministic algorithm requires $O(\log^2 n)$ space (Savitch’s theorem).

Reachability in undirected graphs is in $L$, while the best known deterministic algorithm for directed graphs requires $O(\log^2 n)$ space.

The fact that reachability on undirected graphs is in $L$ is a non-trivial result and highlights a difference in the complexity of reachability between directed and undirected graphs. This result is somewhat surprising, as it might seem that the direction of edges should not make a significant difference in the complexity of the problem. However, the absence of direction in undirected graphs allows for more efficient algorithms to be used.

The algorithm for solving reachability on undirected graphs in deterministic log-space is not straightforward and involves a clever use of graph traversal techniques. The details of this algorithm, known as Reingold’s algorithm, are beyond the scope of this discussion, but it is important to note that such an algorithm exists. The algorithm typically involves exploring the graph using a depth-first search strategy, while maintaining only a logarithmic amount of information on thestack. It leverages the property that in an undirected graph, if there is a path from $u$ to $v$, then there is also a path from $v$ to $u$.

The difference in complexity between reachability in directed and undirected graphs is a unique case in complexity theory.

This difference underscores the complexity of the $L$ vs $NL$ problem and highlights the subtle ways in which the structure of the input can affect the complexity of a problem.

Comparison of Reachability: Directed vs. Undirected Graphs.

Figure 7 illustrates the difference in complexity for reachability in directed and undirected graphs.

Non-determinism in Space Complexity

Based on the results we have discussed so far, it appears that non-determinism in space complexity does not provide a significant advantage over determinism, especially when considering higher complexity classes. This is in contrast to time complexity, where non-determinism can potentially lead to exponential speedups (as exemplified by the P vs NP problem).

Specifically, we have shown that:

$NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$: A non-deterministic computation using space $f(n)$ can be simulated by a deterministic computation in time that is exponential in $f(n)$.
$NSPACE(f(n)) \subseteq SPACE(f(n)^2)$: A non-deterministic computation using space $f(n)$ can be simulated by a deterministic computation using space that is quadratic in $f(n)$.

These results indicate that the power of non-determinism in space complexity is limited. While non-determinism can sometimes lead to more concise algorithms, it does not seem to fundamentally alter the space complexity of problems, at least not beyond a quadratic factor.

Non-determinism in space complexity does not provide a significant advantage over determinism, especially for higher complexity classes.

In the case of time complexity, non-determinism can potentially lead to exponential speedups, as evidenced by the $P$ vs $NP$ problem. However, in the case of space complexity, the difference between deterministic and non-deterministic space is much smaller. This is because space can be reused, while time cannot.

The only known case where non-determinism in space complexity might make a difference is in the lower complexity classes, such as $L$ and $NL$. We do not know whether $L = NL$, and this is one of the major open problems in complexity theory. However, even in this case, the difference is limited to a logarithmic factor, as $NL \subseteq \text{SPACE}(\log^2 n)$.

Non-determinism in space complexity might make a difference in lower complexity classes like $L$ and $NL$, but the difference is at most a quadratic factor in general, and unknown for $L$ vs $NL$.

When we move to higher complexity classes, such as $PSPACE$ and $NPSPACE$, we have the important result that $PSPACE = NPSPACE$. This means that non-determinism does not provide any additional power in the context of polynomial space complexity. This equality is a consequence of Savitch’s theorem, as $NPSPACE = \bigcup_k NSPACE(n^k) \subseteq \bigcup_k SPACE((n^k)^2) = \bigcup_k SPACE(n^{2k}) = PSPACE$. The reverse inclusion $PSPACE \subseteq NPSPACE$ is trivial.

$PSPACE = NPSPACE$, meaning that non-determinism does not provide any additional power in the context of polynomial space complexity.

Therefore, it seems that non-determinism in space complexity is not as powerful as non-determinism in time complexity. While non-determinism can be a useful tool for designing algorithms, it does not seem to fundamentally alter the space complexity of problems, at least not beyond a quadratic factor. The quadratic increase in space usage is a direct consequence of Savitch’s theorem.

Furthermore, next time we will see that $NSPACE(f(n)) = coNSPACE(f(n))$, meaning that the complements of non-deterministic space complexity classes are equal to the classes themselves. This is another indication that non-determinism in space is not very powerful.

Complementary Classes

We now turn our attention to complementary complexity classes. Given a complexity class $C$, the complementary class $coC$ is defined as the set of all languages whose complements are in $C$. Formally: \[coC = \{ L \mid \overline{L} \in C \}\] where $\overline{L}$ denotes the complement of the language $L$. The complement of a language $L$ is the set of all strings that are not in $L$.

The complementary class $coC$ is defined as the set of all languages whose complements are in $C$. \[coC = \{ L \mid \overline{L} \in C \}\]

Understanding the relationship between a complexity class and its complement can provide valuable insights into the nature of the problems within that class. For deterministic complexity classes, the relationship is straightforward. For example, if a deterministic Turing machine can decide a language $L$ within a certain time or space bound, then by simply flipping the accepting and rejecting states, we can construct a machine that decides the complement $\overline{L}$ within the same bounds. Therefore, for deterministic complexity classes such as $L$, $P$, $PSPACE$, and $EXP$, we have: \[coL = L\] \[coP = P\] \[coPSPACE = PSPACE\] \[coEXP = EXP\] This means that if a problem can be solved efficiently by a deterministic machine, then its complement can also be solved efficiently by a deterministic machine within the same resource bounds.

Deterministic complexity classes are closed under complementation, i.e., $coC = C$ for deterministic classes $C$.

However, the situation is more complex for non-deterministic complexity classes. Consider the class $NP$. A language $L$ is in $NP$ if there exists a non-deterministic Turing machine that accepts $L$ in polynomial time. If we simply flip the accepting and rejecting states of such a machine, the resulting machine might not correctly decide the complement $\overline{L}$. This is because the non-deterministic machine accepts if at least one computation path leads to an accepting state. Flipping the states means that the new machine will accept if none of the original computation paths accepted, which doesn’t directly correspond to a polynomial-time acceptance of $\overline{L}$ by a non-deterministic machine.

The class of languages whose complements are in $NP$ is denoted by $coNP$. A language $L$ is in $coNP$ if and only if its complement $\overline{L}$ is in $NP$. This means that there exists a non-deterministic Turing machine that accepts $\overline{L}$ in polynomial time.

One of the major open problems in complexity theory is the relationship between $NP$ and $coNP$: \[NP \stackrel{?}{=} coNP\] If $NP = coNP$, it would imply a certain symmetry in the difficulty of proving the existence and non-existence of solutions. Most researchers believe that $NP \neq coNP$, suggesting that proving that a solution does not exist is inherently harder than proving that a solution exists for problems in $NP \setminus coNP$.

The relationship between $NP$ and $coNP$ is a major open problem in complexity theory: \[NP \stackrel{?}{=} coNP\]

Similarly, we can define $coNL$ as the class of languages whose complements are in $NL$. A language $L$ is in $coNL$ if and only if its complement $\overline{L}$ is in $NL$. A significant result in this context is that: \[NL = coNL\] This result, known as the Immerman–Szelepcsényi theorem, is a non-trivial finding and demonstrates that non-deterministic logarithmic space is closed under complementation. The proof of this theorem is intricate and relies on a clever inductive argument to count the number of reachable states in a graph. This theorem provides a rare instance where a non-deterministic complexity class is known to be closed under complementation.

The Immerman-Szelepcsényi theorem states that $NL = coNL$, demonstrating that non-deterministic logarithmic space is closed under complementation.

As we saw before, $PSPACE = NPSPACE$. Since $PSPACE$ is a deterministic class, it follows that $PSPACE = coPSPACE$. Therefore, $NPSPACE = coNPSPACE$. In general, for a proper function $f(n) \geq \log n$, we have: \[NSPACE(f(n)) = coNSPACE(f(n))\] We will prove this result in the next lecture.

In summary, while deterministic complexity classes are trivially closed under complementation, the situation for non-deterministic classes is more nuanced and leads to interesting open problems and non-trivial results. The relationship between a complexity class and its complement offers a different perspective on the inherent difficulty of computational problems.

The Landscape of Complexity Classes

Putting together the relationships and theorems we have discussed, we can paint a more complete picture of the landscape of complexity classes and the connections between them. We have established the following inclusions: \[\begin{aligned} L &\subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \\ PSPACE &\subseteq EXP \subseteq NEXP \end{aligned}\] We also have the following equalities for complementary classes: \[\begin{aligned} L &= coL \\ P &= coP \\ PSPACE &= coPSPACE = NPSPACE = coNPSPACE \\ NL &= coNL \end{aligned}\] And we suspect, but haven’t proven, that: \[\begin{aligned} P &\neq NP \\ L &\neq NL \\ NP &\neq coNP \end{aligned}\]

We have the following inclusions: \[L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP\] And the following equalities for complementary classes: \[\begin{aligned} L &= coL \\ P &= coP \\ PSPACE &= coPSPACE = NPSPACE = coNPSPACE \\ NL &= coNL \end{aligned}\]

These inclusions and suspected separations define the basic structure of the complexity class hierarchy. The classes at the bottom, like $L$, represent problems solvable with very limited resources, while classes at the top, like $NEXP$, represent problems that require significant computational resources.

The equalities $PSPACE = NPSPACE$ and $NL = coNL$ are particularly interesting as they show that non-determinism does not add any power in the context of polynomial space and that non-deterministic logarithmic space is closed under complementation.

The open problems, particularly $P$ vs $NP$ and $L$ vs $NL$, are central to our understanding of the limits of computation. Resolving these questions would have profound implications for computer science and mathematics. For instance, if $P = NP$, many currently intractable problems would become efficiently solvable. Conversely, if $P \neq NP$, it would solidify the belief that certain computational tasks are inherently hard.

Beyond the classes mentioned above, there are other important complexity classes, such as:

EXP (Exponential Time): Problems solvable by a deterministic Turing machine in time $2^{p(n)}$ for some polynomial $p(n)$.
NEXP (Non-deterministic Exponential Time): Problems solvable by a non-deterministic Turing machine in time $2^{p(n)}$ for some polynomial $p(n)$.
EXPSPACE (Exponential Space): Problems solvable by a deterministic Turing machine in space $2^{p(n)}$ for some polynomial $p(n)$.

EXP: Problems solvable in deterministic exponential time.
NEXP: Problems solvable in non-deterministic exponential time.
EXPSPACE: Problems solvable in deterministic exponential space.

The hierarchy theorems tell us that: \[P \subsetneq EXP\] \[PSPACE \subsetneq EXPSPACE\] And it is known that: \[P \subseteq NP \subseteq NEXP\] \[L \subseteq NL \subseteq P \subseteq PSPACE \subseteq EXP \subseteq NEXP \subseteq EXPSPACE\]

A major open problem related to these classes is: \[NP \stackrel{?}{=} NEXP\] It is widely believed that $NP \neq NEXP$, which would imply that adding exponential time to a non-deterministic machine gives it more computational power.

The relationship between $NP$ and $NEXP$ is another major open problem: \[NP \stackrel{?}{=} NEXP\]

The relationships between these complexity classes form a rich and intricate structure. Understanding this structure is crucial for classifying the difficulty of computational problems and for guiding the development of efficient algorithms. While significant progress has been made in complexity theory, many fundamental questions remain unanswered, driving ongoing research in the field.

The Landscape of Complexity Classes.

Figure 8 provides a visual representation of the landscape of complexity classes.

Conclusion

In this discussion, we have explored several key relationships between complexity classes, focusing on the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$ and its implications. We delved into the reachability method, a powerful technique for proving such inclusions, and discussed its connections to model checking and verification techniques. We also highlighted the challenges posed by the state space explosion problem and contrasted dynamic verification with static verification techniques.

Reachability Method
Configuration Graphs
Model Checking and Verification
State Space Explosion Problem
Static vs. Dynamic Verification

We examined fundamental theorems like Savitch’s theorem and its corollary, which shed light on the relationship between deterministic and non-deterministic space complexity. The open problem of $L$ vs $NL$ was discussed, emphasizing its importance in understanding the power of non-determinism in logarithmic space. We also touched upon the concept of complementary complexity classes, highlighting the significance of the Immerman–Szelepcsényi theorem ($NL = coNL$).

Savitch’s Theorem: Reachability in $O(\log^2 n)$ space.
$NSPACE(f(n)) \subseteq SPACE(f(n)^2)$
$NL = coNL$ (Immerman-Szelepcsényi Theorem)
$PSPACE = NPSPACE$

The landscape of complexity classes presents a fascinating and challenging area of study. While we have mapped out many of the relationships between these classes, several fundamental questions remain open, driving ongoing research in theoretical computer science. The quest to understand the power and limitations of computation continues, with each new insight building upon the foundational work we have explored. The journey through complexity theory reveals the deep and often surprising connections between seemingly disparate computational models and problems, underscoring the elegance and complexity of the computational universe. We also learned that non-determinism in space is less powerful than non-determinism in time and that $NSPACE(f(n)) = coNSPACE(f(n))$ for $f(n) \geq \log n$.

$P \stackrel{?}{=} NP$
$L \stackrel{?}{=} NL$
$NP \stackrel{?}{=} coNP$
$NP \stackrel{?}{=} NEXP$

These open problems represent some of the most important challenges in theoretical computer science, and their resolution would have a profound impact on our understanding of computation.

In the next lecture, we will continue our exploration of complexity classes by proving the result $NSPACE(f(n)) = coNSPACE(f(n))$ for $f(n) \geq \log n$. This will involve developing a non-deterministic space-efficient algorithm for counting the number of nodes reachable from a given node in a graph.

--- title: "Complexity Classes and Relationships: Reachability Method and Savitch's Theorem" author: "Your Name" date: "2025-01-28" format: html: toc: true # Table of Contents toc-depth: 2 code-tools: true theme: cosmo # Or "journal" for Distill-like minimalism --- # Introduction {#sec:introduction} In our previous discussion, we analyzed several relationships between complexity classes, and we successfully established the first five relationships. We now focus on proving the inclusion: $$NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).$$ To achieve this, we will employ the reachability method, a technique that leverages graph representations of Turing machine computations. This method is quite general and has applications in various verification techniques, particularly in the field of dynamic verification of systems. The core idea is to represent all possible executions of a system as a finite graph and then analyze the properties of this graph. This approach is fundamental in areas such as model checking and abstract interpretation. ::: tcolorbox - **Reachability Problem:** Determining if a path exists between two nodes in a graph. - **Configuration of a Turing Machine:** A snapshot of the Turing machine's state, tape contents, and head positions at a given point in time. - **Graph Representation of Computations:** Representing all possible computations of a Turing machine as a graph, where nodes are configurations and edges are transitions between configurations. ::: We will explore how these concepts help us relate non-deterministic space complexity to deterministic time and space complexity, and how they connect to the hierarchy of complexity classes such as L, NL, P, NP, PSPACE, NPSPACE, EXP, and NEXP. We will also touch upon the state space explosion problem in model checking and the distinction between static and dynamic verification techniques. Furthermore, we will discuss Savitch's theorem, which provides a deterministic space upper bound for the reachability problem, and its implications for the relationship between deterministic and non-deterministic space complexity. Finally, we will examine the open problem of whether L equals NL and the role of reachability in resolving this question. # Reachability Method {#sec:reachability_method} Our goal is to demonstrate that if a language $L$ belongs to the complexity class $NSPACE(f(n))$, where $f$ is a proper function, then $L$ also belongs to $\bigcup_{c>0} TIME(c^{f(n) + \log n})$. This implies the existence of a non-deterministic Turing machine $N$ that decides $L$ using space $f(n)$. When analyzing space complexity, we specifically consider input-output Turing machines. ::: tcolorbox An input-output Turing machine is a Turing machine with a designated read-only input tape, a designated write-only output tape, and one or more working tapes. The machine cannot modify the content of the input tape and it cannot reuse the content of the output tape. ::: Given an input string $X$, our objective is to determine whether $X$ is a member of the language $L$. The Turing machine $N$ begins its computation in an initial configuration determined by the input $X$. This initial configuration encompasses the contents of the input tape, the working tapes, and the output tape. Formally, the initial configuration can be represented as: $$(q_0, X\sqcup, \underbrace{\epsilon, \dots, \epsilon}_{k-2 \text{ times}}, \epsilon)$$ where $q_0$ is the initial state, $X\sqcup$ is the input string followed by a blank symbol, $\epsilon$ represents an empty working tape, and the last $\epsilon$ represents the empty output tape. Crucially, if $X$ is indeed in $L$, there exists at least one computation path of the non-deterministic machine $N$ that eventually reaches an accepting state, denoted as \"yes\". This accepting configuration can be represented as: $$(q_{\text{yes}}, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}, W)$$ where $q_{\text{yes}}$ is the accepting state, $U_1$ and $V_1$ represent the split of the input tape content, $U_i$ and $V_i$ are the contents of the working tapes, and $W$ is the content of the output tape. Conversely, if $X$ is not in $L$, no computation path of $N$ will ever reach a configuration with the \"yes\" state. To analyze this, we consider a graph $G_{N,X}$ that is specific to both the Turing machine $N$ and the input $X$. This graph, $G_{N,X}$, represents all possible configurations of $N$ when processing the input $X$, and the transitions between these configurations. ::: tcolorbox The configuration graph $G_{N,X}$ of a Turing machine $N$ on input $X$ is a directed graph where: - Nodes are the possible configurations of $N$ on input $X$. - There is a directed edge from configuration $C_1$ to $C_2$ if $N$ can transition from $C_1$ to $C_2$ in one step. ::: The reachability problem on this graph is to determine whether a path exists from the initial configuration to any accepting configuration. ::: {.lemma:reachability_equiv .lemma} **Lemma 1**. *A string $X$ belongs to the language $L$ if and only if a configuration with state \"yes\" is reachable from the initial configuration in the graph $G_{N,X}$.* ::: ::: proof *Proof.* If $X \in L$, then there exists an accepting computation of $N$ on $X$. This computation corresponds to a path in $G_{N,X}$ from the initial configuration to an accepting configuration. Conversely, if an accepting configuration is reachable in $G_{N,X}$, then there exists an accepting computation of $N$ on $X$, implying $X \in L$. ◻ ::: The graph $G_{N,X}$ is finite because the Turing machine $N$ works in space $f(n)$, and it is acyclic because our machines are always terminating. The maximum out-degree of each node is bounded by a constant that depends on the machine $N$ but not on the input $X$. <figure id="fig:config_graph"> <figcaption>A simplified representation of the configuration graph <span class="math inline">$G_{N,X}$</span>. Nodes represent configurations, and directed edges represent transitions between configurations. Solid arrows represent transitions in one computation path, and dashed arrows represent transitions in another possible computation path. The double circle represents an accepting configuration.</figcaption> </figure> Figure [1](#fig:config_graph){reference-type="ref" reference="fig:config_graph"} shows a simplified representation of the configuration graph $G_{N,X}$. ## Simplification of Configurations To simplify the analysis, we can reduce the complexity of the configurations: - **Ignoring the Output Tape:** The content of the output tape is irrelevant for deciding whether $X \in L$, so we can ignore it. - **Input Tape Index:** Since the input tape content is fixed as $X\sqcup$, we can replace the content of the input tape ($U_1$ and $V_1$) with a single index $J$ that represents the split of $X\sqcup$ into $U_1$ and $V_1$, where $U_1V_1 = X\sqcup$. The index $J$ ranges from $0$ to $|X| + 1$. Thus, a simplified configuration can be represented as $(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1})$. ## Number of Possible Configurations To determine the size of the graph $G_{N,X}$, we need to estimate the number of possible configurations. - The state $q$ can take $|Q| + 2$ values, where $|Q|$ is the number of states of the machine $N$, plus two additional states \"yes\" and \"no\". - The index $J$ can take $|X| + 2$ values. - Each working tape string $U_i$ or $V_i$ has a length of at most $f(|X|)$, since the machine $N$ works in space $f(n)$. The number of possible strings of length at most $f(|X|)$ is $|\Sigma|^{f(|X|)}$, where $\Sigma$ is the tape alphabet. - There are $k-2$ working tapes. Therefore, the total number of possible configurations is bounded by: $$(|Q| + 2) \cdot (|X| + 2) \cdot (|\Sigma|^{f(|X|)})^{2(k-2)}$$ where $\Sigma$ is the tape alphabet. This can be simplified to: $$(|Q| + 2) \cdot (|X| + 2) \cdot |\Sigma|^{2(k-2)f(|X|)} \leq \alpha^{f(|X|) + \log |X|}$$ for some constant $\alpha$ that depends on the machine $N$. The constant $\alpha$ incorporates the constants $|Q|+2$, $2(k-2)$, and $|\Sigma|$. ::: tcolorbox The number of nodes in the configuration graph $G_{N,X}$ is bounded by $\alpha^{f(|X|) + \log |X|}$, where $\alpha$ is a constant depending on the machine $N$. ::: This bound on the number of configurations is crucial for establishing the time complexity of the deterministic machine that will decide the language $L$. # Simplifying the Graph {#sec:simplifying_graph} To facilitate our analysis, we aim to simplify the structure of the configuration graph $G_{N,X}$. Initially, the nodes of this graph represent configurations of the form: $$(q, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}, W),$$ where: - $q$ denotes the current state of the Turing machine $N$. - $U_1$ and $V_1$ represent the content of the input tape, with $U_1$ being the portion of the tape to the left of the head and $V_1$ being the portion to the right, including the symbol under the head. - $U_i$ and $V_i$ represent the contents of the $i$-th working tape, with $U_i$ being the portion of the tape to the left of the head and $V_i$ being the portion to the right, including the symbol under the head. - $W$ represents the content of the output tape. We can simplify these configurations based on the following observations: 1. **Output Tape Irrelevance:** The content of the output tape is not relevant for our decision procedure. Since we are deciding a language, the output tape is only used to write the final result, and its content during the computation does not affect the decision. Therefore, we can disregard the output tape's content. This simplifies the configuration to: $$(q, U_1, V_1, U_2, V_2, \dots, U_{k-1}, V_{k-1}).$$ 2. **Input Tape Immutability:** In an input-output Turing machine, the input tape is read-only and cannot be modified. The content of the input tape remains constant throughout the computation. The input $X$ is split into $U_1$ and $V_1$, but their concatenation always forms the original input $X$ (with a possible blank at the end), i.e., $U_1V_1 = X\sqcup$. Instead of storing $U_1$ and $V_1$ explicitly, we can use an index $J$ to represent the position of the read head on the input tape. The index $J$ indicates how the input $X\sqcup$ is split into $U_1$ and $V_1$. The possible values for $J$ range from $0$ (when the head is at the beginning of $X$) to $|X|+1$ (when the head is at the blank symbol after $X$). By applying these simplifications, we can represent a configuration in a more compact form: $$(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1}).$$ This simplified representation retains all the necessary information to describe the state of the Turing machine during its computation, while reducing the complexity of the configuration representation. This simplification is crucial for analyzing the size of the configuration graph and, consequently, the time complexity of the deterministic machine that will decide the language $L$. <figure id="fig:config_simplification"> <figcaption>Simplification of a Turing Machine Configuration. The output tape is ignored, and the input tape content and head position are replaced by an index <span class="math inline">$J$</span>.</figcaption> </figure> Figure [2](#fig:config_simplification){reference-type="ref" reference="fig:config_simplification"} illustrates the simplification process of a Turing machine configuration. # Number of Possible Configurations {#sec:number_configurations} Now, we aim to determine the number of possible configurations, which corresponds to the number of nodes in the simplified graph $G_{N,X}$. This analysis is crucial for understanding the size of the graph and, consequently, the complexity of the reachability problem. Let's analyze the number of possible values for each component of a simplified configuration $(q, J, U_2, V_2, \dots, U_{k-1}, V_{k-1})$: - **State $q$:** The state $q$ can take on any of the states of the Turing machine $N$, plus two additional states: \"yes\" (accepting) and \"no\" (rejecting). Therefore, there are $|Q| + 2$ possible values for $q$, where $|Q|$ is the number of states in the machine $N$. - **Index $J$:** The index $J$ represents the position of the read head on the input tape. As discussed earlier, $J$ can range from $0$ to $|X| + 1$, where $|X|$ is the length of the input string $X$. Thus, there are $|X| + 2$ possible values for $J$. - **Tape Contents $U_i$ and $V_i$:** Each working tape has two parts, $U_i$ and $V_i$. The machine $N$ operates in space $f(n)$, where $n$ is the length of the input $X$. This means that the length of each $U_i$ and $V_i$ is at most $f(|X|)$. Since each tape symbol is drawn from the tape alphabet $\Sigma$, there are at most $|\Sigma|^{f(|X|)}$ possible choices for each $U_i$ and each $V_i$. - **Number of Working Tapes:** We have $k-2$ working tapes, each with two parts $U_i$ and $V_i$. Therefore, we have a total of $2(k-2)$ strings $U_i$ and $V_i$. Combining these factors, the total number of possible configurations is at most: $$(|Q| + 2) \cdot (|X| + 2) \cdot (|\Sigma|^{f(|X|)}) ^{2(k-2)} = (|Q| + 2) \cdot (|X| + 2) \cdot |\Sigma|^{2(k-2)f(|X|)}.$$ This expression can be simplified by noting that $|Q|$, $k$, and $|\Sigma|$ are constants that depend on the machine $N$ but not on the input $X$. We can rewrite the expression as: $$\alpha \cdot (|X| + 2) \cdot \beta^{f(|X|)},$$ where $\alpha = |Q| + 2$ and $\beta = |\Sigma|^{2(k-2)}$ are constants. Furthermore, we can express $|X|+2$ as $2^{\log(|X|+2)}$, and since $\log(|X|+2) \leq \log(2|X|) = \log 2 + \log |X| \leq 1 + \log |X|$, we can say that $|X|+2$ is less than or equal to $c \cdot 2^{\log |X|}$ for some constant $c$. Thus, the total number of configurations can be bounded by: $$\alpha \cdot c \cdot 2^{\log |X|} \cdot \beta^{f(|X|)} \leq \gamma \cdot 2^{\log |X| + f(|X|)},$$ for some constant $\gamma$. Finally, we can rewrite this as: $$\gamma \cdot 2^{f(|X|) + \log |X|}$$ for some constant $\gamma$. We can absorb the constant $\gamma$ into the base of the exponent, resulting in a bound of the form: $$\alpha^{f(|X|) + \log |X|},$$ for some constant $\alpha$. ::: tcolorbox The number of possible configurations for a Turing machine $N$ on input $X$ working in space $f(|X|)$ is bounded by $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$ that depends on the machine $N$. ::: This result is crucial because it provides an upper bound on the size of the configuration graph $G_{N,X}$, which is essential for analyzing the time complexity of solving the reachability problem on this graph. # Deterministic Machine for Deciding L {#sec:deterministic_machine} We now construct a deterministic Turing machine $M$ that decides the language $L$. This machine will leverage the graph $G_{N,X}$ and the reachability method. The machine $M$ operates as follows: 1. **Input:** The machine $M$ receives the input string $X$. 2. **Graph Construction:** $M$ constructs the configuration graph $G_{N,X}$ on one of its working tapes. This graph represents all possible configurations of the non-deterministic machine $N$ on input $X$, along with the transitions between them. The construction of this graph is deterministic and can be done by simulating the transitions of $N$. The transition function of $N$ is encoded in the transition function of $M$. 3. **Reachability Check:** $M$ then solves the reachability problem on the graph $G_{N,X}$. Specifically, it checks whether there exists a path from the initial configuration of $N$ on input $X$ to any configuration where the state is \"yes\" (an accepting configuration). This can be done using a standard graph traversal algorithm, such as Breadth-First Search (BFS) or Depth-First Search (DFS). :::: algorithm ::: algorithmic Input string $X$ Construct the initial configuration $C_0$ of $N$ on input $X$ Initialize the graph $G_{N,X}$ with node $C_0$ and set of edges $E$ to $\emptyset$ $Q \gets \{C_0\}$ $C \gets$ Dequeue($Q$) Generate all possible next configurations $C'$ from $C$ using the transition function of $N$ Add $C'$ to $G_{N,X}$ Enqueue($Q$, $C'$) Add edge $(C, C')$ to $E$ Accept $X$ Reject $X$ ::: :::: Algorithm [\[alg:deterministic_machine\]](#alg:deterministic_machine){reference-type="ref" reference="alg:deterministic_machine"} provides a pseudocode description of the deterministic Turing machine $M$. The reachability problem on a graph can be solved in polynomial time with respect to the size of the graph. For instance, BFS has a time complexity that is linear in the number of nodes and edges of the graph in a pseudocode implementation. However, on a Turing machine, the time complexity might be higher, but still polynomial in the size of the graph. Let's assume the time complexity of solving reachability on a Turing machine is $O(n^\beta)$ where $n$ is the size of the graph and $\beta$ is some constant. The overall time complexity of the deterministic machine $M$ is determined by two main factors: 1. The time required to construct the graph $G_{N,X}$. This is linear in the size of the graph, which is $O(\alpha^{f(|X|) + \log |X|})$. 2. The time required to solve the reachability problem on $G_{N,X}$. This is polynomial in the size of the graph, i.e., $O((\alpha^{f(|X|) + \log |X|})^\beta)$. Therefore, the total time complexity of $M$ is: $$O(\alpha^{f(|X|) + \log |X|}) + O((\alpha^{f(|X|) + \log |X|})^\beta)$$ Since the second term dominates the first term, the overall time complexity is: $$O((\alpha^{f(|X|) + \log |X|})^\beta) = O(\alpha^{\beta(f(|X|) + \log |X|)})$$ We can rewrite this as: $$TIME(c^{f(n) + \log n})$$ for some constant $c = \alpha^\beta$. ::: tcolorbox This demonstrates that the language $L$, which is decided by the non-deterministic machine $N$ in space $f(n)$, can also be decided by a deterministic machine $M$ in time $c^{f(n) + \log n}$ for some constant $c$. This proves the inclusion: $$NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).$$ ::: Note that the constant $c$ depends on the machine $N$ (through $\alpha$ and $\beta$). Therefore, we need to take the union over all possible constants to account for different machines. # Discussion on the Statement {#sec:discussion_statement} The statement presented in the book, $NSPACE(f(n)) \subseteq TIME(c^{f(n) + \log n})$, while capturing the essence of the result, has a subtle imprecision. The constant $c$ in the time complexity bound is not universal; it depends on the specific non-deterministic Turing machine $N$ that decides the language $L$ in space $f(n)$. To elaborate, when we analyze the size of the configuration graph $G_{N,X}$, we encounter constants that are specific to the machine $N$, such as the number of states $|Q|$, the size of the tape alphabet $|\Sigma|$, and the number of working tapes $k$. These constants influence the value of $c$ in the time complexity bound. Specifically, the constant $\alpha$ in the bound $\alpha^{f(|X|) + \log |X|}$ on the number of configurations depends on these machine-specific constants. Therefore, when we consider different languages within the same space complexity class $NSPACE(f(n))$, each language might have a different associated constant $c$. This means that the inclusion is not for a fixed $c$, but rather for a $c$ that depends on the specific machine. ::: tcolorbox The constant $c$ in the time complexity $TIME(c^{f(n) + \log n})$ depends on the specific non-deterministic Turing machine $N$ used to decide the language in space $f(n)$. ::: A more accurate and precise statement of the inclusion would be: $$NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).$$ This statement indicates that for any language in $NSPACE(f(n))$, there exists a constant $c$ (dependent on the machine) such that the language is in $TIME(c^{f(n) + \log n})$. The union over all possible constants $c$ reflects the fact that the constant is not fixed across all languages in $NSPACE(f(n))$. This distinction is important because changing the base of the exponential function can significantly alter the asymptotic behavior of the time complexity. For example, $3^{f(n)}$ and $4^{f(n)}$ are not asymptotically equivalent. Therefore, the union over all possible constants $c$ is necessary to accurately represent the relationship between the complexity classes. ::: tcolorbox The correct inclusion is: $$NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n}).$$ This reflects the fact that the constant $c$ depends on the specific Turing machine used. ::: This corrected statement provides a more accurate representation of the relationship between non-deterministic space complexity and deterministic time complexity. # Model Checking and Verification Techniques {#sec:model_checking} The reachability method we have just discussed is closely related to techniques used in model checking and dynamic verification of systems. These techniques share a common underlying principle: they transform the execution of a system into a graph representation and then analyze the properties of this graph. This approach is fundamental in the field of formal verification, where the goal is to ensure that systems behave correctly according to their specifications. In model checking and dynamic verification, the graph represents all possible executions or behaviors of a system. Each node in the graph corresponds to a state of the system, and the edges represent transitions between states. This graph is often referred to as the state space of the system. The system can be a hardware circuit, a software program, or any other kind of computational system. ::: tcolorbox Dynamic verification techniques analyze a system by constructing a graph representing all possible executions and then verifying properties of this graph. ::: The analysis of this graph can involve various types of properties, such as: - **Reachability:** Determining whether a specific state or configuration is reachable from the initial state. This is the same problem we addressed in the proof of the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$. - **Safety Properties:** Verifying that the system never enters an undesirable or unsafe state. For example, ensuring that a variable never takes a negative value or that a critical section is never accessed by two processes simultaneously. - **Liveness Properties:** Ensuring that the system eventually reaches a desired state or performs a specific action. For example, ensuring that a request is eventually granted or that a process eventually terminates. - **Temporal Properties:** Analyzing the behavior of the system over time, often using temporal logics. These logics allow us to express properties that involve sequences of states and transitions, such as \"always,\" \"eventually,\" or \"until.\" The key idea is to represent the semantics of the system as a graph and then use graph algorithms and analysis techniques to verify properties of the system. This approach is particularly useful for analyzing complex systems where it is difficult to reason about their behavior directly. The reachability method we used to prove the complexity class inclusion is a specific instance of this general approach. We constructed the graph of all possible configurations of a Turing machine and then used reachability to determine whether the input was accepted by the machine. This demonstrates the broad applicability of the graph-based approach in computer science. ## State Space Explosion Problem A major challenge in model checking and dynamic verification is the **state space explosion problem**. As the complexity of the system increases, the size of the state space graph can grow exponentially. This can make it computationally infeasible to store and analyze the entire graph. Techniques such as symbolic model checking, abstraction, and on-the-fly verification are used to mitigate this problem. On-the-fly model checking is analogous to the way we handled the graph $G_{N,X}$ in the proof of Savitch's theorem: we did not generate the whole graph, but only the parts that were needed during the computation. ## Static vs. Dynamic Verification It is important to distinguish between dynamic and static verification techniques: - **Dynamic Verification:** Constructs the graph of all possible executions (semantics) and analyzes its properties. Model checking is a typical example of dynamic verification. - **Static Verification:** Analyzes the system based on its syntax without constructing the execution graph. Type checking is a typical example of static verification. Static verification techniques are often more efficient but may not be able to detect all types of errors, while dynamic verification techniques are more comprehensive but can be computationally expensive. <figure id="fig:verification_techniques"> <figcaption>Comparison of Dynamic and Static Verification Techniques.</figcaption> </figure> Figure [3](#fig:verification_techniques){reference-type="ref" reference="fig:verification_techniques"} illustrates the difference between dynamic and static verificationverification techniques. ## Abstract Interpretation Abstract interpretation is a technique that combinesaspects of both static and dynamic verification. It involves creating an abstract representation of the system's state space and analyzing this abstraction to verify properties of the system. This can help to reduce the complexity of the analysis while still providing useful information about the system's behavior. # State Space Explosion Problem {#sec:state_space_explosion} A significant challenge in model checking and dynamic verification techniques is the *state space explosion problem*. This problem arises because the graph representing all possible executions of a system can grow exponentially with the size or complexity of the system. The number of states in the graph can become so large that it becomes impossible to store and analyze the entire graph, even with modern computing resources. As we have seen, the number of configurations of a Turing machine can be very large, even for relatively simple machines and inputs. In the context of model checking, the state space of a system can become enormous, making it impossible to store the entire graph in memory. This is particularly true for systems with many concurrent processes or complex interactions, where the number of possible interleavings of actions can lead to an exponential increase in the number of states. The state space explosion problem limits the applicability of these techniques to relatively small or simplified systems. When the graph becomes too large, it becomes computationally infeasible to analyze it using standard graph algorithms. This is because the memory requirements and the time complexity of the algorithms become prohibitive. For example, a simple system with a few variables, each with a few possible values, can easily have millions or billions of possible states. To address the state space explosion problem, researchers have developed various techniques, including: - **Symbolic Model Checking:** Using symbolic representations of the state space, such as Binary Decision Diagrams (BDDs), to represent and manipulate large sets of states efficiently. Instead of explicitly enumerating each state, symbolic model checking represents sets of states using logical formulas, allowing for more compact representations. - **Abstraction Techniques:** Simplifying the system by abstracting away irrelevant details, reducing the size of the state space. Abstraction involves creating a simplified model of the system that preserves the properties of interest while reducing the number of states. - **Partial Order Reduction:** Exploiting the independence of concurrent actions to reduce the number of states that need to be explored. If two actions do not interfere with each other, their order of execution does not matter, and only one interleaving needs to be considered. - **On-the-Fly Model Checking:** Generating only the necessary parts of the state space during the verification process, rather than constructing the entire graph upfront. This approach avoids generating states that are not relevant to the property being verified. This is similar to how we handled the graph $G_{N,X}$ when proving Savitch's theorem. We did not generate the entire graph but only the parts that were needed on demand. ::: tcolorbox Techniques like symbolic model checking, abstraction, partial order reduction, and on-the-fly model checking are used to mitigate the state space explosion problem. ::: These techniques aim to mitigate the impact of the state space explosion problem and make model checking and dynamic verification applicable to larger and more complex systems. However, the state space explosion problem remains a fundamental challenge in the field, and researchers continue to develop new techniques to address it. <figure id="fig:state_space_explosion"> <figcaption>The State Space Explosion Problem and Mitigation Techniques.</figcaption> </figure> Figure [4](#fig:state_space_explosion){reference-type="ref" reference="fig:state_space_explosion"} illustrates the state space explosion problem and the use of mitigation techniques. # Static Verification Techniques {#sec:static_verification} In contrast to model checking and dynamic verification, which analyze the semantics of a system by constructing and exploring its state space graph, \*\*static verification techniques\*\* analyze the syntax of a program without executing it or building its full state space. These techniques operate directly on the source code or an intermediate representation of the program, such as an Abstract Syntax Tree (AST). Static verification aims to identify potential errors and verify properties of the program by examining its structure and code, rather than its runtime behavior. ::: tcolorbox Static verification techniques analyze the syntax of a program without execution, aiming to identify potential errors and verify program properties. ::: A prominent example of a static verification technique is *type checking*. Type checking involves verifying that the types of variables and expressions in a program are consistent with the rules of the programming language. For instance, a type checker would detect an error if an integer value were assigned to a boolean variable without an explicit type cast. Type checking ensures that operations are performed on compatible data types, preventing many common programming errors. Other examples of static verification techniques include: - **Dataflow Analysis:** Analyzing how data flows through a program to detect potential errors, such as uninitialized variables or unused values. Dataflow analysis tracks the flow of data through the program's control flow graph, identifying potential issues related to variable usage. - **Abstract Interpretation:** Approximating the behavior of a program to verify properties, such as the range of values that a variable can take. Abstract interpretation uses abstract domains to represent sets of concrete values, allowing for the analysis of program properties without explicitly executing the program. - **Formal Methods:** Using mathematical techniques to specify and verify program properties. This can involve creating formal specifications of program behavior and using theorem provers or other formal methods tools to verify that the program meets its specification. - **Static Analysis Tools:** Using automated tools to analyze the source code for potential bugs, security vulnerabilities, or coding style violations. These tools can identify issues such as buffer overflows, SQL injection vulnerabilities, and adherence to coding standards. Examples include tools like FindBugs, PMD, and Coverity. Static verification techniques offer several advantages over dynamic verification: - **Efficiency:** Static analysis is generally more efficient than dynamic analysis because it does not require executing the program. This allows for faster analysis and feedback during the development process. - **Early Bug Detection:** Static analysis can detect errors early in the development process, before the program is executed. This can save time and resources by identifying issues before they become more difficult to fix. - **Coverage:** Static analysis can analyze all possible execution paths of a program, whereas dynamic analysis is limited to the specific execution paths that are tested. This allows for a more comprehensive analysis of the program's behavior. However, static verification techniques also have limitations: - **Approximation:** Static analysis often relies on approximations of the program's behavior, which can lead to false positives (reporting errors that do not exist) or false negatives (missing errors that do exist). The approximations are necessary to make the analysis tractable but can result in inaccuracies. - **Limited Scope:** Some types of errors, such as division by zero or out-of-bounds array access, may not be detectable by static analysis alone. These errors often depend on runtime values and are difficult to predict without executing the program. However, combining static analysis with other techniques like symbolic execution can help detect such errors. - **Complexity:** Some static analysis techniques can be complex and difficult to implement. Developing accurate and efficient static analysis tools can be challenging. Static and dynamic verification techniques are complementary approaches to ensuring the correctness and reliability of software systems. Static analysis is often used to detect common errors early in the development process, while dynamic analysis is used to verify more complex properties and to test the system under realistic conditions. Combining both approaches can lead to more robust and reliable software. <figure id="fig:static_verification"> <figcaption>Static Verification Techniques and their application to source code.</figcaption> </figure> Figure [5](#fig:static_verification){reference-type="ref" reference="fig:static_verification"} illustrates the application of static verification techniques to source code. # Corollary and Hierarchy Theorems {#sec:corollary_hierarchy} Based on the relationships between complexity classes that we have established, we can construct a hierarchy of complexity classes, ordered by inclusion: $$L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP \subseteq \dots$$ Here: - $L$ represents deterministic logarithmic space. - $NL$ represents non-deterministic logarithmic space. - $P$ represents deterministic polynomial time. - $NP$ represents non-deterministic polynomial time. - $PSPACE$ represents deterministic polynomial space. - $NPSPACE$ represents non-deterministic polynomial space. - $EXP$ represents deterministic exponential time, i.e., $\bigcup_{k \in \mathbb{N}} TIME(2^{n^k})$. - $NEXP$ represents non-deterministic exponential time. This chain of inclusions shows how different complexity classes relate to each other in terms of their computational power. The inclusions are derived from the relationships we have proven so far, such as the inclusion of $NL$ in $P$ using the reachability method and the inclusion of $NSPACE(f(n))$ in $\bigcup_{c>0}TIME(c^{f(n) + \log n})$. Savitch's theorem will further demonstrate that $NPSPACE = PSPACE$. ::: tcolorbox The hierarchy of complexity classes is: $$L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP \subseteq \dots$$ ::: Furthermore, we can invoke the hierarchy theorems to establish strict separations between some of these classes: - **Time Hierarchy Theorem:** The time hierarchy theorem states that for any time-constructible function $t(n)$, there exists a language that can be decided in time $t(n)$ but not in time $o(\frac{t(n)}{\log t(n)})$. As a consequence, we know that $P \subsetneq EXP$. This means that there are problems that can be solved in exponential time but cannot be solved in polynomial time. The strict inclusion $P \subsetneq EXP$ implies that at least one of the inclusions in the hierarchy is strict. - **Space Hierarchy Theorem:** Similarly, the space hierarchy theorem states that for any space-constructible function $s(n)$, there exists a language that can be decided in space $s(n)$ but not in space $o(s(n))$. As a consequence, we know that $L \subsetneq PSPACE$. This means that there are problems that can be solved in polynomial space but cannot be solved in logarithmic space. The strict inclusion $L \subsetneq PSPACE$ implies that at least one of the inclusions in the hierarchy is strict. These hierarchy theorems demonstrate that the inclusions in the hierarchy of complexity classes are not all equalities. At least some of the inclusions are strict, meaning that there are languages in the larger class that are not in the smaller class. For example, we know that $P$ is strictly contained in $EXP$, and $L$ is strictly contained in $PSPACE$. ::: tcolorbox - **Time Hierarchy Theorem:** $P \subsetneq EXP$ - **Space Hierarchy Theorem:** $L \subsetneq PSPACE$ ::: However, many other questions about the relationships between these classes remain open. For example, it is not known whether $P = NP$ or whether $L = NL$. These are some of the most important open problems in theoretical computer science. The question of whether $P=NP$ is particularly famous, and its resolution would have profound implications for many areas of computer science and mathematics. Similarly, the question of whether $L=NL$ is a fundamental open problem in complexity theory. If $P$ were equal to $NP$, then at least one of the inclusions between $P$ and $EXP$ would have to be strict, and at least one of the inclusions between $L$ and $PSPACE$ would have to be strict. <figure id="fig:complexity_hierarchy"> <figcaption>Hierarchy of Complexity Classes. An arrow from class A to class B indicates that A is a subset of B. A double line indicates that the two classes are equal.</figcaption> </figure> Figure [6](#fig:complexity_hierarchy){reference-type="ref" reference="fig:complexity_hierarchy"} illustrates the hierarchy of complexity classes. # Savitch's Theorem {#sec:savitch_theorem} ::: tcolorbox **Savitch's theorem** is a fundamental result in complexity theory that provides an upper bound on the space complexity of solving the reachability problem in directed graphs. Specifically, Savitch's theorem states that reachability can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph. ::: To understand Savitch's theorem, we first define a predicate that will be used in the proof: ::: tcolorbox []{#def:path_predicate label="def:path_predicate"} Let $G = (V, E)$ be a directed graph. We define the predicate $Path(X, Y, I)$ to be true if and only if there exists a path from node $X$ to node $Y$ in $G$ with a length of at most $2^I$. ::: Using this predicate, we can express the reachability problem as follows: Given two nodes $U$ and $V$ in the graph $G$, $U$ reaches $V$ if and only if $Path(U, V, \log |V|)$ is true. This is because if there is a path from $U$ to $V$, then there is a path of length at most $|V|$ (since a simple path cannot visit any node more than once), and since $2^{\log |V|} = |V|$, the predicate $Path(U, V, \log |V|)$ will be true. ::: {.lemma:reachability_path .lemma} **Lemma 2**. *$U$ reaches $V$ in $G$ if and only if $Path(U, V, \log |V|)$ is true.* ::: ::: proof *Proof.* If there is a path from $U$ to $V$ in $G$, then there is a path of length at most $|V|$. Since $2^{\log |V|} = |V|$, the predicate $Path(U, V, \log |V|)$ is true. Conversely, if $Path(U, V, \log |V|)$ is true, then there is a path from $U$ to $V$ of length at most $2^{\log |V|} = |V|$, implying that $U$ reaches $V$. ◻ ::: The key idea behind Savitch's theorem is to define a recursive relationship for the $Path$ predicate: ::: {.lemma:path_recursive .lemma} **Lemma 3**. *$Path(X, Y, I)$ is true if and only if there exists a node $Z$ such that $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true.* ::: ::: proof *Proof.* If $Path(X, Y, I)$ is true, then there is a path from $X$ to $Y$ of length at most $2^I$. We can split this path into two subpaths of length at most $2^{I-1}$. Let $Z$ be a node on the path such that the path from $X$ to $Z$ and the path from $Z$ to $Y$ have length at most $2^{I-1}$. Then $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true. Conversely, if there exists a node $Z$ such that $Path(X, Z, I-1)$ and $Path(Z, Y, I-1)$ are true, then there is a path from $X$ to $Z$ of length at most $2^{I-1}$ and a path from $Z$ to $Y$ of length at most $2^{I-1}$. Concatenating these paths gives a path from $X$ to $Y$ of length at most $2^{I-1} + 2^{I-1} = 2^I$, so $Path(X, Y, I)$ is true. ◻ ::: This recursive definition allows us to break down the problem of finding a path of length at most $2^I$ into two subproblems of finding paths of length at most $2^{I-1}$. A deterministic Turing machine can decide the $Path(X, Y, I)$ predicate using a recursive algorithm that implements this definition. The space used by the Turing machine is determined by the depth of the recursion and the amount of information that needs to be stored at each level of the recursion. The depth of the recursion is at most $\log |V|$, since $I$ starts at $\log |V|$ and decreases by 1 at each recursive step until it reaches 0. At each level, we need to store the nodes $X$, $Y$, and $Z$, as well as the index $I$. The space required to store each of these is $O(\log |V|)$ (since we need to store node indices and the index $I$, which are all bounded by $|V|$). Therefore, the total space used by the Turing machine is $O(\log |V| \cdot \log |V|) = O(\log^2 |V|)$. :::: algorithm ::: algorithmic **true** **false** **true** **false** ::: :::: Algorithm [\[alg:path_recursive\]](#alg:path_recursive){reference-type="ref" reference="alg:path_recursive"} provides a pseudocode description of the recursive algorithm for the $Path(X, Y, I)$ predicate. ::: {.thm:savitch .theorem} **Theorem 4** (Savitch's Theorem). *The reachability problem can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph.* ::: ::: proof *Proof.* The algorithm for deciding reachability is to compute $Path(U, V, \log |V|)$ using the recursive algorithm described above (Algorithm [\[alg:path_recursive\]](#alg:path_recursive){reference-type="ref" reference="alg:path_recursive"}). The space complexity of this algorithm is $O(\log^2 |V|)$, as explained in the text. The algorithm needs to store the current recursion depth, which is at most $\log |V|$, and for each level of recursion, it needs to store the indices of at most three nodes, $X$, $Y$, and $Z$, each requiring $O(\log |V|)$ space. Thus, the total space used is $O(\log |V| \cdot \log |V|) = O(\log^2 |V|)$. ◻ ::: This demonstrates that the reachability problem can be solved in deterministic space $O(\log^2 n)$, where $n$ is the number of nodes in the graph. ::: tcolorbox The reachability problem can be solved in deterministic space $O(\log^2 n)$ according to Savitch's Theorem. ::: It is important to note that while this algorithm is space-efficient, it is not time-efficient. The recursive calls can lead to redundant computations, making the algorithm inefficient in terms of time complexity. The time complexity of Savitch's algorithm is $n^{O(\log n)}$, which is quasi-polynomial, not polynomial. # Corollary of Savitch's Theorem {#sec:corollary_savitch} ::: tcolorbox A significant consequence of Savitch's theorem is the following inclusion: $$NSPACE(f(n)) \subseteq SPACE(f(n)^2)$$ This inclusion holds for any proper function $f(n)$ such that $f(n) \geq \log n$. This result demonstrates that non-deterministic space complexity is not significantly more powerful than deterministic space complexity, as the non-deterministic space can be simulated by a deterministic machine with only a quadratic increase in space usage. ::: To prove this corollary, we consider a language $L$ that is in $NSPACE(f(n))$. This means that there exists a non-deterministic Turing machine $N$ that decides $L$ using space $f(n)$. We can construct the configuration graph $G_{N,X}$ for $N$ on input $X$, as we did in the proof of the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$. The size of this graph, as we have shown, is bounded by $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$. To decide whether $X$ is in $L$, we need to solve the reachability problem on this graph. According to Savitch's theorem, reachability can be solved in deterministic space that is the square of the logarithm of the number of nodes in the graph. The logarithm of the size of the graph is: $$\log(\alpha^{f(|X|) + \log |X|}) = (f(|X|) + \log |X|) \log \alpha = O(f(|X|) + \log |X|).$$ Since $f(n) \geq \log n$, we have $f(|X|) + \log |X| = O(f(|X|))$. Therefore, the space required to solve reachability on this graph is: $$O((f(|X|) + \log |X|)^2) = O(f(|X|)^2).$$ This means that we can decide the language $L$ using a deterministic Turing machine that uses space $O(f(n)^2)$. However, there is a crucial detail to consider. In our previous discussion, we mentioned that the graph $G_{N,X}$ is not explicitly constructed and stored on a tape. Instead, the graph is generated on-the-fly, meaning that only the necessary parts of the graph are generated when needed during the reachability computation. This is important because if we were to store the entire graph, the space complexity would be much higher. The space used to store the graph could be exponential in $f(n)$, but by generating it on-the-fly, we avoid this issue. ::: tcolorbox The configuration graph $G_{N,X}$ is not explicitly constructed and stored on a tape. Instead, it is generated on-the-fly during the reachability computation. This means that only the parts of the graph that are needed for the computation are generated and stored at any given time. ::: By generating the graph on-the-fly, we can ensure that the space used by the deterministic machine is indeed $O(f(n)^2)$. This proves the inclusion: $$NSPACE(f(n)) \subseteq SPACE(f(n)^2).$$ This result highlights that non-determinism in space complexity does not provide a significant advantage over determinism, as the non-deterministic computation can be simulated deterministically with only a quadratic increase in space. ::: {.thm:savitch_corollary .theorem} **Theorem 5**. *$NSPACE(f(n)) \subseteq SPACE(f(n)^2)$ for any proper function $f(n) \geq \log n$.* ::: ::: proof *Proof.* Let $L \in NSPACE(f(n))$. Then there is a non-deterministic Turing machine $N$ that decides $L$ in space $f(n)$. We can construct the configuration graph $G_{N,X}$ of $N$ on input $X$. The size of $G_{N,X}$ is at most $\alpha^{f(|X|) + \log |X|}$ for some constant $\alpha$. Using Savitch's theorem, we can solve the reachability problem on $G_{N,X}$ in space $O(\log^2 |G_{N,X}|) = O((f(|X|) + \log |X|)^2) = O(f^2(|X|))$ since $f(n) \geq \log n$. The graph is generated on-the-fly, so the space used is dominated by the space needed for the reachability algorithm, which is $O(\log^2 |G_{N,X}|)$. ◻ ::: This result is a direct consequence of Savitch's theorem and the on-the-fly generation of the configuration graph. ::: corollary **Corollary 6**. *$PSPACE = NPSPACE$* ::: ::: proof *Proof.* We know that $PSPACE \subseteq NPSPACE$ by definition. To show that $NPSPACE \subseteq PSPACE$, we use the fact that $NPSPACE = \bigcup_{k} NSPACE(n^k)$. By Theorem [5](#thm:savitch_corollary){reference-type="ref" reference="thm:savitch_corollary"}, $NSPACE(n^k) \subseteq SPACE(n^{2k})$. Since $n^{2k}$ is also a polynomial, we have $\bigcup_{k} NSPACE(n^k) \subseteq \bigcup_{k} SPACE(n^{2k}) \subseteq PSPACE$. Therefore, $NPSPACE \subseteq PSPACE$, and thus $PSPACE = NPSPACE$. ◻ ::: # Open Problem: L vs NL {#sec:open_problem_l_nl} We have established that the complexity classes $L$ (deterministic logarithmic space) and $NL$ (non-deterministic logarithmic space) are related by the following inclusions: $$L \subseteq NL \subseteq \text{SPACE}(\log^2 n)$$ The inclusion $L \subseteq NL$ is straightforward, as any deterministic computation can be viewed as a special case of a non-deterministic computation. The inclusion $NL \subseteq \text{SPACE}(\log^2 n)$ follows directly from Savitch's theorem, which states that reachability can be solved in deterministic space $O(\log^2 n)$. ::: tcolorbox We have: $$L \subseteq NL \subseteq \text{SPACE}(\log^2 n)$$ ::: However, the question of whether $L$ is equal to $NL$ remains one of the most important open problems in complexity theory. Formally, this is expressed as: $$L \stackrel{?}{=} NL$$ This question asks whether the power of non-determinism provides a significant advantage in the context of logarithmic space complexity. In other words, can every problem that can be solved by a non-deterministic Turing machine in logarithmic space also be solved by a deterministic Turing machine in logarithmic space? A crucial connection between the $L$ vs $NL$ problem and the reachability problem is the equivalence: $$L = NL \iff \text{there exists a deterministic log-space algorithm for reachability.}$$ This equivalence implies that finding such an algorithm would prove $L = NL$. Conversely, proving that no deterministic log-space algorithm exists would show $L \neq NL$. ::: tcolorbox The question of whether $L=NL$ is equivalent to the existence of a deterministic log-space algorithm for reachability. ::: The reachability problem is, therefore, a central problem for understanding the relationship between deterministic and non-deterministic logarithmic space. The fact that we do not yet know whether $L = NL$ highlights the limitations of our current understanding of the power of non-determinism in space complexity. ::: tcolorbox The $L$ vs $NL$ problem is a fundamental open question in complexity theory, and its resolution would have significant implications for our understanding of the limits of computation. ::: This open problem, though less famous than the $P$ vs $NP$ problem, is equally significant in the field of complexity theory. Resolving it would profoundly impact our understanding of the fundamental limits of computation. ## Reachability in Undirected Graphs An interesting fact related to this open problem is that reachability in *undirected graphs* can be solved in deterministic logarithmic space. However, the same is not known for directed graphs. This difference between undirected and directed graphs is a unique case in complexity theory, where the same problem has different known complexities depending on whether the graph is directed or undirected. This fact further emphasizes the complexity of the $L$ vs $NL$ problem. ::: tcolorbox Reachability in undirected graphs can be solved in deterministic logarithmic space, while the same is not known for directed graphs. ::: This difference highlights the subtle complexities involved in understanding the relationship between deterministic and non-deterministic logarithmic space. It is known that if one could provide an algorithm to solve directed graph reachability in deterministic logarithmic space, then $L=NL$. On the other hand, if one could prove that no such algorithm exists, then $L \neq NL$. # Reachability on Undirected Graphs {#sec:reachability_undirected} An interesting fact related to the $L$ vs $NL$ problem is that the reachability problem on *undirected graphs* can be solved in deterministic logarithmic space. This is in contrast to the general reachability problem on directed graphs, for which the best known deterministic algorithm requires $O(\log^2 n)$ space (Savitch's theorem). ::: tcolorbox Reachability in undirected graphs is in $L$, while the best known deterministic algorithm for directed graphs requires $O(\log^2 n)$ space. ::: The fact that reachability on undirected graphs is in $L$ is a non-trivial result and highlights a difference in the complexity of reachability between directed and undirected graphs. This result is somewhat surprising, as it might seem that the direction of edges should not make a significant difference in the complexity of the problem. However, the absence of direction in undirected graphs allows for more efficient algorithms to be used. The algorithm for solving reachability on undirected graphs in deterministic log-space is not straightforward and involves a clever use of graph traversal techniques. The details of this algorithm, known as Reingold's algorithm, are beyond the scope of this discussion, but it is important to note that such an algorithm exists. The algorithm typically involves exploring the graph using a depth-first search strategy, while maintaining only a logarithmic amount of information on thestack. It leverages the property that in an undirected graph, if there is a path from $u$ to $v$, then there is also a path from $v$ to $u$. ::: tcolorbox The difference in complexity between reachability in directed and undirected graphs is a unique case in complexity theory. ::: This difference underscores the complexity of the $L$ vs $NL$ problem and highlights the subtle ways in which the structure of the input can affect the complexity of a problem. <figure id="fig:reachability_graphs"> <figcaption>Comparison of Reachability: Directed vs. Undirected Graphs.</figcaption> </figure> Figure [7](#fig:reachability_graphs){reference-type="ref" reference="fig:reachability_graphs"} illustrates the difference in complexity for reachability in directed and undirected graphs. # Non-determinism in Space Complexity {#sec:nondeterminism_space} Based on the results we have discussed so far, it appears that non-determinism in space complexity does not provide a significant advantage over determinism, especially when considering higher complexity classes. This is in contrast to time complexity, where non-determinism can potentially lead to exponential speedups (as exemplified by the P vs NP problem). Specifically, we have shown that: - $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$: A non-deterministic computation using space $f(n)$ can be simulated by a deterministic computation in time that is exponential in $f(n)$. - $NSPACE(f(n)) \subseteq SPACE(f(n)^2)$: A non-deterministic computation using space $f(n)$ can be simulated by a deterministic computation using space that is quadratic in $f(n)$. These results indicate that the power of non-determinism in space complexity is limited. While non-determinism can sometimes lead to more concise algorithms, it does not seem to fundamentally alter the space complexity of problems, at least not beyond a quadratic factor. ::: tcolorbox Non-determinism in space complexity does not provide a significant advantage over determinism, especially for higher complexity classes. ::: In the case of time complexity, non-determinism can potentially lead to exponential speedups, as evidenced by the $P$ vs $NP$ problem. However, in the case of space complexity, the difference between deterministic and non-deterministic space is much smaller. This is because space can be reused, while time cannot. The only known case where non-determinism in space complexity might make a difference is in the lower complexity classes, such as $L$ and $NL$. We do not know whether $L = NL$, and this is one of the major open problems in complexity theory. However, even in this case, the difference is limited to a logarithmic factor, as $NL \subseteq \text{SPACE}(\log^2 n)$. ::: tcolorbox Non-determinism in space complexity might make a difference in lower complexity classes like $L$ and $NL$, but the difference is at most a quadratic factor in general, and unknown for $L$ vs $NL$. ::: When we move to higher complexity classes, such as $PSPACE$ and $NPSPACE$, we have the important result that $PSPACE = NPSPACE$. This means that non-determinism does not provide any additional power in the context of polynomial space complexity. This equality is a consequence of Savitch's theorem, as $NPSPACE = \bigcup_k NSPACE(n^k) \subseteq \bigcup_k SPACE((n^k)^2) = \bigcup_k SPACE(n^{2k}) = PSPACE$. The reverse inclusion $PSPACE \subseteq NPSPACE$ is trivial. ::: tcolorbox $PSPACE = NPSPACE$, meaning that non-determinism does not provide any additional power in the context of polynomial space complexity. ::: Therefore, it seems that non-determinism in space complexity is not as powerful as non-determinism in time complexity. While non-determinism can be a useful tool for designing algorithms, it does not seem to fundamentally alter the space complexity of problems, at least not beyond a quadratic factor. The quadratic increase in space usage is a direct consequence of Savitch's theorem. Furthermore, next time we will see that $NSPACE(f(n)) = coNSPACE(f(n))$, meaning that the complements of non-deterministic space complexity classes are equal to the classes themselves. This is another indication that non-determinism in space is not very powerful. # Complementary Classes {#sec:complementary_classes} We now turn our attention to complementary complexity classes. Given a complexity class $C$, the complementary class $coC$ is defined as the set of all languages whose complements are in $C$. Formally: $$coC = \{ L \mid \overline{L} \in C \}$$ where $\overline{L}$ denotes the complement of the language $L$. The complement of a language $L$ is the set of all strings that are not in $L$. ::: tcolorbox The complementary class $coC$ is defined as the set of all languages whose complements are in $C$. $$coC = \{ L \mid \overline{L} \in C \}$$ ::: Understanding the relationship between a complexity class and its complement can provide valuable insights into the nature of the problems within that class. For deterministic complexity classes, the relationship is straightforward. For example, if a deterministic Turing machine can decide a language $L$ within a certain time or space bound, then by simply flipping the accepting and rejecting states, we can construct a machine that decides the complement $\overline{L}$ within the same bounds. Therefore, for deterministic complexity classes such as $L$, $P$, $PSPACE$, and $EXP$, we have: $$coL = L$$ $$coP = P$$ $$coPSPACE = PSPACE$$ $$coEXP = EXP$$ This means that if a problem can be solved efficiently by a deterministic machine, then its complement can also be solved efficiently by a deterministic machine within the same resource bounds. ::: tcolorbox Deterministic complexity classes are closed under complementation, i.e., $coC = C$ for deterministic classes $C$. ::: However, the situation is more complex for non-deterministic complexity classes. Consider the class $NP$. A language $L$ is in $NP$ if there exists a non-deterministic Turing machine that accepts $L$ in polynomial time. If we simply flip the accepting and rejecting states of such a machine, the resulting machine might not correctly decide the complement $\overline{L}$. This is because the non-deterministic machine accepts if *at least one* computation path leads to an accepting state. Flipping the states means that the new machine will accept if *none* of the original computation paths accepted, which doesn't directly correspond to a polynomial-time acceptance of $\overline{L}$ by a non-deterministic machine. The class of languages whose complements are in $NP$ is denoted by $coNP$. A language $L$ is in $coNP$ if and only if its complement $\overline{L}$ is in $NP$. This means that there exists a non-deterministic Turing machine that accepts $\overline{L}$ in polynomial time. One of the major open problems in complexity theory is the relationship between $NP$ and $coNP$: $$NP \stackrel{?}{=} coNP$$ If $NP = coNP$, it would imply a certain symmetry in the difficulty of proving the existence and non-existence of solutions. Most researchers believe that $NP \neq coNP$, suggesting that proving that a solution does not exist is inherently harder than proving that a solution exists for problems in $NP \setminus coNP$. ::: tcolorbox The relationship between $NP$ and $coNP$ is a major open problem in complexity theory: $$NP \stackrel{?}{=} coNP$$ ::: Similarly, we can define $coNL$ as the class of languages whose complements are in $NL$. A language $L$ is in $coNL$ if and only if its complement $\overline{L}$ is in $NL$. A significant result in this context is that: $$NL = coNL$$ This result, known as the Immerman--Szelepcsényi theorem, is a non-trivial finding and demonstrates that non-deterministic logarithmic space is closed under complementation. The proof of this theorem is intricate and relies on a clever inductive argument to count the number of reachable states in a graph. This theorem provides a rare instance where a non-deterministic complexity class is known to be closed under complementation. ::: tcolorbox The Immerman-Szelepcsényi theorem states that $NL = coNL$, demonstrating that non-deterministic logarithmic space is closed under complementation. ::: As we saw before, $PSPACE = NPSPACE$. Since $PSPACE$ is a deterministic class, it follows that $PSPACE = coPSPACE$. Therefore, $NPSPACE = coNPSPACE$. In general, for a proper function $f(n) \geq \log n$, we have: $$NSPACE(f(n)) = coNSPACE(f(n))$$ We will prove this result in the next lecture. In summary, while deterministic complexity classes are trivially closed under complementation, the situation for non-deterministic classes is more nuanced and leads to interesting open problems and non-trivial results. The relationship between a complexity class and its complement offers a different perspective on the inherent difficulty of computational problems. # The Landscape of Complexity Classes {#sec:complexity_landscape} Putting together the relationships and theorems we have discussed, we can paint a more complete picture of the landscape of complexity classes and the connections between them. We have established the following inclusions: $$\begin{aligned} L &\subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \\ PSPACE &\subseteq EXP \subseteq NEXP \end{aligned}$$ We also have the following equalities for complementary classes: $$\begin{aligned} L &= coL \\ P &= coP \\ PSPACE &= coPSPACE = NPSPACE = coNPSPACE \\ NL &= coNL \end{aligned}$$ And we suspect, but haven't proven, that: $$\begin{aligned} P &\neq NP \\ L &\neq NL \\ NP &\neq coNP \end{aligned}$$ ::: tcolorbox We have the following inclusions: $$L \subseteq NL \subseteq P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXP \subseteq NEXP$$ And the following equalities for complementary classes: $$\begin{aligned} L &= coL \\ P &= coP \\ PSPACE &= coPSPACE = NPSPACE = coNPSPACE \\ NL &= coNL \end{aligned}$$ ::: These inclusions and suspected separations define the basic structure of the complexity class hierarchy. The classes at the bottom, like $L$, represent problems solvable with very limited resources, while classes at the top, like $NEXP$, represent problems that require significant computational resources. The equalities $PSPACE = NPSPACE$ and $NL = coNL$ are particularly interesting as they show that non-determinism does not add any power in the context of polynomial space and that non-deterministic logarithmic space is closed under complementation. The open problems, particularly $P$ vs $NP$ and $L$ vs $NL$, are central to our understanding of the limits of computation. Resolving these questions would have profound implications for computer science and mathematics. For instance, if $P = NP$, many currently intractable problems would become efficiently solvable. Conversely, if $P \neq NP$, it would solidify the belief that certain computational tasks are inherently hard. Beyond the classes mentioned above, there are other important complexity classes, such as: - **EXP (Exponential Time):** Problems solvable by a deterministic Turing machine in time $2^{p(n)}$ for some polynomial $p(n)$. - **NEXP (Non-deterministic Exponential Time):** Problems solvable by a non-deterministic Turing machine in time $2^{p(n)}$ for some polynomial $p(n)$. - **EXPSPACE (Exponential Space):** Problems solvable by a deterministic Turing machine in space $2^{p(n)}$ for some polynomial $p(n)$. ::: tcolorbox - **EXP:** Problems solvable in deterministic exponential time. - **NEXP:** Problems solvable in non-deterministic exponential time. - **EXPSPACE:** Problems solvable in deterministic exponential space. ::: The hierarchy theorems tell us that: $$P \subsetneq EXP$$ $$PSPACE \subsetneq EXPSPACE$$ And it is known that: $$P \subseteq NP \subseteq NEXP$$ $$L \subseteq NL \subseteq P \subseteq PSPACE \subseteq EXP \subseteq NEXP \subseteq EXPSPACE$$ A major open problem related to these classes is: $$NP \stackrel{?}{=} NEXP$$ It is widely believed that $NP \neq NEXP$, which would imply that adding exponential time to a non-deterministic machine gives it more computational power. ::: tcolorbox The relationship between $NP$ and $NEXP$ is another major open problem: $$NP \stackrel{?}{=} NEXP$$ ::: The relationships between these complexity classes form a rich and intricate structure. Understanding this structure is crucial for classifying the difficulty of computational problems and for guiding the development of efficient algorithms. While significant progress has been made in complexity theory, many fundamental questions remain unanswered, driving ongoing research in the field. <figure id="fig:complexity_landscape"> <figcaption>The Landscape of Complexity Classes.</figcaption> </figure> Figure [8](#fig:complexity_landscape){reference-type="ref" reference="fig:complexity_landscape"} provides a visual representation of the landscape of complexity classes. # Conclusion {#sec:conclusion} In this discussion, we have explored several key relationships between complexity classes, focusing on the inclusion $NSPACE(f(n)) \subseteq \bigcup_{c>0} TIME(c^{f(n) + \log n})$ and its implications. We delved into the reachability method, a powerful technique for proving such inclusions, and discussed its connections to model checking and verification techniques. We also highlighted the challenges posed by the state space explosion problem and contrasted dynamic verification with static verification techniques. ::: tcolorbox - Reachability Method - Configuration Graphs - Model Checking and Verification - State Space Explosion Problem - Static vs. Dynamic Verification ::: We examined fundamental theorems like Savitch's theorem and its corollary, which shed light on the relationship between deterministic and non-deterministic space complexity. The open problem of $L$ vs $NL$ was discussed, emphasizing its importance in understanding the power of non-determinism in logarithmic space. We also touched upon the concept of complementary complexity classes, highlighting the significance of the Immerman--Szelepcsényi theorem ($NL = coNL$). ::: tcolorbox - Savitch's Theorem: Reachability in $O(\log^2 n)$ space. - $NSPACE(f(n)) \subseteq SPACE(f(n)^2)$ - $NL = coNL$ (Immerman-Szelepcsényi Theorem) - $PSPACE = NPSPACE$ ::: The landscape of complexity classes presents a fascinating and challenging area of study. While we have mapped out many of the relationships between these classes, several fundamental questions remain open, driving ongoing research in theoretical computer science. The quest to understand the power and limitations of computation continues, with each new insight building upon the foundational work we have explored. The journey through complexity theory reveals the deep and often surprising connections between seemingly disparate computational models and problems, underscoring the elegance and complexity of the computational universe. We also learned that non-determinism in space is less powerful than non-determinism in time and that $NSPACE(f(n)) = coNSPACE(f(n))$ for $f(n) \geq \log n$. ::: tcolorbox - $P \stackrel{?}{=} NP$ - $L \stackrel{?}{=} NL$ - $NP \stackrel{?}{=} coNP$ - $NP \stackrel{?}{=} NEXP$ ::: These open problems represent some of the most important challenges in theoretical computer science, and their resolution would have a profound impact on our understanding of computation. In the next lecture, we will continue our exploration of complexity classes by proving the result $NSPACE(f(n)) = coNSPACE(f(n))$ for $f(n) \geq \log n$. This will involve developing a non-deterministic space-efficient algorithm for counting the number of nodes reachable from a given node in a graph.