Random Walks, SAT, and Markov Chains
Introduction
This lecture explores the fascinating connections between three seemingly disparate areas: random walks, Boolean satisfiability (SAT), and Markov chains. We will begin by investigating random walks, focusing on their recurrence properties on grid structures, culminating in the presentation of Pólya’s Recurrence Theorem. This theorem provides surprising insights into the long-term behavior of random walks in different dimensions, revealing that random walks on 1D and 2D grids are recurrent, meaning they return to their starting point infinitely often, while those on 3D grids and higher are transient.
Next, we will transition into the realm of propositional calculus and the Boolean satisfiability problem (SAT). We will define propositional variables, logical connectives, and formulas, and formally introduce the SAT problem, which asks if there exists a truth assignment that makes a given Boolean formula true. We will then discuss Conjunctive Normal Form (CNF) and its importance in SAT, along with the complexity classes of SAT and its variants, such as 2-SAT and 3-SAT. We will highlight that while SAT and 3-SAT are NP-complete, 2-SAT is solvable in polynomial time, making it a crucial tractable variant.
Following this, we will delve into algorithms specifically designed for solving 2-SAT. We will explore both deterministic polynomial-time algorithms, based on techniques like the resolution rule and implication graphs, and randomized algorithms, highlighting their effectiveness and underlying probabilistic principles.
To provide a theoretical foundation for analyzing randomized algorithms, we will introduce the concepts of expected values and Markov’s Inequality. These tools are essential for understanding the performance and limitations of randomized approaches in computer science.
Finally, we will formally introduce Markov chains as a powerful mathematical framework for modeling stochastic processes, including random walks and randomized algorithms. We will define Markov chains, illustrate how random walks can be viewed as Markov chains, and discuss key concepts such as stationary distributions and the Basic Limit Theorem for Markov Chains. This theorem explains the long-term behavior of Markov chains and their convergence to stationary distributions, providing insights into the long-run probabilities of states in these systems.
By the end of this lecture, you will gain a comprehensive understanding of random walks, SAT problems, and Markov chains, and appreciate their interconnections and applications in various fields.
Random Walks and Recurrence
Random Walks on Grids
A random walk is a stochastic process describing a sequence of steps in a mathematical space. We focus on simple random walks on a \(d\)-dimensional integer grid \(\mathbb{Z}^d\). Imagine starting at the origin. At each step, the walker moves to one of its \(2d\) nearest neighbors with equal probability \(\frac{1}{2d}\). A central question in the study of random walks is whether the walker will return to its starting point, and if so, how often. This concept is known as recurrence.
Theorem 1 (Pólya’s Recurrence Theorem (1921)).
This theorem describes the recurrence of simple random walks on a \(d\)-dimensional integer grid. For a simple random walk on a \(d\)-dimensional integer grid \(\mathbb{Z}^d\):
If \(d = 1\) or \(d = 2\), the random walk is recurrent. This means that with probability 1, the random walk will return to its starting point infinitely often.
If \(d \geq 3\), the random walk is transient. This means that with probability greater than zero, the random walk will never return to its starting point after some finite time. Equivalently, the probability of returning to the origin is less than 1, and the expected number of returns to the origin is finite.
Pólya’s Recurrence Theorem is a remarkable result that reveals how the dimensionality of the space profoundly affects the long-term behavior of random walks. In essence, it tells us that in lower dimensions (1D and 2D), the walker is "confined" enough to guarantee its return to the origin, whereas in higher dimensions (3D and above), the walker has enough "room" to wander away indefinitely.
Example 2.
This example provides an intuitive analogy to understand the concept of recurrence in different dimensions using the spread of a scent. Consider a simple analogy to understand this concept intuitively. Imagine releasing a scent in a space.
In one dimension (a line), the scent, represented by the random walker, is likely to return to any given point as it is confined to move back and forth along the line.
In two dimensions (a plane), the scent can spread out, but there’s still a high probability it will revisit areas close to the origin. Think of it spreading on a flat surface; it will likely double back.
In three dimensions (space), the scent disperses in all directions. The probability of it returning to a specific point near the origin diminishes significantly as it can explore a much larger volume.
In a 2-dimensional grid, if you start at any point, you are certain to return to that point infinitely many times. However, in a 3-dimensional grid or higher, there’s a genuine chance you will wander off and never return to the starting location. This difference is not just a matter of probability; it’s a fundamental change in the qualitative behavior of the random walk.
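The qualitative difference described above can be observed empirically. The following sketch estimates, for each dimension, the fraction of simple random walks that revisit the origin within a fixed number of steps (the function name and parameters are our own; this is an illustration, not a proof):

```python
import random

def return_fraction(d, steps=2000, trials=400, seed=0):
    """Estimate the fraction of simple random walks on Z^d that revisit
    the origin within `steps` steps."""
    rng = random.Random(seed)
    returned = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(steps):
            axis = rng.randrange(d)           # pick one of the d axes
            pos[axis] += rng.choice((-1, 1))  # step -1 or +1 along it
            if not any(pos):                  # back at the origin?
                returned += 1
                break
    return returned / trials

for d in (1, 2, 3):
    print(d, return_fraction(d))   # the fraction drops sharply at d = 3
```

For \(d = 1\) the estimated fraction is close to 1, while for \(d = 3\) it settles near Pólya's return probability of about 0.34; increasing `steps` pushes the low-dimensional estimates toward 1 but leaves the 3D estimate bounded away from it.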
For those interested in delving deeper into the mathematical intricacies of random walks, particularly on infinite graphs, the works of Doyle and Snell [25] and Thomassen [65] offer valuable insights.
Propositional Calculus and SAT
Now, we transition from the spatial domain of random walks to the logical realm of propositional calculus and the Boolean Satisfiability problem (SAT). Propositional calculus is the bedrock of formal logic, providing a system to represent and reason about logical statements.
Propositional Variables and Logical Connectives
Definition 3 (Propositional Variables).
This definition introduces propositional variables, the basic components of propositional logic, which can be either true or false. Propositional variables, also known as Boolean variables, are the basic building blocks of propositional logic. They are symbols, typically denoted by uppercase letters such as \(P, Q, R, \ldots\), that can represent simple declarative statements. Each propositional variable can be assigned one of two truth values: true or false.
Think of propositional variables as switches that can be either ON (true) or OFF (false). They represent elementary propositions whose truth value we want to analyze.
Definition 4 (Logical Connectives).
This definition describes the fundamental logical connectives used to combine propositional variables into complex formulas. Logical connectives are operators used to combine propositional variables and form more complex compound propositions or formulas. The fundamental logical connectives we will consider are:
Negation (\(\neg\) or \(\lnot\)): The negation connective, often read as "NOT", reverses the truth value of a proposition. If \(P\) is true, then \(\neg P\) is false, and if \(P\) is false, then \(\neg P\) is true.
Conjunction (\(\land\)): The conjunction connective, read as "AND", combines two propositions. The formula \(P \land Q\) is true if and only if both \(P\) and \(Q\) are true. Otherwise, it is false.
Disjunction (\(\lor\)): The disjunction connective, read as "OR", combines two propositions. The formula \(P \lor Q\) is true if and only if at least one of \(P\) or \(Q\) is true (or both are true). It is false only when both \(P\) and \(Q\) are false. This is inclusive OR.
These connectives allow us to construct complex logical statements from simpler ones, mirroring how we build complex sentences from basic words and grammatical structures.
Formulas and Semantics
Definition 5 (Formulas).
This definition formally defines propositional formulas through a recursive construction using propositional variables and logical connectives. Formulas in propositional calculus are well-formed expressions constructed recursively from propositional variables and logical connectives according to specific rules:
Base case: A propositional variable alone is a formula. For example, \(P\), \(Q\), \(R\) are formulas.
Recursive step: If \(\varphi\) and \(\psi\) are formulas, then:
Negation: \(\neg \varphi\) is a formula.
Conjunction: \((\varphi \land \psi)\) is a formula.
Disjunction: \((\varphi \lor \psi)\) is a formula.
Closure: Nothing else is a formula unless constructed by these rules.
Parentheses are used to ensure the unambiguous parsing and interpretation of formulas, especially when combining multiple connectives. For instance, \((P \land Q) \lor \neg R\) is a valid propositional formula, clearly indicating the order of operations.
Definition 6 (Semantics and Truth Assignment).
This definition explains how to assign meaning to propositional formulas by defining truth assignments and how they evaluate formulas. Semantics in propositional calculus deals with assigning meaning to formulas by determining their truth values. A truth assignment is a function \(\mathcal{I}\) that maps each propositional variable to a truth value from the set \(\{0, 1\}\), where 1 represents true and 0 represents false. Formally, \(\mathcal{I}: \{P, Q, R, \ldots\} \rightarrow \{0, 1\}\). This truth assignment is then extended to evaluate the truth value of complex formulas recursively, based on the truth tables of the logical connectives.
For a given truth assignment \(\mathcal{I}\) and formulas \(\varphi\) and \(\psi\):
\(\mathcal{I}(\neg \varphi) = 1 - \mathcal{I}(\varphi)\) (Negation: true if \(\varphi\) is false, false if \(\varphi\) is true)
\(\mathcal{I}(\varphi \land \psi) = \min(\mathcal{I}(\varphi), \mathcal{I}(\psi))\) (Conjunction: true if both \(\varphi\) and \(\psi\) are true, false otherwise)
\(\mathcal{I}(\varphi \lor \psi) = \max(\mathcal{I}(\varphi), \mathcal{I}(\psi))\) (Disjunction: true if at least one of \(\varphi\) or \(\psi\) is true, false otherwise)
Example 7.
This example demonstrates how to evaluate a propositional formula under a given truth assignment. Let’s evaluate the formula \((P \land Q) \lor \neg R\) under a truth assignment \(\mathcal{I}\) where \(\mathcal{I}(P) = 1\) (true), \(\mathcal{I}(Q) = 0\) (false), and \(\mathcal{I}(R) = 1\) (true).
Evaluate \(P \land Q\): \(\mathcal{I}(P \land Q) = \min(\mathcal{I}(P), \mathcal{I}(Q)) = \min(1, 0) = 0\) (false).
Evaluate \(\neg R\): \(\mathcal{I}(\neg R) = 1 - \mathcal{I}(R) = 1 - 1 = 0\) (false).
Evaluate \((P \land Q) \lor \neg R\): \(\mathcal{I}((P \land Q) \lor \neg R) = \max(\mathcal{I}(P \land Q), \mathcal{I}(\neg R)) = \max(0, 0) = 0\) (false).
Therefore, under the truth assignment \(\mathcal{I}\), the formula \((P \land Q) \lor \neg R\) is evaluated to false.
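The recursive semantics above translate directly into code. Below is a minimal evaluator, assuming formulas are encoded as nested tuples (an encoding we choose purely for illustration):

```python
# Formulas as nested tuples: ("var", name), ("not", f), ("and", f, g), ("or", f, g).
def evaluate(formula, I):
    """Evaluate a propositional formula under truth assignment I (a dict
    mapping variable names to 0 or 1), using the min/max/1-x semantics."""
    op = formula[0]
    if op == "var":
        return I[formula[1]]
    if op == "not":
        return 1 - evaluate(formula[1], I)
    if op == "and":
        return min(evaluate(formula[1], I), evaluate(formula[2], I))
    if op == "or":
        return max(evaluate(formula[1], I), evaluate(formula[2], I))
    raise ValueError(f"unknown connective: {op}")

# Example 7: (P AND Q) OR NOT R under I(P)=1, I(Q)=0, I(R)=1
phi = ("or", ("and", ("var", "P"), ("var", "Q")), ("not", ("var", "R")))
print(evaluate(phi, {"P": 1, "Q": 0, "R": 1}))  # 0, matching the hand evaluation
```

Changing the assignment to \(\mathcal{I}(R) = 0\) makes \(\neg R\) true and the whole formula evaluates to 1.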
The Satisfiability Problem (SAT)
Definition 8 (Satisfiability Problem (SAT)).
This definition introduces the central problem of Boolean Satisfiability (SAT), asking whether there exists a truth assignment that makes a given formula true. Given a propositional formula \(\varphi\), the central question in the Satisfiability problem (SAT) is: Does there exist at least one truth assignment \(\mathcal{I}\) that makes the formula \(\varphi\) true?
If such a truth assignment exists, the formula \(\varphi\) is said to be satisfiable. Otherwise, if no such assignment exists, meaning for all possible truth assignments, \(\varphi\) evaluates to false, then \(\varphi\) is said to be unsatisfiable.
SAT is a cornerstone problem in computer science due to its fundamental nature and wide applicability. Determining the satisfiability of Boolean formulas is crucial in various domains, including:
Verification: Checking the correctness of hardware and software systems.
Artificial Intelligence: Reasoning, planning, and knowledge representation.
Optimization: Solving constraint satisfaction problems and combinatorial optimization problems.
Conjunctive Normal Form (CNF)
To standardize the form of propositional formulas for SAT problem analysis and algorithm design, Conjunctive Normal Form (CNF) is particularly important.
Definition 9 (Conjunctive Normal Form (CNF)).
This definition introduces Conjunctive Normal Form (CNF), a standard format for propositional formulas as a conjunction of clauses. A propositional formula is in Conjunctive Normal Form (CNF) if it is expressed as a conjunction (AND) of one or more clauses, where each clause is a disjunction (OR) of one or more literals. A literal is either a propositional variable or its negation.
The general structure of a CNF formula is: \[\varphi = C_1 \land C_2 \land \cdots \land C_m,\] where each clause \(C_i\) has the form \[C_i = (l_{i,1} \lor l_{i,2} \lor \cdots \lor l_{i,k_i}),\] and each \(l_{i,j}\) is a literal (either a variable \(x\) or its negation \(\neg x\)).
Example 10.
This example shows a formula in Conjunctive Normal Form (CNF) and explains why it meets the CNF criteria. Consider the formula: \((P \lor \neg Q \lor R) \land (\neg P \lor Q) \land (\neg R)\). This formula is in CNF because:
It is a conjunction of three clauses: \(C_1 = (P \lor \neg Q \lor R)\), \(C_2 = (\neg P \lor Q)\), and \(C_3 = (\neg R)\).
Each clause is a disjunction of literals. For example, in \(C_1\), \(P\), \(\neg Q\), and \(R\) are literals.
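A common machine encoding for CNF (used, for instance, in the DIMACS format) represents each literal as a nonzero integer, with \(-v\) standing for the negation of variable \(v\). A brute-force satisfiability check of the example formula under this encoding (the encoding and helper name are ours):

```python
from itertools import product

# (P OR NOT Q OR R) AND (NOT P OR Q) AND (NOT R), with P=1, Q=2, R=3.
cnf = [[1, -2, 3], [-1, 2], [-3]]

def is_satisfied(cnf, assignment):
    """assignment maps variable -> bool; a clause holds if some literal does."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in cnf)

# Brute-force search over all 2^3 truth assignments (fine for tiny formulas).
models = [dict(zip((1, 2, 3), bits))
          for bits in product((False, True), repeat=3)
          if is_satisfied(cnf, dict(zip((1, 2, 3), bits)))]
print(models)   # e.g. P = Q = True, R = False satisfies all three clauses
```

The search confirms the formula is satisfiable; note that every model must set \(R\) to false because of the unit clause \((\neg R)\).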
Definition 11 (CNF-SAT and k-SAT).
This definition introduces CNF-SAT, the SAT problem restricted to CNF formulas, and k-SAT, a further restriction where each clause has at most k literals. CNF-SAT is the SAT problem restricted to propositional formulas that are already in Conjunctive Normal Form. A fundamental result in computational complexity theory states that SAT is equivalent to CNF-SAT. This means that any SAT problem for an arbitrary propositional formula can be transformed into an equivalent CNF-SAT problem. This equivalence is crucial because many SAT solvers and theoretical analyses focus on CNF formulas.
k-SAT is a further restricted version of CNF-SAT where each clause is limited to containing at most \(k\) literals. Important specific cases of k-SAT are:
2-SAT: Each clause contains at most 2 literals.
3-SAT: Each clause contains at most 3 literals.
k-SAT and Complexity
The complexity of solving SAT and its k-SAT variants varies significantly with the value of \(k\). This variation is crucial in understanding the boundary between tractable and intractable computational problems.
Complexity of SAT Variants:
SAT and 3-SAT are NP-complete problems. This means that unless P=NP, there are no known polynomial-time algorithms to solve them. NP-completeness implies that these problems are among the hardest problems in NP, and solving them efficiently for large instances is computationally infeasible in the worst case.
2-SAT is solvable in polynomial time. In contrast to 3-SAT, 2-SAT can be solved efficiently. Polynomial-time algorithms exist for 2-SAT, making it a tractable special case of the SAT problem. This difference in complexity between 2-SAT and 3-SAT is a critical point in computational complexity theory.
While any propositional formula can be converted in polynomial time into an equisatisfiable CNF formula (one that is satisfiable exactly when the original is, e.g., via the Tseitin transformation), converting a formula into a logically equivalent CNF formula, or into Disjunctive Normal Form (DNF), which is a disjunction of conjunctions of literals, can lead to an exponential increase in formula size. This makes CNF a suitable standard form for many SAT algorithms and theoretical analyses. For example, consider the formula \((P_1 \land Q_1) \lor (P_2 \land Q_2) \lor \ldots \lor (P_n \land Q_n)\). It is already in DNF and has size \(O(n)\), but converting it to an equivalent CNF formula using the distributive laws yields \(2^n\) clauses.
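The blow-up in this example can be checked directly: distributing the disjunction over the conjunctions produces one clause for every way of choosing \(P_i\) or \(Q_i\) from each disjunct. A small sketch (the integer encoding of literals is ours):

```python
from itertools import product

def dnf_to_cnf_size(n):
    """Count the clauses in the CNF obtained by distributing
    (P1 AND Q1) OR ... OR (Pn AND Qn): one clause per choice of
    P_i or Q_i from each disjunct. Literal 2*i encodes P_i, 2*i+1 encodes Q_i."""
    clauses = set(frozenset(choice) for choice in
                  product(*[(2 * i, 2 * i + 1) for i in range(n)]))
    return len(clauses)

for n in (2, 4, 8):
    print(n, dnf_to_cnf_size(n))   # 4, 16, 256: exactly 2^n clauses
```

All \(2^n\) choice combinations yield distinct clauses, so the clause count (and hence the formula size) grows exponentially in \(n\).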
Algorithms for 2-SAT
Since 2-SAT is polynomial-time solvable, we can explore efficient algorithms for it. We will discuss the resolution rule, a polynomial-time algorithm based on implication graphs, and a randomized algorithm for 2-SAT.
Resolution Rule and Unsatisfiability in 2-CNF
The resolution rule is a fundamental inference rule in propositional logic, particularly useful for formulas in CNF. In the context of 2-CNF, it provides a method to derive new clauses from existing ones, and crucially, to detect unsatisfiability.
Definition 12 (Resolution Rule for 2-CNF).
This definition introduces the resolution rule for 2-CNF clauses, a method to derive new clauses from existing ones. Given two clauses \(C_1 = (l_1 \lor l_2)\) and \(C_2 = (\neg l_1 \lor l_3)\), where \(l_1, l_2, l_3\) are literals, we can derive a new clause called the resolvent \(C_{res} = (l_2 \lor l_3)\). The literal \(l_1\) is called the pivot literal.
The resolution rule works by identifying clauses that contain complementary literals (like \(l_1\) and \(\neg l_1\)). It then combines these clauses to eliminate the pivot literal, potentially leading to simpler clauses or revealing contradictions.
Example 13.
This example demonstrates the application of the resolution rule to derive a new clause from two given clauses. Consider the clauses \((P \lor Q)\) and \((\neg P \lor R)\). Applying the resolution rule with \(P\) as the pivot literal, we derive the resolvent clause \((Q \lor R)\).
A key property of the resolution rule for 2-CNF is its completeness for refutation. This means that if a 2-CNF formula is unsatisfiable, we can always derive the empty clause (denoted as \(\Box\), a clause with no literals) by repeatedly applying the resolution rule. The empty clause represents a contradiction, as it cannot be satisfied by any truth assignment.
Theorem 14 (Unsatisfiability by Resolution).
This theorem states that a 2-CNF formula is unsatisfiable if and only if the empty clause can be derived using the resolution rule. A 2-CNF formula is unsatisfiable if and only if the empty clause \(\Box\) can be derived by repeatedly applying the resolution rule.
To check for unsatisfiability using resolution, we can iteratively apply the resolution rule to all possible pairs of clauses in the formula. If we derive the empty clause at any point, we can conclude that the formula is unsatisfiable. If we exhaust all possible resolutions and do not derive the empty clause, and no new clauses can be derived, then the formula is satisfiable.
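The saturation procedure just described can be sketched as follows, with clauses represented as frozensets of integer literals (\(-v\) meaning the negation of variable \(v\); the representation and function name are ours):

```python
from itertools import combinations

def resolution_unsat(clauses):
    """Saturate a 2-CNF clause set under resolution; True iff the empty
    clause is derived. Clauses are sets of nonzero ints (-v = NOT v)."""
    derived = set(frozenset(c) for c in clauses)
    while True:
        new = set()
        for c1, c2 in combinations(derived, 2):
            for p in c1:
                if -p in c2:                      # complementary pivot found
                    resolvent = (c1 - {p}) | (c2 - {-p})
                    if not resolvent:
                        return True               # empty clause: contradiction
                    if any(-l in resolvent for l in resolvent):
                        continue                  # tautology, skip
                    new.add(frozenset(resolvent))
        if new <= derived:
            return False                          # saturated, no empty clause
        derived |= new

# (x OR y) AND (x OR NOT y) AND (NOT x OR y) AND (NOT x OR NOT y): unsatisfiable
print(resolution_unsat([{1, 2}, {1, -2}, {-1, 2}, {-1, -2}]))  # True
# (x OR y) AND (NOT x OR y): satisfiable (set y = true)
print(resolution_unsat([{1, 2}, {-1, 2}]))  # False
```

Because every resolvent of two 2-clauses has at most two literals, the set of derivable clauses is polynomial in the number of variables, so the saturation loop terminates after polynomially many rounds.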
Polynomial-Time Algorithm for 2-SAT using Implication Graphs
A more efficient polynomial-time algorithm for 2-SAT leverages the concept of implication graphs. This graph-based approach allows us to determine satisfiability and find a satisfying assignment (if one exists) in linear time with respect to the size of the formula.
Definition 15 (Implication Graph).
This definition describes how to construct an implication graph from a 2-CNF formula, representing implications between literals as directed edges. For a given 2-CNF formula \(\varphi\), the implication graph \(G_{\varphi} = (V, E)\) is a directed graph constructed as follows:
Vertices \(V\): For each propositional variable \(x\) in \(\varphi\), the vertex set \(V\) contains two vertices, representing the literals \(x\) and \(\neg x\).
Edges \(E\): For each clause \((l_1 \lor l_2)\) in \(\varphi\), we add two directed edges to \(E\):
\((\neg l_1 \rightarrow l_2)\): representing the implication \(\neg l_1 \implies l_2\) (if \(l_1\) is false, then \(l_2\) must be true).
\((\neg l_2 \rightarrow l_1)\): representing the implication \(\neg l_2 \implies l_1\) (if \(l_2\) is false, then \(l_1\) must be true).
For a clause \((l_1 \lor l_2)\), if \(l_1\) is false, then \(l_2\) must be true to satisfy the clause, and vice versa. These implications are precisely what the directed edges in the implication graph represent.
The key insight for using implication graphs to solve 2-SAT is the following theorem:
Theorem 16 (Unsatisfiability Condition in Implication Graphs).
This theorem provides the condition for unsatisfiability of a 2-CNF formula based on paths in its implication graph. A 2-CNF formula \(\varphi\) is unsatisfiable if and only if there exists a propositional variable \(x\) such that there is a path from \(x\) to \(\neg x\) and a path from \(\neg x\) to \(x\) in the implication graph \(G_{\varphi}\).
This theorem provides a direct method to check for unsatisfiability. The existence of paths from \(x\) to \(\neg x\) and from \(\neg x\) to \(x\) implies a contradiction: if \(x\) is true, then \(\neg x\) must be true, and if \(\neg x\) is true, then \(x\) must be true, which is impossible.
To implement a polynomial-time algorithm based on this theorem, we can use the concept of Strongly Connected Components (SCCs). We compute the SCCs of the implication graph. If any variable \(x\) and its negation \(\neg x\) belong to the same SCC, then there is a cycle containing both \(x\) and \(\neg x\), implying paths from \(x\) to \(\neg x\) and \(\neg x\) to \(x\).
Algorithm 17 (2-SAT via Implication Graph).
This algorithm checks the satisfiability of a 2-CNF formula using implication graphs and strongly connected components.
Input: a 2-CNF formula \(\varphi\). Output: "Satisfiable" or "Unsatisfiable".
1. Construct the implication graph \(G_{\varphi} = (V, E)\) for \(\varphi\).
2. Compute the Strongly Connected Components (SCCs) of \(G_{\varphi}\).
3. For each propositional variable \(x\) in \(\varphi\): if \(x\) and \(\neg x\) lie in the same SCC, return "Unsatisfiable".
4. Return "Satisfiable".
Complexity Analysis:
Constructing the implication graph takes \(O(m)\) time, where \(m\) is the number of clauses in \(\varphi\), as we add a constant number of edges for each clause.
Computing Strongly Connected Components (SCCs) using algorithms like Kosaraju’s algorithm or Tarjan’s algorithm takes \(O(|V| + |E|)\) time, where \(|V|\) is the number of vertices and \(|E|\) is the number of edges. In our case, \(|V| = 2n\) (where \(n\) is the number of variables) and \(|E| \leq 2m\). Thus, SCC computation is \(O(n + m)\).
Checking for each variable \(x\) if \(x\) and \(\neg x\) are in the same SCC takes \(O(n)\) time in total.
Therefore, the overall time complexity of this algorithm is \(O(m + n)\), which is linear in the size of the input formula, making it a highly efficient deterministic algorithm for 2-SAT.
If the algorithm determines that the formula is satisfiable, a satisfying assignment can also be extracted from the SCCs. For each variable \(x\), if the SCC of \(\neg x\) comes before the SCC of \(x\) in a topological sort of the SCC graph (so that any chain of implications runs from \(\neg x\) toward \(x\)), we assign \(x\) to true; otherwise, we assign \(x\) to false.
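The following is a sketch of the full SCC-based solver, using Kosaraju's two-pass method to number the components in topological order (the literal encoding and function name are ours):

```python
from collections import defaultdict
from itertools import chain

def two_sat(n, clauses):
    """SCC-based 2-SAT solver. Variables are 1..n; a clause is a pair of
    nonzero ints where -v encodes the negation of variable v.
    Returns a satisfying assignment {var: bool}, or None if unsatisfiable."""
    graph, rgraph = defaultdict(list), defaultdict(list)
    for a, b in clauses:            # (a OR b) yields NOT a -> b and NOT b -> a
        graph[-a].append(b); rgraph[b].append(-a)
        graph[-b].append(a); rgraph[a].append(-b)

    nodes = list(chain.from_iterable((x, -x) for x in range(1, n + 1)))
    order, seen = [], set()
    for start in nodes:             # pass 1: record DFS finish order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                order.append(node); stack.pop()
            else:
                seen.add(nxt); stack.append((nxt, iter(graph[nxt])))

    comp, c = {}, 0                 # pass 2: SCCs on the reversed graph,
    for start in reversed(order):   # numbered in topological order
        if start in comp:
            continue
        comp[start] = c
        stack = [start]
        while stack:
            node = stack.pop()
            for w in rgraph[node]:
                if w not in comp:
                    comp[w] = c; stack.append(w)
        c += 1

    assignment = {}
    for x in range(1, n + 1):
        if comp[x] == comp[-x]:
            return None             # x and NOT x in one SCC: unsatisfiable
        # x is true iff the SCC of NOT x precedes the SCC of x topologically
        assignment[x] = comp[x] > comp[-x]
    return assignment

print(two_sat(3, [(1, 2), (-1, 2), (-2, 3)]))  # {1: True, 2: True, 3: True}
print(two_sat(1, [(1, 1), (-1, -1)]))          # None: x AND NOT x
```

Both passes visit each vertex and edge once, matching the \(O(n + m)\) bound discussed above.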
Randomized Algorithm for 2-SAT
While deterministic polynomial-time algorithms exist for 2-SAT, randomized algorithms offer an alternative approach that can be simpler to implement and sometimes perform well in practice, especially for certain types of SAT instances.
Consider the following randomized algorithm for 2-SAT:
Algorithm 18 (Randomized 2-SAT).
This algorithm is a randomized approach to solve 2-SAT by iteratively flipping truth values of variables in unsatisfied clauses.
Input: a 2-CNF formula \(\varphi\) with \(n\) variables. Output: "Satisfiable" or "Unsatisfiable" (Timeout).
1. Initialize a random truth assignment for all variables in \(\varphi\), and set \(max\_steps = 2n^2\).
2. Repeat up to \(max\_steps\) times: if the current assignment satisfies \(\varphi\), return "Satisfiable". Otherwise, select an arbitrary unsatisfied clause \(C\) in \(\varphi\), select a literal \(l\) from \(C\) uniformly at random, and flip the truth value of the variable in \(l\).
3. Return "Unsatisfiable" (Timeout).
Algorithm Explanation:
Initialization: Start with a random assignment of truth values to all propositional variables. This is our initial guess for a satisfying assignment.
Iteration: Repeat for a maximum of \(2n^2\) steps (where \(n\) is the number of variables):
Check for Satisfaction: If the current truth assignment satisfies all clauses in \(\varphi\), we have found a satisfying assignment, and the algorithm returns "Satisfiable".
Clause Selection: If there are unsatisfied clauses, pick any one of them. The choice of clause can be arbitrary.
Literal Selection and Flip: From the chosen unsatisfied clause \(C\), select one of its literals \(l\) uniformly at random and flip the truth value of its underlying variable: whether \(l\) is a positive literal \(x\) or a negative literal \(\neg x\), it is the truth value of the variable \(x\) that changes.
Timeout: If the algorithm reaches the maximum number of steps (\(2n^2\)) without finding a satisfying assignment, it declares "Unsatisfiable" (or more accurately, "Timeout").
Probabilistic Nature and One-Sided Error: This randomized algorithm is not guaranteed to find a satisfying assignment even if one exists. However, for 2-SAT, it has a good probability of finding a solution within \(2n^2\) steps if the formula is indeed satisfiable. If the algorithm returns "Satisfiable", the returned truth assignment is guaranteed to be satisfying. However, if it returns "Unsatisfiable", it might be incorrect, meaning the formula could still be satisfiable, but the algorithm simply timed out before finding a solution. Thus, this is a one-sided error algorithm.
Error Reduction: To reduce the probability of error (i.e., incorrectly reporting "Unsatisfiable" for a satisfiable formula), we can:
Increase \(max\_steps\): Running the algorithm for more steps increases the chance of finding a satisfying assignment if one exists. For 2-SAT, setting \(max\_steps = 100n^2\) significantly reduces the error probability to \(\leq \frac{1}{100}\).
Restart the algorithm: Repeat the entire randomized algorithm multiple times with different random initial truth assignments. If we run the algorithm \(k\) times and it still outputs "Unsatisfiable" each time, the probability of error decreases exponentially with \(k\).
Complexity Analysis: In each step of the randomized algorithm, we perform a constant amount of work beyond finding an unsatisfied clause (selecting a literal and flipping a truth value). The algorithm runs for at most \(2n^2\) steps. Checking whether the formula is satisfied, and locating an unsatisfied clause if not, takes \(O(m)\) time in a naive implementation, where \(m\) is the number of clauses; with more careful bookkeeping this check can be sped up. The overall complexity is therefore \(O(n^2 m)\) in the naive implementation. While not linear like the implication graph algorithm, it is still polynomial and often effective in practice for 2-SAT.
The effectiveness of this randomized algorithm stems from the observation that for 2-SAT, each flip of a variable’s truth value has a reasonable probability of moving the current assignment "closer" to a satisfying assignment, if one exists. This probabilistic improvement in each step makes the randomized approach a viable alternative to deterministic methods for 2-SAT.
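A direct sketch of Algorithm 18 (the literal encoding and function name are ours; returning `None` plays the role of the "Unsatisfiable (Timeout)" answer, so the error is one-sided):

```python
import random

def randomized_2sat(clauses, n, max_steps=None, rng=random):
    """Random-walk 2-SAT heuristic. clauses: pairs of nonzero ints, -v
    meaning NOT v; n: number of variables. Returns a satisfying
    assignment {var: bool}, or None on timeout."""
    if max_steps is None:
        max_steps = 2 * n * n                      # the 2n^2 step budget
    assign = {x: rng.choice((True, False)) for x in range(1, n + 1)}
    satisfied = lambda lit: assign[abs(lit)] == (lit > 0)
    for _ in range(max_steps):
        unsat = [c for c in clauses if not any(satisfied(l) for l in c)]
        if not unsat:
            return assign                          # all clauses satisfied
        clause = rng.choice(unsat)                 # an arbitrary unsatisfied clause
        lit = rng.choice(clause)                   # a literal chosen uniformly
        assign[abs(lit)] = not assign[abs(lit)]    # flip its variable
    return None                                    # timeout

random.seed(0)
print(randomized_2sat([(1, 2), (-1, 2)], 2, max_steps=10_000))
```

Any satisfying assignment of this tiny formula must set variable 2 to true; the random walk finds one almost immediately.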
Expected Values and Markov’s Inequality
To analyze randomized algorithms, especially in terms of their performance and probability of success, we often rely on tools from probability theory. Expected values and inequalities like Markov’s Inequality are particularly useful in this context. These tools help us understand the average-case behavior and probabilistic guarantees of randomized algorithms, such as the randomized 2-SAT algorithm we discussed.
Expected Number of Steps
When we analyze randomized algorithms, we are often interested in how long it takes for them to find a solution, or to reach a desired state. For algorithms that proceed in steps, a key metric is the expected number of steps until the algorithm terminates successfully.
In the context of the randomized 2-SAT algorithm, we want to understand, on average, how many iterations of flipping truth values are needed to find a satisfying assignment, assuming one exists. This concept is closely related to ideas from the study of random walks on graphs, such as hitting time and cover time.
Definition 19 (Hitting Time).
This definition introduces the concept of hitting time in stochastic processes, representing the expected steps to reach a set of states for the first time. In the context of stochastic processes and random walks, the hitting time (or first passage time) to a set of states \(B\) is the expected number of steps required to reach a state in \(B\) for the first time, starting from a given initial state.
In our randomized 2-SAT algorithm, we can think of the space of all possible truth assignments as the state space. A "target" set of states would be the set of all truth assignments that satisfy the given 2-CNF formula. The hitting time, in this case, would be the expected number of steps (flips) to reach a satisfying truth assignment for the first time, starting from a random initial assignment.
Definition 20 (Cover Time).
This definition introduces cover time, representing the expected steps for a random walk to visit all vertices in a graph. The cover time of a graph is the expected number of steps needed for a random walk, starting from a given vertex, to visit all vertices in the graph at least once.
While cover time is less directly applicable to the randomized 2-SAT algorithm, the concept of hitting time is highly relevant. Analyzing the expected number of steps to find a satisfying assignment can be complex. However, understanding the expected behavior is crucial for designing and evaluating randomized algorithms. Markov chain theory, which we will discuss later, provides a powerful framework for analyzing such stochastic processes and their expected behavior.
Markov’s Inequality
Markov’s Inequality is a fundamental result in probability theory that provides a way to bound the probability that a non-negative random variable exceeds a certain value, using only its expected value. It is a versatile tool, especially when we have limited information about the distribution of a random variable, but we know its expectation.
Theorem 21 (Markov’s Inequality).
This theorem presents Markov’s Inequality, which bounds the probability of a non-negative random variable exceeding a value based on its expected value. Let \(X\) be a non-negative random variable. Then for any real number \(a > 0\), \[P(X \geq a) \leq \frac{E[X]}{a}\] where \(E[X]\) is the expected value of \(X\).
Conditions and Interpretation:
Non-negativity: Markov’s Inequality applies only to non-negative random variables. This is because it leverages the fact that a non-negative variable cannot take negative values, which constrains its distribution.
Bound on Probability: The inequality provides an upper bound on the probability that the random variable \(X\) is greater than or equal to a positive value \(a\). It tells us that this probability cannot be arbitrarily large if the expected value \(E[X]\) is controlled.
Looseness of Bound: Markov’s Inequality is often described as a "loose" bound because it can be far from tight in many cases. It only uses information about the expected value and does not consider other properties of the distribution (like variance, shape, etc.). Despite its looseness, it is remarkably useful due to its generality and simplicity.
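A quick empirical illustration of both the validity and the looseness of the bound: take \(X\) to be the number of fair-coin flips until the first heads (a non-negative variable with \(E[X] = 2\)), and compare the observed tail probability with the Markov bound.

```python
import random

random.seed(0)

def flips_until_heads():
    """Sample X: the number of fair-coin flips until the first heads."""
    count = 1
    while random.random() >= 0.5:   # tails with probability 1/2
        count += 1
    return count

samples = [flips_until_heads() for _ in range(100_000)]
mean = sum(samples) / len(samples)              # should be close to E[X] = 2
a = 6
frac = sum(s >= a for s in samples) / len(samples)
print(f"E[X] ~ {mean:.3f}, P(X >= {a}) ~ {frac:.4f}, Markov bound = {mean / a:.4f}")
```

The true tail probability here is \(P(X \geq 6) = 2^{-5} \approx 0.031\), well below the Markov bound \(E[X]/6 \approx 0.333\): the inequality holds, but it is far from tight.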
Example 22 (Application of Markov’s Inequality to Randomized 2-SAT).
This example demonstrates how Markov’s Inequality can be applied to analyze the randomized 2-SAT algorithm, bounding the probability of exceeding a certain number of steps. Let \(X\) be a random variable representing the number of steps taken by the randomized 2-SAT algorithm to find a satisfying assignment (if one exists). Suppose, through some analysis (which is beyond the scope of this lecture but can be done), we determine that the expected number of steps to find a solution is \(E[X] = n^2\), where \(n\) is the number of variables in the 2-SAT formula.
We want to estimate the probability that the algorithm takes more than \(a = 2n^2\) steps. Using Markov’s Inequality, we can calculate an upper bound for this probability: \[P(X \geq 2n^2) \leq \frac{E[X]}{2n^2} = \frac{n^2}{2n^2} = \frac{1}{2}\] This result tells us that there is at most a 50% chance that the algorithm runs for \(2n^2\) steps or more.
If we want to further reduce the probability of exceeding a certain number of steps, we can increase the limit \(a\). For instance, if we set \(a = 10n^2\), then: \[P(X \geq 10n^2) \leq \frac{E[X]}{10n^2} = \frac{n^2}{10n^2} = \frac{1}{10}\] This shows that there is at most a 10% chance that the algorithm runs for \(10n^2\) steps or more.
Markov’s Inequality lets us make probabilistic statements about the performance of randomized algorithms even without a detailed understanding of the distribution of their runtime. In the context of the randomized 2-SAT algorithm, it bounds the probability of "failure" in terms of exceeding a certain number of steps, given the expected number of steps: if we set a maximum step limit \(a\), then \(P(X \geq a) \leq \frac{E[X]}{a}\) bounds the probability of the algorithm "making a mistake" by not finding a solution within \(a\) steps, even if one exists.
Markov Chains
Markov chains are a fundamental class of stochastic processes that model systems evolving through a sequence of states over time. Their defining characteristic is the Markov property, which asserts that the future state depends only on the current state, and not on the sequence of past states. This memoryless property simplifies the analysis and makes Markov chains a powerful tool for modeling various phenomena.
Definition of a Markov Chain
Definition 23 (Markov Chain).
This definition formally introduces Markov Chains, defining them as stochastic processes with the Markov property, where future states depend only on the current state. A Markov chain is a sequence of random variables \(X_0, X_1, X_2, \ldots\), where each \(X_n\) takes a value in a countable set \(S\), called the state space. The sequence satisfies the Markov property if for any time \(n\) and any states \(i_0, i_1, \ldots, i_{n-1}, i, j \in S\), the conditional probability of transitioning to state \(j\) at time \(n+1\), given the entire history up to time \(n\), depends only on the state at time \(n\): \[P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i) = P_{ij}\] The probabilities \(P_{ij} = P(X_{n+1} = j \mid X_n = i)\) are called the transition probabilities from state \(i\) to state \(j\). The matrix \(P = (P_{ij})\), where \(i, j \in S\), is known as the transition matrix of the Markov chain.
For a Markov chain to be well-defined, the transition probabilities must satisfy two basic properties:
Non-negativity: \(P_{ij} \geq 0\) for all states \(i, j \in S\). Probabilities cannot be negative.
Stochastic Property (Row Sums are 1): \(\sum_{j \in S} P_{ij} = 1\) for all states \(i \in S\). From any state \(i\), the chain must transition to some state in the state space with probability 1.
A matrix satisfying these properties is called a stochastic matrix. Every stochastic matrix can serve as the transition matrix for a Markov chain.
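The two defining properties are simple to verify mechanically. The helper below (not from the lecture, a minimal sketch) checks non-negativity and unit row sums:

```python
# Minimal check of the two defining properties of a stochastic matrix:
# non-negative entries, and every row summing to 1.
def is_stochastic(P, tol=1e-12):
    """Return True iff P (a list of rows) is a valid transition matrix."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

# The identity matrix on two states is trivially stochastic...
assert is_stochastic([[1.0, 0.0], [0.0, 1.0]])
# ...while a row summing to 0.9 is not.
assert not is_stochastic([[0.5, 0.4], [0.0, 1.0]])
```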
Random Walks as Markov Chains
Random walks, which we discussed earlier, are a specific and important class of Markov chains. In fact, any random walk can be modeled as a Markov chain. The states of the Markov chain correspond to the positions the walker can occupy in the random walk space, and the transitions are defined by the possible moves of the walker and their associated probabilities.
Consider a random walk on a graph \(G = (V, E)\). The state space \(S\) of the Markov chain is the set of vertices \(V\) of the graph. If the random walk is currently at vertex \(u \in V\), it transitions to a neighboring vertex \(v \in V\) with a certain probability \(P_{uv}\). If there is no edge between \(u\) and \(v\), then \(P_{uv} = 0\). The sum of transition probabilities from any vertex \(u\) to all its neighbors (and possibly to itself, if staying in place is allowed) must be 1, satisfying the stochastic property.
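This construction can be sketched directly in code. For the simple random walk, the walker moves from \(u\) to each neighbor with equal probability \(1/\deg(u)\); the 4-cycle graph below is a made-up example:

```python
# Sketch: turning a simple random walk on a graph into a Markov chain.
# From vertex u the walker moves to each neighbour with probability 1/deg(u).
def walk_transition_matrix(adj):
    """adj: dict mapping each vertex to its list of neighbours.
    Returns the transition matrix as a dict of dicts P[u][v]."""
    P = {}
    for u, nbrs in adj.items():
        P[u] = {v: (1.0 / len(nbrs) if v in nbrs else 0.0) for v in adj}
    return P

# 4-cycle: 0 - 1 - 2 - 3 - 0
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
P = walk_transition_matrix(adj)
assert P[0][1] == 0.5 and P[0][3] == 0.5 and P[0][2] == 0.0
# Each row sums to 1, so P satisfies the stochastic property.
assert all(abs(sum(P[u].values()) - 1.0) < 1e-12 for u in adj)
```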
Example 24 (Random Walk on a Clock as a Markov Chain).
This example illustrates how a random walk on a clock can be modeled as a Markov chain, defining states and transition probabilities. Let’s revisit the example of a random walk on a clock with 6 states \(\{0, 1, 2, 3, 4, 5\}\). From each state \(i\), the walker can move to state \((i-1) \pmod 6\) (counter-clockwise), state \(i\) (stay in place), or state \((i+1) \pmod 6\) (clockwise), each with probability 1/3.
The transition matrix \(P\) for this Markov chain is given by: \[P = \begin{pmatrix} 1/3 & 1/3 & 0 & 0 & 0 & 1/3 \\ 1/3 & 1/3 & 1/3 & 0 & 0 & 0 \\ 0 & 1/3 & 1/3 & 1/3 & 0 & 0 \\ 0 & 0 & 1/3 & 1/3 & 1/3 & 0 \\ 0 & 0 & 0 & 1/3 & 1/3 & 1/3 \\ 1/3 & 0 & 0 & 0 & 1/3 & 1/3\end{pmatrix}\] Each row in this matrix sums to 1, and all entries are non-negative, confirming it is a stochastic matrix and a valid transition matrix for a Markov chain. This example clearly illustrates how a random walk can be represented as a Markov chain, with the Markov property holding because the next state depends only on the current state, and not on how the walker arrived at the current state.
Stationary Distribution and Basic Limit Theorem
A crucial concept in Markov chains is the stationary distribution, which describes the long-term probability distribution of states.
Definition 25 (Stationary Distribution).
This definition introduces the stationary distribution for a Markov chain, a probability distribution that remains unchanged after one step of the chain. A probability distribution \(\pi = (\pi_i)_{i \in S}\) is a stationary distribution for a Markov chain with transition matrix \(P\) if \(\pi P = \pi\), i.e., for all \(j \in S\), \[\pi_j = \sum_{i \in S} \pi_i P_{ij}\] In vector notation, if \(\pi\) is a row vector, then \(\pi P = \pi\).
The stationary distribution represents a distribution that, once reached, remains unchanged over time when the Markov chain transitions according to \(P\).
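For the clock walk of Example 24 the stationary distribution is uniform, \(\pi = (1/6, \ldots, 1/6)\): by symmetry each column of \(P\) also sums to 1, so \(\pi P = \pi\). A quick exact check with rational arithmetic:

```python
# Verify that the uniform distribution is stationary for the clock walk's
# transition matrix from Example 24, using exact rational arithmetic.
from fractions import Fraction

third = Fraction(1, 3)
# P[i][j] = 1/3 when j is i-1, i, or i+1 modulo 6, and 0 otherwise.
P = [[third if (j - i) % 6 in (0, 1, 5) else Fraction(0) for j in range(6)]
     for i in range(6)]

pi = [Fraction(1, 6)] * 6
pi_P = [sum(pi[i] * P[i][j] for i in range(6)) for j in range(6)]
assert pi_P == pi  # pi P = pi, so pi is stationary
```

This is also the content of Exercise 4 below, verified here for the uniform candidate.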
Theorem 26 (Basic Limit Theorem for Markov Chains).
This theorem describes the long-term behavior of irreducible and aperiodic Markov chains, stating that the probability distribution converges to the stationary distribution. Let \(X_0, X_1, \ldots\) be an irreducible, aperiodic Markov chain with a stationary distribution \(\pi\). Let \(X_0\) have an arbitrary initial distribution \(\pi_0\). Let \(\pi_n(i) = P(X_n = i)\). Then, for all states \(i\), \[\lim_{n \rightarrow \infty} \pi_n(i) = \pi(i)\]
For this theorem to hold, the chain must satisfy two conditions:
Definition 27 (Irreducibility).
This definition introduces irreducibility for Markov chains, meaning it’s possible to reach any state from any other state. A Markov chain is irreducible if every state can be reached from every other state in a finite number of steps with positive probability.
Definition 28 (Aperiodicity).
This definition introduces aperiodicity for Markov chains, a condition on the cycle lengths in the state transitions. A Markov chain is aperiodic if, for every state, the greatest common divisor of the lengths of all cycles starting and ending at that state is 1 (i.e., every state has period 1).
The Basic Limit Theorem explains why, under certain conditions (irreducibility and aperiodicity), the probability of being in a particular state in a random walk, in the long run, approaches a fixed value, regardless of the starting state. This long-term behavior is described by the stationary distribution.
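The convergence can be observed directly on the clock walk of Example 24, which is irreducible (any state reaches any other) and aperiodic (the walker may stay in place, giving cycles of length 1). Starting from the point mass at state 0, repeated multiplication by \(P\) drives the distribution toward the uniform stationary distribution:

```python
# Illustration of the Basic Limit Theorem on the clock walk of Example 24:
# a point mass at state 0 converges to the uniform stationary distribution.
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

third = 1.0 / 3.0
P = [[third if (j - i) % 6 in (0, 1, 5) else 0.0 for j in range(6)]
     for i in range(6)]

dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # start deterministically at state 0
for _ in range(50):
    dist = step(dist, P)

# After 50 steps every state has probability very close to 1/6.
assert all(abs(p - 1/6) < 1e-6 for p in dist)
```

The same experiment from any other initial distribution ends at the same limit, which is exactly what the theorem asserts.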
Conclusion
In this lecture, we have explored the interconnections between random walks, satisfiability problems, and Markov chains. We started with Pólya’s Recurrence Theorem, which revealed the recurrence properties of random walks on grids. We then transitioned to propositional calculus and the SAT problem, discussing its complexity and focusing on the polynomial-time solvable 2-SAT variant. We examined algorithms for 2-SAT, including resolution-based and randomized approaches, and introduced expected values and Markov’s Inequality as tools for analyzing randomized algorithms. Finally, we formally introduced Markov chains, demonstrating how random walks can be modeled as Markov chains, and discussed stationary distributions and the Basic Limit Theorem, which describes the long-term behavior of Markov chains. These concepts are fundamental and have broad applications in computer science, probability, and statistical physics.
Key takeaways from this lecture include:
Pólya’s Recurrence Theorem and the dimensional dependency of random walk recurrence.
The definition and complexity of the SAT problem and its 2-SAT and 3-SAT variants.
Polynomial-time and randomized algorithms for solving 2-SAT.
The application of Markov’s Inequality in analyzing randomized algorithms.
The definition and properties of Markov chains and their connection to random walks.
The concept of stationary distributions and the Basic Limit Theorem for Markov Chains, describing long-term behavior.
Further exploration of these topics could involve investigating more advanced algorithms for SAT, delving deeper into Markov chain theory and its applications in various fields, or studying the connections between random walks and other areas of mathematics and computer science. For the next lecture, it would be worthwhile to explore specific applications of Markov chains, such as queuing theory or the PageRank algorithm, or to study more complex types of random walks and their properties.
Exercises
Prove that any propositional formula can be converted into an equivalent CNF formula. (Hint: Use logical equivalences to eliminate implications and biconditionals, move negations inwards using De Morgan’s laws, and apply distributive laws to obtain CNF).
Design a deterministic polynomial-time algorithm for 2-SAT based on the implication graph approach. Detail the steps for constructing the implication graph, identifying strongly connected components, and determining satisfiability.
Consider a random walk on a line segment with states \(\{0, 1, 2, 3, 4\}\). From states 1, 2, 3, you can move to the left or right with probability 1/2. From state 0, you move to state 1 with probability 1. From state 4, you move to state 3 with probability 1. Construct the transition matrix for this Markov chain. Is it irreducible? Is it aperiodic?
Calculate the stationary distribution for the random walk on a clock example discussed in the lecture. Verify that \(\pi P = \pi\) for your calculated distribution.
Consider the 3-SAT problem. Explain why the randomized algorithm for 2-SAT might not be as effective for 3-SAT. What are the challenges in adapting randomized approaches for NP-complete problems like 3-SAT?
(From handwritten notes) Demonstrate known algorithms for converting formulas to Disjunctive Normal Form (DNF) and explain why these algorithms can lead to exponential size increases.
(From handwritten notes) Exercise related to hitting time and cover time:
Calculate the expected number of steps to go from vertex \(u\) to vertex \(v\) in a simple graph. (Hitting Time)
Calculate the expected number of steps to visit all vertices in a simple graph. (Cover Time)