Randomized Algorithms: QuickSort and Min-Cut
Introduction
Randomized algorithms are a class of algorithms that utilize randomness as part of their logic. Unlike deterministic algorithms, which always produce the same output for a given input, randomized algorithms incorporate random choices, typically through the use of random bits, to guide their computation. This introduction to randomized algorithms will explore their fundamental concepts, advantages, and applications, focusing on examples like Randomized QuickSort and the Min-Cut problem.
Definition of Randomized Algorithm. As defined by Richard M. Karp in his introduction to randomized algorithms, a randomized algorithm is one that receives, in addition to its input data, a stream of random bits that it can use for the purpose of making random choices during its execution. This contrasts with deterministic algorithms, where every step is predetermined by the input.
The introduction of randomness into algorithm design offers several compelling advantages:
Simplicity: For certain problems, randomized algorithms are surprisingly simpler to design and implement than their deterministic counterparts. The use of randomness can sometimes bypass complex case analysis or intricate data structures required by deterministic approaches.
Efficiency: Randomized algorithms often provide significant efficiency gains, particularly in terms of average-case performance. For some problems, the best-known deterministic algorithms are outperformed by randomized algorithms in expectation, and sometimes even in terms of probabilistic worst-case guarantees.
Robustness: Randomization can effectively mitigate the risk of encountering worst-case input scenarios that are specifically designed to degrade the performance of deterministic algorithms. By making random choices, the algorithm’s performance becomes less dependent on the structure of the input, leading to more consistent behavior across different inputs.
It is crucial to understand that due to the incorporation of randomness, repeated executions of a randomized algorithm on the same input may yield different results or follow different computational paths. However, the power of randomized algorithms lies in the fact that they provide guarantees on their performance in expectation or with high probability.
The use of randomness can be viewed as a valuable computational resource. Instead of solely relying on the input data, randomized algorithms strategically leverage randomness to explore a wider range of computational possibilities. This exploration allows them to often find efficient solutions to problems for which deterministic approaches are either too complex or computationally expensive. In essence, randomness becomes a tool to navigate complexity and achieve efficiency in computation.
Randomized QuickSort (RandQS)
Algorithm Description
QuickSort is a well-known sorting algorithm based on the divide-and-conquer paradigm. Randomized QuickSort (RandQS) is a randomized variant that aims to achieve consistently good average-case performance, regardless of the input data distribution. The fundamental principle of divide and conquer is maintained, but the crucial step of pivot selection is randomized to avoid worst-case scenarios.
Choose Pivot: Select an element \(y\) uniformly at random from \(S\). This means each element in \(S\) has an equal probability of being chosen as the pivot.
Partition: Compare each element of \(S\) with the pivot \(y\) to divide \(S\) into three sets:
\(S_1 = \{x \in S \mid x < y\}\): Elements smaller than \(y\).
\(S_y = \{x \in S \mid x = y\}\): Elements equal to \(y\).
\(S_2 = \{x \in S \mid x > y\}\): Elements larger than \(y\).
Recursion: Recursively sort \(S_1\) and \(S_2\) using RandQS.
Output: Combine the sorted sets as follows: sorted \(S_1\), followed by \(S_y\), and then sorted \(S_2\).
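The following Python sketch mirrors these four steps. The function name randqs and the list-based three-way partition are illustrative choices of this note, not part of the formal algorithm statement.

```python
import random

def randqs(s):
    """Sort a list of numbers with Randomized QuickSort (three-way partition)."""
    if len(s) <= 1:
        return list(s)
    # Choose Pivot: every element of s is equally likely to be selected.
    y = random.choice(s)
    # Partition: elements smaller than, equal to, and larger than the pivot.
    s1 = [x for x in s if x < y]
    sy = [x for x in s if x == y]
    s2 = [x for x in s if x > y]
    # Recursion and Output: sorted S1, then S_y, then sorted S2.
    return randqs(s1) + sy + randqs(s2)

if __name__ == "__main__":
    print(randqs([5, 3, 8, 1, 9, 2, 7]))  # [1, 2, 3, 5, 7, 8, 9]
```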
Complexity Analysis: The expected time complexity of Randomized QuickSort is \(O(n \log n)\), where \(n\) is the number of elements to be sorted. The worst-case time complexity is \(O(n^2)\), but this occurs with very low probability. The space complexity is \(O(\log n)\) on average due to the recursive call stack, and \(O(n)\) in the worst case.
In the pivot-selection step of Algorithm [alg:randqs], choosing a pivot uniformly at random from a set \(S\) requires a source of randomness. Selecting a random element from a set of size \(|S|\) can be achieved using approximately \(\lceil \log_2 |S| \rceil\) random bits on average.
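One standard way to realize such a uniform choice with random bits is rejection sampling, sketched below; the helper name and the use of Python's random.getrandbits are illustrative assumptions, not something specified by the algorithm.

```python
import random

def uniform_index(m):
    """Return an index in {0, ..., m-1} chosen uniformly at random.

    Each attempt draws ceil(log2 m) random bits and retries if the value
    is >= m; the expected number of attempts is at most 2, so roughly
    log2 m random bits are consumed on average.
    """
    bits = max(1, (m - 1).bit_length())  # number of bits needed to cover 0..m-1
    while True:
        r = random.getrandbits(bits)
        if r < m:
            return r
```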
Analysis of Expected Comparisons
To rigorously analyze the efficiency of RandQS, we focus on counting the number of comparisons, as this is the dominant operation in sorting algorithms. We aim to determine the expected number of comparisons performed by RandQS when sorting a set of \(n\) numbers.
Let \(S = \{z_1, z_2, \ldots, z_n\}\) be the input set, and let \(z_{(1)}, z_{(2)}, \ldots, z_{(n)}\) be the elements of \(S\) in sorted order. We introduce indicator random variables \(X_{ij}\) for each pair of indices \((i, j)\) such that \(1 \leq i < j \leq n\). Let \(X_{ij}\) be defined as: \[X_{ij} = \begin{cases} 1 & \text{if } z_{(i)} \text{ is compared with } z_{(j)} \text{ during the execution of RandQS} \\ 0 & \text{otherwise}\end{cases}\] The total number of comparisons \(C\) performed by RandQS can be expressed as the sum of these indicator variables over all pairs \((i, j)\) with \(1 \leq i < j \leq n\): \[C = \sum_{i=1}^{n} \sum_{j=i+1}^{n} X_{ij}\] Our goal is to calculate the expected number of comparisons, \(\mathbb{E}[C]\). By applying the linearity of expectation, we can write: \[\mathbb{E}[C] = \mathbb{E}\left[ \sum_{i=1}^{n} \sum_{j=i+1}^{n} X_{ij} \right] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathbb{E}[X_{ij}]\] To compute \(\mathbb{E}[C]\), we need to determine \(\mathbb{E}[X_{ij}] = \mathbb{P}(z_{(i)} \text{ is compared with } z_{(j)})\).
Condition for Comparison in Randomized QuickSort. For \(1 \leq i < j \leq n\), elements \(z_{(i)}\) and \(z_{(j)}\) are compared by RandQS if and only if one of them is the first pivot chosen from the set \(\{z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\}\) during any recursive call that contains both \(z_{(i)}\) and \(z_{(j)}\).
Proof. Consider the sorted subsequence \(z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\). If a pivot selected is strictly less than \(z_{(i)}\) or strictly greater than \(z_{(j)}\), then \(z_{(i)}\) and \(z_{(j)}\) will be placed in the same subarray for the recursive call. If a pivot is chosen from within the range \([z_{(i)}, z_{(j)}]\), say \(z_{(k)}\) where \(i \leq k \leq j\), then \(z_{(i)}\) and \(z_{(j)}\) will be compared to \(z_{(k)}\) during the partitioning step. If the pivot is \(z_{(i)}\) or \(z_{(j)}\), they are directly compared to each other (and all other elements in the current subarray). However, if the first pivot chosen from the set \(\{z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\}\) is some \(z_{(k)}\) where \(i < k < j\), then \(z_{(i)}\) will be placed in the subarray \(S_1\) and \(z_{(j)}\) in \(S_2\) (or vice-versa), and they will never be compared in subsequent recursive calls. Thus, \(z_{(i)}\) and \(z_{(j)}\) are compared if and only if either \(z_{(i)}\) or \(z_{(j)}\) is the first pivot chosen from \(\{z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\}\). ◻
Since the pivot is chosen uniformly at random from the current subarray, when considering the set \(\{z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\}\), each element in this set has an equal probability of being the first pivot chosen from this set. There are \(j-i+1\) elements in this set. The probability that \(z_{(i)}\) is chosen first is \(\frac{1}{j-i+1}\), and the probability that \(z_{(j)}\) is chosen first is also \(\frac{1}{j-i+1}\). These are mutually exclusive events. Therefore, the probability that either \(z_{(i)}\) or \(z_{(j)}\) is the first pivot chosen from \(\{z_{(i)}, z_{(i+1)}, \ldots, z_{(j)}\}\) is the sum of these probabilities: \[\mathbb{E}[X_{ij}] = \mathbb{P}(X_{ij} = 1) = \frac{1}{j-i+1} + \frac{1}{j-i+1} = \frac{2}{j-i+1}\] Now we can compute the expected number of comparisons: \[\mathbb{E}[C] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{2}{j-i+1} = 2 \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{1}{j-i+1}\] Let \(k = j-i+1\). For a fixed \(i\), as \(j\) ranges from \(i+1\) to \(n\), \(k\) ranges from \(2\) to \(n-i+1\). Alternatively, we can change the order of summation. For a fixed value of \(k = j-i+1\), where \(k\) ranges from \(2\) to \(n\), the possible values for \(i\) are \(1, 2, \ldots, n-k+1\). Thus, for each \(k\), there are \(n-k+1\) pairs \((i, j)\) such that \(j-i+1 = k\). \[\mathbb{E}[C] = 2 \sum_{k=2}^{n} \sum_{\substack{1 \leq i < j \leq n \\ j-i+1 = k}} \frac{1}{k} = 2 \sum_{k=2}^{n} (n-k+1) \frac{1}{k} = 2 \sum_{k=2}^{n} \left( \frac{n+1}{k} - 1 \right)\] \[\mathbb{E}[C] = 2 \left[ (n+1) \sum_{k=2}^{n} \frac{1}{k} - \sum_{k=2}^{n} 1 \right] = 2 \left[ (n+1) \left( \sum_{k=1}^{n} \frac{1}{k} - 1 \right) - (n-1) \right]\] Recall the \(n\)-th harmonic number \(H_n = \sum_{k=1}^{n} \frac{1}{k}\). Then, \[\mathbb{E}[C] = 2 \left[ (n+1) (H_n - 1) - (n-1) \right] = 2 \left[ (n+1)H_n - (n+1) - n + 1 \right] = 2 \left[ (n+1)H_n - 2n \right] = 2(n+1)H_n - 4n\] We know that \(H_n \approx \ln n + \gamma\), where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant. Therefore, \(\mathbb{E}[C] \approx 2(n+1)(\ln n + \gamma) - 4n \approx 2n \ln n = O(n \log n)\).
Expected Comparisons of Randomized QuickSort. The expected number of comparisons performed by Randomized QuickSort on an input of size \(n\) is \(\mathbb{E}[C] = 2(n+1)H_n - 4n \leq 2nH_n\). Thus, the expected time complexity of Randomized QuickSort is \(O(n \log n)\).
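As a quick sanity check on this formula, the short experiment below (an illustrative sketch; the function names are not from the lecture) counts pivot comparisons of RandQS on random permutations of distinct keys and compares the empirical average with \(2(n+1)H_n - 4n\).

```python
import random

def randqs_count(s):
    """Return (sorted copy of s, number of element-pivot comparisons); assumes distinct keys."""
    if len(s) <= 1:
        return list(s), 0
    y = random.choice(s)
    comparisons = len(s) - 1  # each remaining element is compared with the pivot once
    left, c1 = randqs_count([x for x in s if x < y])
    right, c2 = randqs_count([x for x in s if x > y])
    return left + [y] + right, comparisons + c1 + c2

def expected_comparisons(n):
    h_n = sum(1.0 / k for k in range(1, n + 1))  # harmonic number H_n
    return 2 * (n + 1) * h_n - 4 * n

if __name__ == "__main__":
    n, trials = 200, 2000
    avg = sum(randqs_count(random.sample(range(n), n))[1] for _ in range(trials)) / trials
    print(f"empirical average: {avg:.1f}, formula: {expected_comparisons(n):.1f}")
```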
An important feature of this analysis is that it holds for any input distribution. The randomization is within the algorithm itself (pivot selection), not in the assumption about the input. This makes Randomized QuickSort a robust algorithm in terms of average-case performance.
Worst Case vs. Average Case of QuickSort
In contrast to Randomized QuickSort, standard deterministic QuickSort, which typically selects the first or last element as a pivot, can suffer from poor performance in the worst case. For instance, if the input array is already sorted or reverse sorted and the first element is always chosen as the pivot, QuickSort degenerates to \(O(n^2)\) time complexity. This happens because the partitioning step becomes highly unbalanced. In each step, if the pivot is consistently the smallest or largest element, one of the subarrays is empty, and the other contains \(n-1\) elements. This leads to a recurrence relation for the number of comparisons \(T(n) = T(n-1) + O(n)\), which solves to \(T(n) = O(n^2)\).
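For concreteness, unrolling this recurrence (counting exactly \(n-1\) comparisons per partitioning step and taking \(T(1) = 0\)) gives \[T(n) = (n-1) + (n-2) + \cdots + 1 = \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2} = O(n^2).\]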
In the worst-case scenario for deterministic QuickSort, the recursion tree is maximally skewed, leading to \(O(n^2)\) time complexity. This occurs for specific input orderings, such as sorted or reverse sorted arrays when a fixed pivot selection strategy (like always picking the first element) is used.
Randomized QuickSort mitigates this issue by randomizing the pivot selection. By choosing a pivot uniformly at random, it makes it highly unlikely to consistently encounter the worst-case partitioning in every recursive step, regardless of the input array’s initial order. While it is still theoretically possible for Randomized QuickSort to exhibit \(O(n^2)\) time complexity (if, by bad luck, pivots are consistently chosen poorly), the probability of such an event is extremely low. The expected performance, as we have shown, remains \(O(n \log n)\) for all possible inputs.
Randomness as a Computational Resource
Randomized algorithms utilize randomness as a resource to guide their computation. In the case of RandQS, randomness is employed in the pivot selection process. The number of random bits required in each step to choose a pivot uniformly at random from a subarray of size \(m\) is approximately \(\lceil \log_2 m \rceil\). Over the entire execution of RandQS, the expected total number of random bits used is on the order of \(O(n \log n)\).
Randomized QuickSort, on average, utilizes \(O(n \log n)\) random bits. In the theoretical analysis of randomized algorithms, it is generally assumed that random bits can be generated and accessed at a constant cost. However, in practical implementations, the generation of random numbers might have a non-negligible cost, which should be considered in performance-critical applications.
The efficiency and robustness of Randomized QuickSort exemplify the power of randomization in algorithm design. By introducing randomness, we achieve an algorithm that is simple to implement, efficient on average, and robust against worst-case input scenarios.
Randomized Min-Cut Algorithm
The Min-Cut Problem
Cut. Given a connected, undirected graph \(G = (V, E)\), a cut is a partition of the vertex set \(V\) into two non-empty, disjoint sets, say \(V_1\) and \(V_2 = V \setminus V_1\). The cut-set is the set of edges that have one endpoint in \(V_1\) and the other in \(V_2\). The size of the cut is the number of edges in the cut-set.
In simpler terms, a cut is a way to divide the graph into two parts, and the cut-set is the collection of edges that go between these parts. The Min-Cut problem seeks to find a cut that minimizes the number of edges crossing between the two partitions.
Min-Cut Problem. The Min-Cut problem is to find a cut in a given connected graph \(G\) such that the size of the cut (i.e., the number of edges in the cut-set) is minimized.
Equivalently, we can define a cut \(C \subseteq E\) as a subset of edges whose removal increases the number of connected components of the graph. For a connected graph, a cut \(C\) is a set of edges such that the graph \((V, E \setminus C)\) is disconnected. The goal is to find such a set \(C\) with the minimum possible size \(|C|\).
Finding a minimum cut in a graph is a fundamental problem in graph theory and has applications in various fields, including network reliability, network flow, cluster analysis, and image segmentation. Randomized algorithms offer an efficient and elegant approach to solving this problem, often outperforming deterministic methods in terms of simplicity and sometimes speed. Furthermore, randomized approaches can provide a probabilistic guarantee of finding the optimal solution, which is valuable in practice.
Contraction Algorithm
The randomized algorithm we will discuss for finding a Min-Cut is based on a simple yet powerful operation called edge contraction.
Edge Contraction. Contracting an edge \(e = (u, v)\) in a graph \(G = (V, E)\) produces a new graph \(G'\) by merging vertices \(u\) and \(v\) into a single new vertex, say \(uv\). All edges incident to either \(u\) or \(v\) in \(G\) become incident to the new vertex \(uv\) in \(G'\). Parallel edges are preserved: if \(u\) or \(v\) had several edges to another vertex \(w\), all of them remain in the contracted graph, which may therefore be a multigraph. Any remaining edges between \(u\) and \(v\) themselves become self-loops (edges connecting a vertex to itself) and are removed.
Essentially, when we contract an edge \((u, v)\), we are treating \(u\) and \(v\) as the same vertex from that point onwards. The algorithm iteratively contracts randomly chosen edges until only two vertices remain.
\(G' \leftarrow G\)
While \(G'\) has more than two vertices:
Pick an edge \(e = (u, v)\) uniformly at random from the edges of \(G'\).
Contract edge \(e\) in \(G'\) to obtain a new, smaller graph, which again we call \(G'\).
Let \(s\) and \(t\) be the two vertices remaining in \(G'\). Return the cut in the original graph \(G\) corresponding to the edges between \(s\) and \(t\) in \(G'\). The estimated min-cut size is the number of edges between \(s\) and \(t\) in \(G'\).
Complexity Analysis: In each iteration of the while loop, the number of vertices decreases by one, so a graph that initially has \(n\) vertices and \(m\) edges requires \(n-2\) contractions. Using adjacency lists, picking a random edge naively takes \(O(m)\) time per step, or less with suitable preprocessing, and contracting an edge and updating the graph structure takes time roughly proportional to the degrees of the contracted vertices. With a careful implementation, a single run of the algorithm can be made to run in \(O(m \log n)\), or nearly \(O(m)\), time.
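A compact Python sketch of one run of this algorithm is given below. The union-find bookkeeping and the lazy removal of self-loops from the edge pool are implementation choices of this note; picking a surviving edge uniformly at random and discarding self-loops as they are encountered reproduces the uniform edge choice described above.

```python
import random

def karger_min_cut(n, edges):
    """One run of the randomized contraction algorithm.

    n     -- number of vertices, labeled 0..n-1 (graph assumed connected)
    edges -- list of undirected edges (u, v); parallel edges are allowed
    Returns (size of the cut found, set of vertices on one side of the cut).
    A single run is not guaranteed to return a minimum cut.
    """
    parent = list(range(n))

    def find(x):  # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pool = list(edges)  # surviving edges of the contracted multigraph
    components = n
    while components > 2:
        i = random.randrange(len(pool))  # uniform choice among surviving edges
        u, v = pool[i]
        pool[i] = pool[-1]
        pool.pop()
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                     # self-loop created by earlier contractions: discard
        parent[ru] = rv                  # contract: merge the two super-vertices
        components -= 1

    # Edges whose endpoints lie in different super-vertices form the returned cut.
    cut_size = sum(1 for (u, v) in pool if find(u) != find(v))
    side = {v for v in range(n) if find(v) == find(0)}
    return cut_size, side

if __name__ == "__main__":
    # 4-cycle with a chord; the minimum cut has size 2.
    print(karger_min_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))
```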

Figure 1 illustrates the contraction process. Starting with a graph, when we contract an edge, say \((1, 2)\), vertices 1 and 2 are merged into a single vertex, labeled as (1, 2). All edges that were connected to either 1 or 2 are now connected to (1, 2). It is important to note that the contraction process may result in a multigraph, meaning there can be multiple edges between the same pair of vertices. However, self-loops are removed as they do not contribute to the cut size.
It is crucial to observe that the sequence of edge contractions is not independent. The choice of an edge to contract at each step depends on the graph resulting from the previous contractions. This dependency is inherent to the algorithm and needs to be considered in the probability analysis.
Probability of Success
Let \(C\) be a minimum cut in the original graph \(G\), and let \(|C| = k\). We want to analyze the probability that the randomized contraction algorithm finds this min-cut \(C\).
Let \(E_i\) be the event that in the \(i\)-th step, we do not contract an edge from the min-cut \(C\). The algorithm succeeds if we avoid contracting any edge in \(C\) in all steps. We want to calculate \(P(\cap_{i=1}^{n-2} E_i) = P(E_1 \cap E_2 \cap \cdots \cap E_{n-2}) = P(E_1) \cdot P(E_2 | E_1) \cdot P(E_3 | E_1 \cap E_2) \cdots P(E_{n-2} | \cap_{j=1}^{n-3} E_j)\).
Let \(G_i\) be the graph after \(i-1\) contractions. Let \(m_i\) be the number of edges in \(G_i\). Initially, \(G_1 = G\), and let \(|V| = n\), \(|E| = m\). We are given that the min-cut size is \(k\). For any vertex \(v\) in \(G\), the degree \(d(v)\) is the number of edges incident to \(v\). The sum of degrees is \(2m\).
Lower Bound on Edges in a Graph. In any graph \(G\), for any vertex \(v\), the degree \(d(v)\) is at least the min-cut size \(k\). Consequently, the total number of edges \(m\) in \(G\) is at least \(\frac{nk}{2}\), where \(n\) is the number of vertices.
Proof. Consider any vertex \(v\). The edges incident to \(v\) form a cut that separates \(v\) from all other vertices \(V \setminus \{v\}\). The size of this cut is the degree of \(v\), \(d(v)\). Since \(k\) is the minimum cut size, we must have \(d(v) \geq k\) for every vertex \(v\). Summing the degrees of all vertices gives twice the number of edges, i.e., \(\sum_{v \in V} d(v) = 2m\). Therefore, \(2m = \sum_{v \in V} d(v) \geq \sum_{v \in V} k = nk\), which implies \(m \geq \frac{nk}{2}\). ◻
In the first contraction step, in graph \(G_1 = G\), the total number of edges is \(m_1 = m \geq \frac{nk}{2}\). The number of edges in the min-cut \(C\) is \(k\). The probability of picking an edge from \(C\) in the first step is at most \(\frac{k}{m_1} \leq \frac{k}{nk/2} = \frac{2}{n}\). Thus, the probability of not picking an edge from \(C\) in the first step is \(P(E_1) = 1 - P(\text{pick edge from } C) \geq 1 - \frac{2}{n} = \frac{n-2}{n}\).
Now consider the \(i\)-th step. Suppose we have successfully avoided contracting any edge from \(C\) in the first \(i-1\) steps (events \(E_1, E_2, \ldots, E_{i-1}\) have occurred). Then, the min-cut size in the contracted graph \(G_i\) is still \(k\). Let \(n_i\) be the number of vertices in \(G_i\). After \(i-1\) contractions, \(n_i = n - (i-1) = n-i+1\). Let \(m_i\) be the number of edges in \(G_i\). By the same argument as before, \(m_i \geq \frac{n_i k}{2} = \frac{(n-i+1)k}{2}\). The number of edges in \(C\) in \(G_i\) is still \(k\) (as we haven’t contracted any edge from \(C\)). The probability of picking an edge from \(C\) in the \(i\)-th step, given that we have avoided it so far, is \(P(\text{pick edge from } C | \cap_{j=1}^{i-1} E_j) \leq \frac{k}{m_i} \leq \frac{k}{(n-i+1)k/2} = \frac{2}{n-i+1}\). Therefore, the probability of not picking an edge from \(C\) in the \(i\)-th step, given that we haven’t picked any in the previous steps, is \(P(E_i | \cap_{j=1}^{i-1} E_j) = 1 - P(\text{pick edge from } C | \cap_{j=1}^{i-1} E_j) \geq 1 - \frac{2}{n-i+1} = \frac{n-i-1}{n-i+1}\).
The overall probability of success is: \[P(\cap_{i=1}^{n-2} E_i) = \prod_{i=1}^{n-2} P(E_i | \cap_{j=1}^{i-1} E_j) \geq \prod_{i=1}^{n-2} \left(1 - \frac{2}{n-i+1}\right) = \left(1 - \frac{2}{n}\right) \left(1 - \frac{2}{n-1}\right) \cdots \left(1 - \frac{2}{3}\right)\] \[= \left(\frac{n-2}{n}\right) \left(\frac{n-3}{n-1}\right) \left(\frac{n-4}{n-2}\right) \cdots \left(\frac{3-2}{3}\right) = \frac{(n-2)(n-3)(n-4)\cdots 1}{n(n-1)(n-2)\cdots 3} = \frac{1 \cdot 2}{n(n-1)} = \frac{2}{n(n-1)}\] Thus, the probability that a single run of the randomized contraction algorithm finds a min-cut is at least \(\frac{2}{n(n-1)} = \Omega\!\left(\frac{1}{n^2}\right)\).
To amplify the success probability, we can repeat the algorithm multiple times, say \(h\) times, and in each run, we find a cut. We then choose the minimum cut size found across all \(h\) runs. Since each run is independent, the probability that none of the \(h\) runs finds a min-cut is \(\left(1 - \frac{2}{n(n-1)}\right)^h\). Therefore, the probability of finding a min-cut in at least one of the \(h\) runs is \(1 - \left(1 - \frac{2}{n(n-1)}\right)^h\).
Probability Amplification by Repetition. If we repeat the randomized contraction algorithm \(h = \frac{n(n-1)}{2} \ln n\) times, the probability of finding a min-cut is at least \(1 - \frac{1}{n}\).
Proof. Let \(h = \frac{n(n-1)}{2} \ln n\). Using the inequality \(1 - x \leq e^{-x}\), the probability of failure in \(h\) independent runs is: \[\left(1 - \frac{2}{n(n-1)}\right)^h \leq \left(e^{-\frac{2}{n(n-1)}}\right)^{\frac{n(n-1)}{2} \ln n} = e^{-\ln n} = \frac{1}{n}\] Thus, the probability of success is at least \(1 - \frac{1}{n}\). ◻
There is a trade-off between the computational cost and the probability of error. By increasing the number of repetitions, we can push the probability of finding a min-cut arbitrarily close to 1, at the cost of increased computation time. With \(h = O(n^2 \log n)\) repetitions we obtain a high probability of success. Since each run can be implemented in roughly \(O(m)\) or \(O(m \log n)\) time, the total time to find a min-cut with high probability is polynomial, on the order of \(O(n^2 m \log n)\) to \(O(n^2 m \log^2 n)\). This is quite efficient for many applications.
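The repetition scheme itself is only a few lines of code. The sketch below assumes the karger_min_cut function from the earlier sketch and uses the repetition count \(h = \lceil \frac{n(n-1)}{2} \ln n \rceil\) from the theorem above; both the function names and the default choice of \(h\) are illustrative.

```python
import math

def min_cut_whp(n, edges, runs=None):
    """Repeat the contraction algorithm and keep the smallest cut found.

    With runs ~= n(n-1)/2 * ln(n) independent repetitions, the returned cut
    is a minimum cut with probability at least 1 - 1/n.
    Relies on karger_min_cut(n, edges) defined in the earlier sketch.
    """
    if runs is None:
        runs = math.ceil(n * (n - 1) / 2 * math.log(n))
    best_size, best_side = float("inf"), None
    for _ in range(runs):
        size, side = karger_min_cut(n, edges)
        if size < best_size:
            best_size, best_side = size, side
    return best_size, best_side
```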
Conclusion
In conclusion, randomized algorithms stand out as a vital approach in algorithm design, offering a blend of simplicity, efficiency, and resilience. Throughout this discussion, we have examined two quintessential examples that underscore these advantages: Randomized QuickSort for sorting and the randomized contraction algorithm for finding a Min-Cut in graphs.
Randomized QuickSort elegantly solves the sorting problem with an expected time complexity of \(O(n \log n)\), outperforming many deterministic sorting algorithms in average-case scenarios and providing robustness against input ordering. Its randomized pivot selection strategy ensures that no particular input elicits worst-case behavior consistently, making it a practical choice for general-purpose sorting.
The randomized contraction algorithm for Min-Cut provides a strikingly simple yet effective method for tackling a fundamental graph problem. With a probability of success that can be amplified through repetition, this algorithm demonstrates how randomness can be harnessed to find solutions to problems for which deterministic algorithms might be more complex or less efficient. The trade-off between computational cost and success probability allows for tuning the algorithm to meet specific application requirements.
These examples, Randomized QuickSort and the randomized Min-Cut algorithm, are representative of the broader class of randomized algorithms. They highlight the key principles and analysis techniques that are central to the design and understanding of randomized algorithms. The power of randomization lies in its ability to simplify algorithm design, improve average-case performance, and provide probabilistic guarantees, making randomized algorithms indispensable tools in computer science and beyond.
Exercises
Expected Swaps in Randomized QuickSort. Analyze the expected number of swaps performed by Randomized QuickSort when sorting an array of \(n\) elements. To do this, define an indicator random variable for each pair of elements that are swapped. Consider when a swap occurs in relation to the pivot selection and partitioning process. Compare the expected number of swaps to the expected number of comparisons. Does minimizing comparisons also minimize swaps?
Implementation and Empirical Analysis of Randomized QuickSort. Implement Randomized QuickSort as described in Algorithm [alg:randqs]. Also, implement standard deterministic QuickSort (e.g., always picking the first element as pivot). Design and conduct experiments to compare the practical performance of both algorithms. Use various input distributions for your experiments:
Random Arrays: Generate arrays of random numbers.
Sorted Arrays: Use already sorted arrays as input.
Reverse Sorted Arrays: Use arrays sorted in descending order.
Measure the running time for different input sizes (e.g., \(n = 100, 1000, 10000, \ldots\)). Plot the average running time as a function of \(n\) for both algorithms and for each input distribution. Analyze and discuss your empirical findings in light of the theoretical analysis of expected and worst-case complexities.
Time Complexity of Contraction Algorithm. Consider a graph with \(n\) vertices and \(m\) edges, represented using adjacency lists. Analyze in detail the time complexity of one run of the randomized contraction algorithm for Min-Cut (Algorithm [alg:mincut]). Focus on efficient implementation techniques for the following operations:
Picking a random edge: How to efficiently select an edge uniformly at random from the current set of edges.
Edge contraction: Describe the steps required to contract an edge \((u, v)\), including merging vertices and updating adjacency lists, while handling multi-edges and removing self-loops.
Determine the overall time complexity of a single run of the algorithm in terms of \(n\) and \(m\). Provide a detailed step-by-step analysis, considering the data structures and operations involved.
Repetition for Desired Success Probability. We showed that a single run of the randomized contraction algorithm succeeds in finding a min-cut with probability at least \(\frac{2}{n(n-1)}\). Determine how many times, \(h\), we need to repeat the randomized contraction algorithm to achieve a probability of finding a min-cut of at least \(1 - \delta\), where \(\delta > 0\) is a given small probability of failure. Calculate the required number of repetitions \(h\) as a function of \(n\) and \(\delta\). For example, if we want a success probability of at least \(99\% (\delta = 0.01)\) for a graph with \(n=100\) vertices, how many repetitions are needed?
Explore Other Randomized Algorithms. Research and briefly describe other randomized algorithms related to the topics covered in this lecture:
Randomized Sorting Algorithms: Investigate Randomized MergeSort, or Bucket Sort (when applicable), and compare their expected time complexity and practical advantages/disadvantages to Randomized QuickSort.
Advanced Min-Cut Algorithms: Explore the Karger–Stein algorithm, which improves upon the basic contraction algorithm to achieve a higher probability of success with fewer repetitions. Compare its approach and complexity to the algorithm discussed in the lecture.
Summarize your findings, highlighting the key ideas and performance characteristics of these alternative randomized algorithms.