Shortest Paths in DAGs and Scheduling Problems
Introduction
This lecture focuses on shortest path problems in Directed Acyclic Graphs (DAGs) and their application to scheduling problems. We will begin by briefly reviewing Dijkstra’s algorithm and Breadth-First Search (BFS) from the previous lecture, setting the stage for understanding more efficient algorithms tailored for DAGs. The core of this lecture will cover topological sorting, a fundamental concept for DAGs, and a dynamic programming approach to find shortest paths in DAGs. Finally, we will explore how these concepts are applied to solve scheduling problems, including the Critical Path Method (CPM) for project management.
Review of Previous Lecture
In the previous lecture, we discussed algorithms for finding shortest paths in graphs.
Dijkstra’s Algorithm
Dijkstra’s algorithm is a classic algorithm used to find the shortest paths from a single source node to all other nodes in a graph, provided all edge weights are non-negative.
Time Complexity Analysis
The time complexity of Dijkstra’s algorithm depends on the implementation.
Heap-based Implementation: Using a binary heap to prioritize nodes, Dijkstra’s algorithm can achieve a time complexity of \(O(M \log N)\), where \(N\) is the number of nodes and \(M\) is the number of edges in the graph. This implementation is efficient for sparse graphs.
Array-based Implementation: In a simpler array-based implementation, where we linearly search for the node with the smallest distance at each step, the time complexity is \(O(N^2)\). This can be more efficient for dense graphs.
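As a concrete illustration, the following is a minimal sketch of the heap-based variant in Python, assuming the graph is stored as an adjacency-list dictionary adj that maps every node to a list of (neighbour, weight) pairs; the function name dijkstra and the lazy-deletion handling of stale heap entries are just one common way to realize the \(O(M \log N)\) bound.

import heapq

def dijkstra(adj, source):
    # adj: dict mapping every node to a list of (neighbour, weight) pairs;
    # all edge weights are assumed non-negative
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    heap = [(0, source)]                        # (distance, node) pairs
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                         # stale entry, distance already improved
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist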
Shortest Paths with Unit Edge Weights
When all edge weights are equal to one (unit edge weights), we can use a more specialized and efficient algorithm.
Breadth-First Search (BFS)
Breadth-First Search (BFS) is ideally suited for finding shortest paths in graphs with unit edge weights. Starting from the source node, BFS explores the graph layer by layer, guaranteeing that the first time a node is reached, it is through a shortest path in terms of the number of edges. BFS uses a queue data structure to manage the nodes to be visited, achieving a time complexity of \(O(N + M)\).
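As a sketch, BFS under the same adjacency-list assumption (a dictionary adj with every node as a key) can be written as follows; the first time a node is entered into the distance table, it has already received its shortest distance.

from collections import deque

def bfs_shortest_paths(adj, source):
    # adj: dict mapping every node to the list of its neighbours (all edges have weight 1)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:                   # first visit, hence shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist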
Shortest Paths in Directed Acyclic Graphs (DAGs)
This section focuses on finding shortest paths in Directed Acyclic Graphs (DAGs).
Significance of DAGs
DAGs are significant because they represent dependencies and precedence relationships without cycles. This property is crucial in various scenarios, such as task scheduling, dependency resolution in software projects, and project planning.
Absence of Negative Cycles
Since a DAG by definition contains no cycles, it in particular cannot contain negative cycles. This is important because negative cycles can make the shortest path problem ill-defined or computationally harder in general graphs. In DAGs, we are therefore guaranteed to find shortest paths efficiently.
Topological Sorting
Topological sorting is a linear ordering of vertices such that for every directed edge from vertex \(u\) to vertex \(v\), vertex \(u\) comes before vertex \(v\) in the ordering. Topological sorting is possible if and only if the graph is a DAG.
Definition and Properties
A topological sort of a DAG \(G = (V, E)\) is a linear ordering of its vertices such that for every directed edge \((u, v) \in E\), \(u\) comes before \(v\) in the ordering.
A key property of DAGs is the existence of at least one source node (a node with no incoming edges). Topological sorting algorithms typically start from source nodes.
Algorithm for Topological Sort
\(in\_degree \leftarrow\) Compute the in-degree of each vertex \(v \in V\)
\(Q \leftarrow\) Queue of all vertices with \(in\_degree = 0\)
\(L \leftarrow\) Empty list to store the sorted vertices
While \(Q\) is not empty:
  \(u \leftarrow\) Dequeue from \(Q\); append \(u\) to \(L\)
  For each edge \((u, v) \in E\): \(in\_degree[v] \leftarrow in\_degree[v] - 1\); if \(in\_degree[v] = 0\), enqueue \(v\) into \(Q\)
If \(L\) contains all vertices, return \(L\); otherwise report "Graph has a cycle".
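This procedure (often called Kahn's algorithm) can be sketched in Python as follows, again assuming an adjacency-list dictionary adj in which every vertex appears as a key; the function name topological_sort is only illustrative.

from collections import deque

def topological_sort(adj):
    # adj: dict mapping every vertex to the list of its direct successors
    in_degree = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            in_degree[v] += 1
    queue = deque(u for u in adj if in_degree[u] == 0)   # all source nodes
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            in_degree[v] -= 1
            if in_degree[v] == 0:               # v has no unprocessed predecessors left
                queue.append(v)
    if len(order) < len(adj):
        raise ValueError("Graph has a cycle")
    return order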
Illustrative Example
Consider the following DAG:
Applying topological sort, one possible ordering is: 1, 2, 4, 3, 6, 5, 7, 8. After relabeling the nodes according to this topological order, every edge \((u, v)\) goes from a smaller index \(u\) to a larger index \(v\).
Non-Uniqueness of Topological Order
It is important to note that topological sorting is not unique. If there are multiple source nodes at any step, the order in which they are processed can lead to different valid topological sorts.
Shortest Path Algorithm for DAGs
We can efficiently find shortest paths in DAGs using dynamic programming, leveraging the topological order of the vertices.
The ‘len’ Array
We use an array, denoted as ‘len’, where ‘len[i]’ will store the length of the shortest path from a source node \(S\) to node \(i\). We assume, without loss of generality, that \(S\) is the first node in the topological sort.
Initialization of ‘len’
Initialize the ‘len’ array as follows:
\(\len[S] = 0\), where \(S\) is the source node.
\(\len[i] = \infty\) for all other nodes \(i \neq S\). (Alternatively, we can calculate values progressively without explicit initialization to infinity for nodes we haven’t reached yet in the topological order).
If we are interested in paths from a specific source \(S\) to a target \(T\), and the topological sort indices range from 1 to \(N\), we are concerned with the portion of the ‘len’ array from index \(S\) to \(T\). Nodes with indices less than \(S\) are unreachable from \(S\) in a topologically sorted DAG, and nodes with indices greater than \(T\) are not relevant if we are only interested in paths up to \(T\).
Dynamic Programming Approach
We process the nodes in topological order. For each node \(i\) from \(S+1\) to \(T\), we calculate \(\len[i]\) using the following recurrence relation:
\[\len[i] = \min_{(j, i) \in E} \{ \len[j] + c(j, i) \}\] where \(c(j, i)\) is the cost of the edge from node \(j\) to node \(i\). This formula considers all incoming edges to node \(i\). Since we are processing nodes in topological order, when we calculate \(\len[i]\), the values \(\len[j]\) for all predecessors \(j\) of \(i\) are already computed.
Time Complexity Analysis
For each node \(i\), we iterate through its incoming edges, so the total work is the sum of the in-degrees of all nodes from \(S\) to \(T\). In the worst case, if we consider all nodes in the DAG, this sum equals the number of edges \(M\). Including the time to initialize the array and iterate over the nodes, the algorithm therefore runs in \(O(N + M)\) time, which is linear in the size of the graph.
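A minimal Python sketch of this dynamic program follows, assuming the graph is given as a dictionary edges that maps a pair (u, v) to the cost \(c(u, v)\) and that order is a topological order of the vertices, for instance the output of the topological sort sketched above; in addition to the path lengths, it records a parent pointer for each node, which is used later to reconstruct the path.

def dag_shortest_path(edges, order, source):
    # edges: dict mapping a pair (u, v) to the edge cost c(u, v)
    # order: a topological order of all vertices
    INF = float("inf")
    length = {v: INF for v in order}
    parent = {v: None for v in order}
    length[source] = 0
    incoming = {v: [] for v in order}           # predecessors of each node
    for (u, v), c in edges.items():
        incoming[v].append((u, c))
    for v in order:                             # process nodes in topological order
        for u, c in incoming[v]:                # len[v] = min over (u, v) of len[u] + c(u, v)
            if length[u] + c < length[v]:
                length[v] = length[u] + c
                parent[v] = u
    return length, parent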
Detailed Example
Let’s apply the shortest path algorithm to the DAG example we saw earlier, aiming to find the shortest path from node 1 to node 8.
Step-by-Step Calculation
We initialize \(\len[1] = 0\) and consider nodes in the topological order we found earlier: 1, 2, 4, 3, 6, 5, 7, 8.
Node 1: \(\len[1] = 0\).
Node 2: Predecessor is 1. \(\len[2] = \len[1] + c(1, 2) = 0 + 3 = 3\). Parent of 2 is 1.
Node 4: Predecessor is 2. \(\len[4] = \len[2] + c(2, 4)\). The edge \((2, 4)\) is not shown in the diagram; we assume it exists with cost \(c(2, 4) = 10\) for this example. Then \(\len[4] = 3 + 10 = 13\). Parent of 4 is 2.
Node 3: Predecessor is 2. \(\len[3] = \len[2] + c(2, 3) = 3 + 4 = 7\). Parent of 3 is 2.
Node 6: Predecessors are 2 and 3.
From 2: \(\len[2] + c(2, 6) = 3 + 8 = 11\).
From 3: \(\len[3] + c(3, 6) = 7 + 5 = 12\).
\(\len[6] = \min(11, 12) = 11\). Parent of 6 is 2.
Node 5: Predecessors are 4 and 3.
From 4: \(\len[4] + c(4, 5) = 13 + 4 = 17\).
From 3: \(\len[3] + c(3, 5) = 7 + 8 = 15\).
\(\len[5] = \min(17, 15) = 15\). Parent of 5 is 3.
Node 7: Predecessors are 6, 5 and 4.
From 6: \(\len[6] + c(6, 7) = 11 + 10 = 21\).
From 5: \(\len[5] + c(5, 7) = 15 + 3 = 18\).
From 4: \(\len[4] + c(4,7) = 13 + 10 = 23\).
\(\len[7] = \min(21, 18, 23) = 18\). Parent of 7 is 5.
Node 8: Predecessors are 7 and 1.
From 7: \(\len[7] + c(7, 8) = 18 + 1 = 19\).
From 1: \(\len[1] + c(1, 8) = 0 + 20 = 20\).
\(\len[8] = \min(19, 20) = 19\). Parent of 8 is 7.
Thus, the shortest path length from node 1 to node 8 is 19.
Reconstructing the Shortest Path
To reconstruct the shortest path, we backtrack from the destination node (8) to the source node (1) using the parent pointers we recorded during the calculation.
Start at node 8. Parent is 7.
From node 7. Parent is 5.
From node 5. Parent is 3.
From node 3. Parent is 2.
From node 2. Parent is 1.
We reached the source node 1.
Reversing this sequence, the shortest path is \(1 \rightarrow 2 \rightarrow 3 \rightarrow 5 \rightarrow 7 \rightarrow 8\) with a total length of 19.
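Using the dag_shortest_path sketch from above, the whole example can be reproduced; the edge costs below are read off the worked calculation, including the assumed cost \(c(2, 4) = 10\).

edges = {
    (1, 2): 3, (1, 8): 20,
    (2, 3): 4, (2, 4): 10, (2, 6): 8,           # c(2, 4) = 10 is the assumed cost
    (3, 5): 8, (3, 6): 5,
    (4, 5): 4, (4, 7): 10,
    (5, 7): 3, (6, 7): 10,
    (7, 8): 1,
}
order = [1, 2, 4, 3, 6, 5, 7, 8]
length, parent = dag_shortest_path(edges, order, source=1)

# Backtrack from the target to the source along the parent pointers
path, node = [], 8
while node is not None:
    path.append(node)
    node = parent[node]
path.reverse()
print(length[8], path)                          # 19 [1, 2, 3, 5, 7, 8]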
Application: Scheduling Problems
Directed Acyclic Graphs have significant applications in scheduling problems.
Introduction to Scheduling
Scheduling problems involve managing jobs and resources to optimize certain objectives, often under constraints.
Jobs and Resources
In scheduling, we have a set of jobs or tasks that need to be processed using available resources, such as machines, processors, or personnel.
Precedence Constraints
Precedence constraints define the order in which jobs must be executed. For example, job A must be completed before job B can start. These constraints are naturally represented by directed edges in a graph, where an edge from A to B indicates that A must precede B.
Modeling Scheduling with DAGs
DAGs are a natural and effective way to model scheduling problems with precedence constraints.
Nodes Representing Jobs/Tasks
In a DAG scheduling model, each node in the graph represents a job or task that needs to be performed.
Edges Representing Precedence Relations
A directed edge from node \(i\) to node \(j\) signifies that job \(i\) must be completed before job \(j\) can start.
Job Processing Times
Each job \(j\) is associated with a processing time \(P_j\), which is the duration required to complete the job.
Scheduling Constraints
Besides precedence constraints, scheduling problems may involve other types of constraints.
Precedence Constraints
As discussed, these are the core constraints in many scheduling problems, dictating the order of job execution.
Release Dates
A release date \(r_j\) for job \(j\) specifies the earliest time at which job \(j\) can start its processing. Job \(j\) cannot start before time \(r_j\).
Deadlines
A deadline \(d_j\) for job \(j\) specifies the latest time by which job \(j\) must be completed. Job \(j\) must finish by time \(d_j\).
Critical Path Method (CPM)
The Critical Path Method (CPM) is a project management technique used to schedule, organize, and coordinate tasks within a project. It is particularly useful for projects with well-defined tasks and precedence constraints.
Adding Fictitious Start and End Nodes
In CPM, it is often helpful to introduce fictitious start and end nodes. A start node (node 0) represents the beginning of the project, and an end node (node \(n+1\)) represents the completion of the project. These nodes have zero processing time. Edges are added from the start node to all jobs that have no predecessors, and from all jobs that have no successors to the end node.
Earliest Starting Time (ET)
The Earliest Starting Time (\(\ET[i]\)) for a job \(i\) is the earliest possible time at which job \(i\) can begin, considering all precedence constraints.
Calculation as a Longest Path Problem
In CPM, to find the earliest starting time, we treat the problem as a longest path problem in the DAG. The \(\ET[i]\) for each job \(i\) is the length of the longest path from the start node (0) to node \(i\). We can use a similar dynamic programming approach as for shortest paths, but instead of minimizing, we maximize:
\[\ET[i] = \max_{(j, i) \in E} \{ \ET[j] + P_j \}\] with \(\ET[0] = 0\). The edge weights in this longest path calculation are the processing times of the predecessor jobs.
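A minimal sketch of this forward pass, assuming the precedence graph is given as a list of pairs edges, a dictionary proc_time with the processing times \(P_j\) (zero for the two fictitious nodes), and a topological order order that starts at the fictitious start node and ends at the fictitious end node:

def earliest_start_times(edges, proc_time, order):
    # edges: precedence pairs (i, j), meaning job i must finish before job j starts
    # proc_time: processing time P_i of each job (0 for the fictitious nodes)
    # order: topological order from the start node to the end node
    pred = {i: [] for i in order}
    for i, j in edges:
        pred[j].append(i)
    ET = {i: 0 for i in order}
    for i in order:                             # forward pass (longest paths from the start node)
        for j in pred[i]:
            ET[i] = max(ET[i], ET[j] + proc_time[j])
    makespan = ET[order[-1]]                    # ET of the fictitious end node
    return ET, makespan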
Makespan Definition
The Makespan (\(\makespan\)) is the total time required to complete all jobs in the project. It is the completion time of the last job to finish. In CPM, the makespan is given by the earliest starting time of the fictitious end node, \(\makespan = \ET[n+1]\).
Latest Starting Time (LT)
The Latest Starting Time (\(\LT[i]\)) for a job \(i\) is the latest possible time at which job \(i\) can start without delaying the project completion time (makespan).
Calculation of LT
To calculate \(\LT[i]\), we work backward from the end node (\(n+1\)). We set \(\LT[n+1] = \makespan\). Then, for each job \(i\), we calculate \(\LT[i]\) as:
\[\LT[i] = \min_{(i, j) \in E} \{ \LT[j] - P_i \}\] This calculation ensures that if job \(i\) starts at time \(\LT[i]\), all subsequent jobs can still be completed without exceeding the makespan.
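The backward pass can be sketched analogously, reusing the same representation together with the makespan obtained from the forward pass.

def latest_start_times(edges, proc_time, order, makespan):
    # Same representation as in earliest_start_times above
    succ = {i: [] for i in order}
    for i, j in edges:
        succ[i].append(j)
    LT = {i: makespan for i in order}           # the fictitious end node keeps LT = makespan
    for i in reversed(order):                   # backward pass
        for j in succ[i]:
            LT[i] = min(LT[i], LT[j] - proc_time[i])
    return LT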
Slack Time
The Slack Time (or float) for a job \(i\) is the amount of time that job \(i\) can be delayed without delaying the project makespan. It is calculated as:
\[\text{Slack}[i] = \LT[i] - \ET[i]\]
Identifying Critical Jobs and Paths
A critical job is a job with zero slack time (\(\text{Slack}[i] = 0\)). These jobs must start on time to avoid delaying the project. A critical path is a path from the start node to the end node consisting only of critical jobs. The critical path determines the minimum project completion time (makespan).
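Assuming edges, proc_time, and order as in the sketches above, slack times and critical jobs then follow directly.

ET, makespan = earliest_start_times(edges, proc_time, order)
LT = latest_start_times(edges, proc_time, order, makespan)
slack = {i: LT[i] - ET[i] for i in order}
critical_jobs = [i for i in order if slack[i] == 0]   # jobs that must start exactly at ET[i]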
Illustrative Example: Document Preparation
Consider the example of preparing a document, such as a thesis.
Task Decomposition
Tasks involved in document preparation might include:
Write Section A
Write Section B
Print Section A
Print Section B
Bind A + B
Buy Envelope
Write Address
Package
Deliver to Secretary
Mail Package
Precedence Graph Construction
We can construct a precedence graph based on dependencies. For example, "Print Section A" must come after "Write Section A". "Bind A + B" must come after both "Print Section A" and "Print Section B". We also need an edge from "Bind A + B" to "Package".
Application of CPM
By assigning processing times to each task and applying CPM, we can determine:
The earliest start time for each task.
The latest start time for each task without delaying the completion.
The critical path, which identifies the sequence of tasks that directly impacts the project completion time.
The makespan, which is the minimum time to complete the document preparation.
This analysis helps in identifying critical tasks that need close monitoring and resource allocation to ensure timely project completion.
Conclusion
In this lecture, we explored efficient algorithms for finding shortest paths in Directed Acyclic Graphs (DAGs) using topological sorting and dynamic programming. We saw that DAGs, due to their acyclic nature, simplify the shortest path problem and allow for linear-time solutions. We then applied these concepts to scheduling problems, introducing the Critical Path Method (CPM). CPM helps in project management by identifying critical tasks and paths, enabling efficient resource allocation and project planning.
Key takeaways from this lecture include:
DAGs are a special class of graphs that simplify shortest path calculations.
Topological sorting is a fundamental algorithm for DAGs, enabling efficient processing.
Dynamic programming provides a linear-time algorithm for shortest paths in DAGs.
CPM is a powerful application of DAGs and longest path algorithms for project scheduling and critical task identification.
Further questions to consider:
How can we handle more complex scheduling constraints, such as resource limitations or job priorities?
How do these methods extend to project management scenarios with uncertainties in task durations?
What are other applications of shortest and longest path algorithms in DAGs beyond scheduling?
This concludes our lecture on shortest paths in DAGs and scheduling problems.