Kosaraju's Algorithm

Finding strongly connected components.

  1. Do DFS starting at random node to create a DFS forest (i.e. disconnected DFS trees), keeping track of finishing times
  2. Reverse all edges in the graph
  3. Do DFS again to create another DFS forest, start DFS procedure with the vertices with the largest finishing time in step 1
  4. strongly connected components are the trees that changed from the first in the second DFS graph

main idea

Recall that finishing time helps us track topological sort. After reversing edges, the condensation graph node with the largest finish time has no edges going out (i.e. it was originally the first in the topological sort); if we run DFS there, we will find exactly that component. After DFSing there, we will have “removed” that component (having visited it), and repeat.

proof

condensation graph is a DAG

see condensation graph.

if A -> B in condensation graph, A will finish later in DFS than B

Consider two condensation graph components, \(A, B\) such that \(A \to B\). A’s finish time will be greater (later) than B’s finish time in DFS.

case 1: we reached \(A\) before \(B\) in our DFS. Consider \(y \in B\) being the node being the largest finish time in \(B\), \(z \in A\) being the first node discovered. This means that we reached \(A\) before \(B\), meaning we will discover \(y\) via going through \(z\). (\(y\) is a descendant of \(z\) in the DFS forest). This mean that the order of DFS would be start z -> start y -> finish y -> finish z. Since B’s finish time is y’s finish time, and A’s finish time is greater than or equal to z’s finish time; this gives A.finish >= y.finish.

case 2: we reached \(B\) before \(A\) in our DFS. This means there are no paths from \(B\) to \(A\) (otherwise they will be strongly connected). This also means \(A\)’s finish time is greater than \(B\)’s finish time as per given since they are separate DFS.

In the reversed condensation graph, if \(B\to A\), then A’s finish time will be greater (later) than B’s finish time in DFS.

The converse of the above is also true: if A’s finish time will be greater (later) than B’s finish time in DFS, \(B\to A\)

tying it together

Because of the corollary above, the first component we start from will be the top of the topological sort. In the reverse DFS, all of the edges will lead IN, meaning since the component is a strongly connected components you have no way to leave.

By induction.

Hypothesis: the first \(t\) DFS trees found in the reversed DFS forest are the t SCCs with the largest finishing times.

Base case \(t=0\), well the first 0 DFS trees are the 0 SCCs with the largest finish times.

Inductive step:

  • consider the \(\qty(t+1)\) tree produced; suppose its root is \(x\).
  • Suppose \(x\) lives in the SCC \(A\). We know that \(A\)’s finish time is greater than \(B\)’s finish time for all remaining SCC (since we sorted by finish time, and \(A\)’s finish time will be \(x\)’s finish time since its the largest finish time).
  • Hence, the lemma above gives us that there’s no edges leaving \(A\) to the remaining reverse DAG (since edges were reversesd, and \(A\) is a connected component); that is, \(B \to A\) in the reversed graph. Then the DFS started at \(x\) correctly recovers only \(A\). So \(\qty(t+1)\)

runtime

You ran DFS twice \(O\qty(n+m)\) and reversed edges \(O\qty(m)\): so \(O\qty(n+m)\).