A connected subgraph of G that is not a proper subgraph of any other connected subgraph of G is a component of G, i.e. there’s a uv path in the mentioned subgraph

Strongly connected components are useful in a variety of graph algorithms, including finding the shortest path between two vertices, detecting cycles in a graph, and determining the structure of a graph. They can be computed efficiently using algorithms such as Tarjan’s algorithm and Kosaraju’s algorithm.

Undirected graphs

The problem of finding components in an undirected graph requires a simple graph traversal starting from an arbitrary vertex keeping track of the vertices that were already visited, it’s also needed to run the algorithm above for every vertex of G (given that it was not visited)

  • the number of components of an undirected graph G is equal to the number of disconnected subgraphs
vector<bool> visited;
// adjacency list of G
vector<vector<int> > g;

void dfs(int v) {
  visited[v] = true;
  for (int i = 0; i < g[v].size(); i += 1) {
    int next = g[v][i];
    if (!visited[next]) {
      dfs(next);
    }
  }
}

/**
 * Computes the number of connected components in an undirected graph `G`
 * of order `n` and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @return {int} The number of components in `G`
 */
int connected_components() {
  int n = g.size();
  visited.assign(n, false);

  int components = 0;
  for (int i = 0; i < visited.size(); i += 1) {
    if (!visited[i]) {
      dfs(i);
      ++components;
    }
  }
  return components;
}

Directed graphs

Given a directed graph G two nodes u,vV(G) are called strongly connected if v is reachable from u and u is reachable from v

A strongly connected component (SCC) of G is a subgraph CV(G) such that

  • C is not empty
  • for any u,vV(G), u and v are strongly connected
  • for any uV(G) and vGC, u and v are not strongly connected

Tarjan’s algorithm

The idea is to perform a DFS from an arbitrary vertex (conducting subsequent DFS from non-explored vertices), during the traversal each vertex v is assigned with two numbers:

  • the time it was explored denoted as vtime
  • the smallest index of any node known to be reachable from v denoted as vlow

Let u be a node that belongs to a SCC, if v is the arbitrary vertex chosen then the only known vertex that is reachable from u is u, let v be a vertex discovered during the exploration of u, if there’s a vu path then it means that there’s a cycle and all the vertices in the path uv belong to the same connected component, such a node u is called the root of the SCC

Let u be a node that belongs to a SCC, if it’s known that there’s a uv cycle and also that u can reach a vertex t with lower index than u then v and t belong to the same component

A stack is also needed to keep track of the nodes that were visited, the working of the stack follows the invariant: a node remains on the stack after exploration if and only if it has a path to some node earlier in the stack

// adjacency list of G
vector<vector<int> > g;

int time_spent;
// the number of scc
int total_scc;

// the time a vertex was discovered
vector<int> time_in;
// the smallest index of any vertex known to be reachable from `i`
vector<int> back;
// the scc vertex `i` belongs to
vector<int> scc;
// invariant: a node remains in the stack after exploration if
// it has a path to some node explored earlier that is in the stack
vector<bool> in_stack;
stack<int> vertices;

void dfs(int v) {
  int next;

  // the lowest back edge discovery time of `v` is
  // set to the discovery time of `v` initally
  back[v] = time_in[v] = ++time_spent;

  vertices.push(v);
  in_stack[v] = true;

  for (int i = 0; i < g[v].size(); i += 1) {
    next = g[v][i];
    if (time_in[next] == -1) {
      // unvisited edge
      dfs(next);
      // propagation of the lowest back edge discovery time
      back[v] = min(back[v], back[next]);
    } else if (in_stack[next]) {
      // (v, next) is a back edge only if it's connected to a predecessor
      // of `v`, i.e. if `next` is in same branch in the dfs tree
      //
      // an alternative is to use the time a vertex finished exploring its
      // adjacent nodes, if the time is not set then it's a back edge
      back[v] = min(back[v], time_in[next]);
    }
  }

  // if the root node of a connected component has finished
  // exploring all its neighbors, assign the same component `id`
  // to all the elements in the scc
  if (back[v] == time_in[v]) {
    total_scc += 1;
    do {
      next = vertices.top();
      vertices.pop();
      in_stack[next] = false;
      scc[next] = total_scc;
    } while (next != v);
  }
}

/**
 * Finds the strongly connected components in a digraph `G` of order `n`
 * and size `m`
 *
 * Time complexity: O(n + m)
 * Space complexity: O(n)
 *
 * @returns {int} the number of strongly connected components
 */
int tarjan() {
  int n = g.size();

  scc.assign(n, -1);
  time_in.assign(n, -1);
  back.assign(n, -1);
  in_stack.assign(n, false);
  while (!vertices.empty()) {
    vertices.pop();
  }

  time_spent = 0;
  total_scc = 0;

  for (int i = 0; i < n; i += 1) {
    if (time_in[i] == -1) {
      dfs(i);
    }
  }
  return total_scc;
}