Distributed algorithms present theoretical computer science from a different perspective.

Let’s start with an example. Imagine a graph with a large set of nodes. The nodes could represent anything: maybe a network of computers exchanging information, or maybe a group of cells in a multicellular organism that continuously exchange biochemical signals.

Given this graph, we might need to compute some information: say, we need to find a matching in the graph, or a coloring of the graph that uses the fewest possible colors (classic graph-theory problems).

Why coloring problems, though? How are they useful in real-life situations? Take the network-of-computers example. A coloring here gives a schedule for exchanging information: choose an ordering over the set of colors and iterate over it. At time t, all the computers marked with color t are made active. The computers sharing a color form an independent set, and since no two nodes in an independent set are neighbors, the active computers can safely do whatever they want without disrupting the nearby (inactive) computers.
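To make this concrete, here is a small Python sketch (the function name and data layout are my own, not from the text) that turns a proper coloring into such a slot schedule:

```python
# Hypothetical sketch: turning a proper vertex coloring into a
# collision-free schedule. Nodes of the same color form an independent
# set, so they can all be active in the same time slot.

def schedule_from_coloring(coloring):
    """coloring: dict mapping node -> color. Returns a list of time
    slots, each a set of nodes active in that slot."""
    slots = {}
    for node, color in coloring.items():
        slots.setdefault(color, set()).add(node)
    # Fix an ordering over the colors; slot t activates the t-th color.
    return [slots[c] for c in sorted(slots)]

# A path a-b-c properly 2-colored: a and c share a color, so they
# may safely be active at the same time.
coloring = {"a": 0, "b": 1, "c": 0}
print(schedule_from_coloring(coloring))
```

The number of slots equals the number of colors used, which is why a coloring with fewer colors yields a shorter schedule.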

A general overview of the computation problems we are interested in: given a graph, the nodes need to work together (that is, exchange information) to solve some computation problem. Usually it is a graph problem - finding a spanning tree, finding a proper vertex coloring, etc.

Distributed Problems

When we talk about distributed problems, there are a few assumptions. Initially, each node is aware only of itself (for example, its own color). The goal is for each node to know its own part of the solution to the given computation problem (in the case of coloring, the final color that node should have). This is already enough for each node to know what to do: if a node knows that its final color is c, then it should be active in time slot c.

This is the key difference between sequential (classical) computing and distributed computing. In classical computer science, we assume that the whole input is stored in one place (for example, the complete graph); we process it and produce the whole output (the complete color assignment for every node in the graph).

In distributed computing, each node initially has no information about the graph beyond itself. It knows its neighbors and can exchange messages only with them to explore the graph. In principle, every node could repeat this process until it learns the complete graph, but that may be infeasible in terms of communication, by which we mean the exchange of information between the computers. Each exchange of information can be viewed as a communication round, in which every node exchanges messages with its own neighbors once. We want to minimize the number of communication rounds.
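The round-by-round growth of knowledge can be simulated directly. Below is a hypothetical sketch (function name and representation are my own): in each synchronous round every node forwards everything it knows to its neighbors, and we count the rounds until every node knows the whole node set. For a connected graph this count equals the diameter, which illustrates why "learn the entire graph" can be expensive.

```python
# Assumed sketch of flooding: each node starts knowing only itself and,
# once per round, merges in everything its neighbors knew at the start
# of that round. After r rounds, a node knows its radius-r neighborhood.

def rounds_to_learn_graph(adj):
    """adj: dict node -> set of neighbors (connected undirected graph).
    Returns the number of rounds until every node knows every node."""
    knowledge = {v: {v} for v in adj}   # each node knows only itself
    everyone = set(adj)
    rounds = 0
    while any(k != everyone for k in knowledge.values()):
        # One synchronous round: everyone sends, then everyone merges.
        incoming = {v: set().union(*(knowledge[u] for u in adj[v]))
                    for v in adj}
        for v in adj:
            knowledge[v] |= incoming[v]
        rounds += 1
    return rounds

# A path on 4 nodes has diameter 3, so 3 rounds are needed.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(rounds_to_learn_graph(path))  # 3
```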

Indeed, there is an implicit assumption of an outside observer that knows the whole input (i.e., the global structure of the graph), but we do not pay much attention to this when talking about distributed problems. Note also that distributed problems differ from parallel computing problems: in parallel computing, the full input and output are still stored in one place; only the computation is parallelized.

In other areas of computer science, we have a bird's-eye view of the computation problem's input, while in distributed algorithms we only have an insider's view. Each node must produce its own part of the output (irrespective of the other outputs) based solely on the information exchanged in its surroundings. This introduces the concept of locality: fast distributed algorithms are necessarily highly localized.

By “fast”, we mean the number of communication rounds, in addition to the classical time and space complexity. Just as time and space are viewed as resources, communication can be viewed as a resource in distributed problems. This is because in general networks of computers, the communication steps (exchanging information), rather than the computation steps, become the bottleneck. To put the cost of communication in perspective: fetching one bit from another computer in the same local network takes approximately the same time as computing one billion arithmetic operations inside a normal computer.

Distributed algorithms rely heavily on locality. Before designing an algorithm, one should be aware of which graph problems are local and which are global. Local problems can be solved with each node looking only at its nearby neighborhood, while global problems require information about essentially the whole input graph. This also answers the question of which problems can be solved in a small number of communication rounds and which necessarily require many rounds.
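As a toy example of a strictly local problem (my own illustration, not from the text), consider each node deciding whether its unique ID is a local maximum. A single communication round suffices, because the answer depends only on the radius-1 neighborhood:

```python
# Assumed sketch: in one round, every node sends its ID to its
# neighbors; each node then declares itself a local maximum if its ID
# beats every ID it received. This is a local problem: one round is
# enough regardless of the size of the graph.

def local_maxima(adj, ids):
    """adj: node -> set of neighbors; ids: node -> unique ID.
    Returns the set of nodes whose ID exceeds all neighbors' IDs."""
    return {v for v in adj if all(ids[v] > ids[u] for u in adj[v])}

path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ids = {0: 7, 1: 3, 2: 5, 3: 9}
print(local_maxima(path, ids))
```

By contrast, a global problem such as finding a spanning tree cannot be solved this way: a node's part of the answer may depend on edges arbitrarily far away.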

Such answers are useful for understanding the nature and fundamental limitations of any other system of interacting entities, such as social networks, job markets, etc.

Definitions and Terminologies:

Before we move on to the algorithms, there are a few terms and assumptions about the distributed network model we consider that should be made clear.

  1. The network model is a connected undirected graph; it may be weighted or unweighted.

  2. Local communication - Nodes can communicate directly (and only) with their neighbors through the edges. There are two types of communication.

    • Local unicast - A node can send a different message to each of its neighbors (more suitable for wired networks).
    • Local broadcast - A node sends the same message to all of its neighbors in any given step (a feature of wireless networks).
  3. Synchrony - Two important models can be distinguished based on processor synchronization.

    • Synchronous model - Each processor has an internal clock and the clocks are synchronized. We assume the processor speeds are uniform and each processor takes the same amount of time to perform the same operation. Computation proceeds in lock-step in a series of discrete rounds (time steps). In each round, each processor (node) can do some local (internal) computation and can also send/receive messages. To be concrete, we assume that at the beginning of a round, a node receives messages (if any) from its neighbors via its incident edges.

    • Asynchronous model - No assumptions are made about the internal clocks. We assume that messages arrive in the same order they are sent (FIFO). Algorithms designed for the synchronous model can be transferred to the asynchronous model by means of a tool called a synchronizer.

  4. Local knowledge - Two models

    • KT0 - (Knowledge of all nodes is restricted Till radius 0), also known as the clean network model; this is the standard model that is typically used. Each node has a port associated with each incident edge (each with a port number). A node knows only its own port numbers and that an edge leaves through each port, but nothing about the other endpoint of that edge.

    • KT1 - Here the nodes are assumed to have initial knowledge of their neighbors, in particular their IDs.

  5. CONGEST vs LOCAL

    • CONGEST - The size of each message sent per round is small, typically O(log n) bits, where n is the size of the network. This is a reasonable bound, as this many bits are needed just to send the unique ID of a node. This model captures the inherent bandwidth restriction that is present in real-world networks.

    • LOCAL - There is no restriction on the size of messages. This model is useful for focusing on the locality issues in distributed computing.

  6. Operation - Usually, each node is assumed to run the same algorithm. However, depending on its local information (due to randomness, a unique ID, or messages sent by other nodes), each node can have its own behavior.
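The synchronous lock-step model from the list above can be sketched as a small simulator. The framing below is my own (the function names and the local-broadcast choice are assumptions, not from the notes): in each round every node sends one message to all its neighbors, then receives its neighbors' messages and updates its local state.

```python
# Assumed sketch of the synchronous model: computation proceeds in
# discrete rounds; every node runs the same algorithm, given by a
# message function and a state-update function (local broadcast).

def run_synchronous(adj, state, message_fn, update_fn, num_rounds):
    """adj: node -> list of neighbors; state: node -> initial state.
    message_fn(state) -> message broadcast to all neighbors;
    update_fn(state, inbox) -> new state. Returns the final states."""
    for _ in range(num_rounds):
        # Send phase: every node emits one message, in lock-step.
        msgs = {v: message_fn(state[v]) for v in adj}
        # Receive + compute phase: update from neighbors' messages.
        state = {v: update_fn(state[v], [msgs[u] for u in adj[v]])
                 for v in adj}
    return state

# Example: propagating the maximum initial value. After r rounds, each
# node knows the maximum value within distance r of itself.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final = run_synchronous(path, {v: v * 10 for v in path},
                        lambda s: s,
                        lambda s, inbox: max([s] + inbox),
                        num_rounds=2)
print(final)  # {0: 20, 1: 30, 2: 30, 3: 30}
```

Note how the per-round structure makes the round count, rather than local computation, the natural complexity measure.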

Resources: