Introduction to Distributed Scheduling
Scheduling refers to the execution of non-interactive processes or tasks at designated times and places across a network of computers.
Distributed scheduling refers to chaining different jobs into a coordinated workflow that spans several computers. For example: you schedule a processing job on computer1 and computer2, and when both are finished you schedule a dependent job on computer3; this is distributed scheduling.

A locally distributed system consists of a collection of autonomous computers connected by a LAN.
In a locally distributed system, there is a good chance that several computers are heavily loaded while others are idle or lightly loaded.
If we can move jobs around, the overall performance of the system can be improved.
A distributed scheduler is the resource management component of a distributed operating system that judiciously and transparently redistributes the load of the system among the computers to maximize overall performance.

A distributed system may have a mix of heavily and lightly loaded nodes.

  • Tasks arrive at random intervals.
  • CPU service time requirements are also random.
  • Migrating a task to share or balance load can help.
The system may be heterogeneous in terms of CPU speed and resources, and the load may differ from one node to another.
Even in a homogeneous distributed system, one node may be idle while a task waits for service on another node.
Consider a system of N identical, independent M/M/1 servers. Let P be the probability that the system is in a state in which at least one task is waiting for service and at least one server is idle.
Let ρ be the utilization of each server. We can estimate P using probabilistic analysis and plot it against system utilization.
For moderate system utilization, the value of P is high, i.e. it is very likely that at least one node is idle while a task waits for service elsewhere. Hence, performance can be improved by sharing tasks.
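This probability can be computed directly. The sketch below assumes the N servers are independent, so each server is idle with probability 1 - ρ and has a waiting task (queue length ≥ 2) with probability ρ²; inclusion-exclusion over the complement events then gives P:

```python
def prob_wait_and_idle(rho: float, n: int) -> float:
    """Probability that, among n independent M/M/1 servers at
    utilization rho, at least one task is waiting for service
    while at least one server is idle."""
    # Per-server state probabilities:
    #   idle                      = 1 - rho
    #   busy, no task waiting     = rho * (1 - rho)
    #   at least one task waiting = rho ** 2
    no_server_waiting = (1 - rho ** 2) ** n
    no_server_idle = rho ** n
    neither = (rho * (1 - rho)) ** n  # every server busy, none waiting
    return 1 - no_server_waiting - no_server_idle + neither

# Even at moderate utilization, P is high: with 10 servers at 50%
# utilization, a task is almost always waiting next to an idle server.
print(round(prob_wait_and_idle(0.5, 10), 3))  # → 0.943
```

Note how P falls off at the extremes: at very low utilization almost no task ever waits, and at very high utilization almost no server is ever idle, which matches the shape of the usual P-versus-utilization plot.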

What is performance? Average response time.


The load on a system/node can be measured by the queue length of tasks/processes waiting to be processed.

Queue length of waiting tasks is proportional to task response time, hence it is a good indicator of system load.

Distributing load: transfer tasks/processes among nodes.

If a task transfer (from another node) takes a long time, the receiving node may accept more tasks during the transfer time.

This causes the node to become highly loaded and degrades performance.

Solution: artificially increment the queue length when a task is accepted for transfer from a remote node (to account for the anticipated increase in load).

What if the task transfer fails? Use timeouts to undo the artificial increment.
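A minimal sketch of this bookkeeping (the class and method names below are hypothetical, invented for illustration; they are not from any particular scheduler):

```python
import time

class NodeLoad:
    """Tracks a node's queue length, counting in-flight transfers
    that were accepted but have not arrived yet."""

    def __init__(self, transfer_timeout: float = 5.0):
        self.queue_length = 0
        self.transfer_timeout = transfer_timeout
        self.pending = {}  # transfer id -> deadline

    def accept_transfer(self, transfer_id: str) -> None:
        # Artificially increment the queue length as soon as the
        # transfer is accepted, so the node does not over-commit.
        self.queue_length += 1
        self.pending[transfer_id] = time.monotonic() + self.transfer_timeout

    def transfer_arrived(self, transfer_id: str) -> None:
        # The real task replaces the reservation; the count stands.
        self.pending.pop(transfer_id, None)

    def expire_timeouts(self) -> None:
        # A transfer that never arrived: undo its increment.
        now = time.monotonic()
        for tid, deadline in list(self.pending.items()):
            if now >= deadline:
                del self.pending[tid]
                self.queue_length -= 1
```

`expire_timeouts` would be called periodically; a failed transfer then stops inflating the node's apparent load after at most one timeout interval.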

Types of Algorithms

Static load distribution algorithms: decisions are hard-coded into the algorithm using a priori knowledge of the system.

Dynamic load distribution algorithms: use system state information such as task queue length and processor utilization.

Adaptive load distribution algorithms: adapt the approach itself based on system state. For example, plain dynamic distribution algorithms keep collecting load information from nodes even at very high system loads.

Load information collection itself can add load on the system as messages need to be exchanged.

Adaptive distribution algorithms may stop collecting state information at high loads.
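A toy illustration of this adaptive behavior (the cutoff value and function name are assumptions made for the sketch):

```python
# Hypothetical adaptive polling policy: a node polls peers for load
# information only while its own load is below a cutoff, so the
# information-collection overhead disappears exactly when the system
# can least afford it.

HIGH_LOAD_CUTOFF = 8  # queue length beyond which polling stops (assumed)

def peers_to_poll(own_queue_length: int, peers: list) -> list:
    if own_queue_length >= HIGH_LOAD_CUTOFF:
        return []   # adaptive: suppress state collection under high load
    return peers    # dynamic behavior: collect state as usual

print(peers_to_poll(2, ["node-b", "node-c"]))  # → ['node-b', 'node-c']
print(peers_to_poll(9, ["node-b", "node-c"]))  # → []
```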

Balancing vs. sharing

Load balancing: Equalize load on the participating nodes.

Transfer tasks even if a node is not heavily loaded, so that the queue lengths on all nodes are approximately equal.

A larger number of task transfers might degrade performance.

Load sharing: Reduce burden of an overloaded node.

Transfer tasks only when the queue length exceeds a certain threshold.

Fewer task transfers are needed.
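A minimal sketch contrasting the two policies (the threshold value is an assumption for illustration):

```python
THRESHOLD = 4  # queue length above which a node counts as overloaded (assumed)

def sharing_transfers(queues: dict) -> list:
    """Load sharing: only nodes whose queue length exceeds the
    threshold shed tasks; lightly loaded nodes are left alone."""
    return [node for node, q in queues.items() if q > THRESHOLD]

def balancing_target(queues: dict) -> float:
    """Load balancing aims to equalize queues, so every node whose
    queue length differs from the mean is a transfer candidate."""
    return sum(queues.values()) / len(queues)

queues = {"a": 7, "b": 1, "c": 3}
print(sharing_transfers(queues))  # → ['a']
print(balancing_target(queues))   # mean queue length the balancer targets
```

Under sharing only node "a" initiates transfers; under balancing, all three nodes would be nudged toward the mean, requiring more transfers for the same workload.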

Anticipatory task transfer: transfer tasks from nodes that are likely to become overloaded to nodes that are likely to become idle.

This is more like load balancing, but may involve fewer transfers.
