Job Scheduling in a Distributed System Using Backfilling with Inaccurate Runtime Computations

Categories: ComputersJob

In this study, we look at a grid system where service is required for both parallel and sequential jobs. Backfilling is used, but a margin of error is added to the prediction of runtime of a job. The impact on system performance will be analyzed and the outcomes will be compared with the known optimum runtime case. Two different scheduling methods are considered and the system is evaluated by using a simulation model 14224-26422


Job scheduling using FCFS algorithm is a challenging task when resources are not available for large grid jobs (gangs) consist of number of parallel tasks.

One of the promising solutions of gang scheduling is Backfilling.

Backfilling initiates small jobs before large jobs that require currently unavailable resources. The estimated service times of all jobs must be known during job submission or predictions made by the system based on historic data. However neither of the methods is accurate.

This work overcomes such inaccuracies in a way that some jobs causing delays will backfill before queued gang can begin processing.

Get quality help now
Sweet V
Verified writer

Proficient in: Computers

4.9 (984)

“ Ok, let me say I’m extremely satisfy with the result while it was a last minute thing. I really enjoy the effort put in. ”

+84 relevant experts are online
Hire writer

The system under study consists of two sites including both parallel and simple jobs. Local jobs (simple jobs) will arrive directly at local scheduler whereas grid jobs will first dispatched to a site by grid scheduler then further local scheduler will allocate them to a processor.

This work extends the study in [1] where power of 2 gang sizes were considered.


There are two sites in the simulation model consisting of a local scheduler with 16 processors.

Get to Know The Price Estimate For Your Paper
Number of pages
Email Invalid email

By clicking “Check Writers’ Offers”, you agree to our terms of service and privacy policy. We’ll occasionally send you promo and account related email

"You must agree to out terms of services and privacy policy"
Write my paper

You won’t be charged yet!

The workload consists of local and grid jobs & three arrival streams, one at grid jobs and one inside the local jobs of each both sites. This work extends [1] where gang size was power of 2 (2, 4, 8, 16) and follow a uniform distribution with values ranging from 2 to 13. Both [1] and [2] results will be compared. The gang sizes were chosen in this study such that average number of tasks per gang in both studies is equal to 7.5.

The mean inter arrival time of gangs and locals is exponentially 1/?1,1/?2,1/?3 for locals in site 1, site 2 and gangs respectively where ?1, ?2, ?3 are arrival rates for locals in site 1 and site 2 and gangs respectively. We assume that ?1=?2=? and ?3<<?.The service time of a local job or a gang?s task is also exponentially distributed with a mean of 1/µ.

Communication between two sites and GS(gang jobs) relies on message passing therefore its communication time is negligible. However when a grid job is assigned on both sites extra coordination is needed. Figure 1 shows the configuration of this model.

Figure 1. The queuing network model


A. Grid level

1) Allocation policies: The GS will determine whether the arrived gang will be dispatched to a new site or stored in a waiting queue. Two approaches are used for this purpose.

  • Approach 1: In this approach first each site is examined for enough idle processors. If this condition is met then GS picks one of the sites and start execution immediately. If no site satisfy this condition then gang is scheduled to the site that has enough empty queues & will start processing when queues become idle. If there is no enough empty queues in both sites then gang waits in GS’s waiting queue.
  • Approach 2:In this approach the gang tasks can be dispatched to both sites.If approach 1 conditions are not met, then the idle processors of both sites are computed. The gang will be routed to both sites if there are enough idle processors, otherwise it is moved to waiting queue if number of processors in both sites are less than number of gang tasks.

2) Queuing disciplines:

  • It involves two steps.
  • The tasks of each gang in GS queue are computed if they are less or equal to number of empty queues in any site.
  • The largest gang is chosen among all & the oldest one is chosen if there are more then one largest gangs waiting for the execution.-111753016
  • The GS checks for a gang in its queue with a lower or equal number of tasks to the number of idle processors in both sites.The gang found is routed to both sites for processing.If more than one gang is found then approach 1 strategy is used.

B. Local level 1) Allocation policies:

  • The algorithm has two steps.
  • The local job will be scheduled to shortest queue if all processors are busy.
  • The local job will be scheduled to that processor where execution start immediately if there are queues with idle processors and if the following condition is met: ServiceTime ? ElapsedTime + T 2) Queuing disciplines: 3885184681736

If the processor is idle and queue is empty when local job enters a queue,the job execution will start immediately. In FCFS gangs will delay local jobs services that is why backfilling is used in which gang is waiting for service so enough processors are available for local jobs.The local job can start execution before a gang waiting in the queue if following condition is met:

ServiceTime ? ElapsedTime + T

Where service time is run time of local job and elapsed time is the remaining time until gang start execution.


A. Performance metrics

To evaluate system performance we compute two different response times: the average response time of the gangs and the average response time of the local jobs. The metric used to measure the response time of a job j if m is the number of total processed job is given as:

The response time rj of a job j is the time interval between job arrival and completion .The metric used to

measure delay of a job against its actual runtime is shown below3441192-401571

Where sj is slowdown of a job and ej is service time.

Average slowdown is given as:

The response time and slowdown of parallel jobs is defined by below metrics. Whenever it comes to parallel jobs each gang need to be weighted with its size.

The average weighted response time WRT:

The average weighted slowdown WSLD:

C. Simulation experiments-22853036

Figure2 have a higher completion of gangs for uniform

gang sizes and in both approaches; Approach 1 and 2, all gangs complete execution for medium and high workload.

In Figure3 approach 2 produce much better results for gang size of powers of two as compared to uniform gang sizes in figure 2. The result for Approach1 in both Figure 2 and 3 are almost same. Approach 2 results in the completion of 10% more gang than approach1.

Figure 4 and 5 shows that when workload is high approach 2 results in lower WRT and WSLD. The reason behind is that approach 2 minimizes waiting time as gangs tasks can be routed to both sites for execution oppose to approach1.

Figure 6 and 7 shows that for higher workloads, RT and SLD for local jobs are bit higher for approach 2 compared to Figure 4. WRT versus ? for uniform gang sizes Approach1 while gang sizes are uniformed.

Cite this page

Job Scheduling in a Distributed System Using Backfilling with Inaccurate Runtime Computations. (2019, Nov 28). Retrieved from

Job Scheduling in a Distributed System Using Backfilling with Inaccurate Runtime Computations

👋 Hi! I’m your smart assistant Amy!

Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.

get help with your assignment