Array Jobs

Usually you don't want to run just one job on the cluster, but many jobs of the same type with different inputs - say, a simulation that you want to run 1,000 times with different starting parameters.

The naive approach is to write a script that calls qsub many times. The problem with this is that each invocation creates a separate job, which has its own job ID, its own job priority, its own entry in qstat, and is considered independently of all other jobs by the scheduler. For a couple of hundred jobs this is mostly an annoyance - qstat becomes harder to read, and the jobs are harder to manage (you have to use more complicated shell commands to kill them all, for example) - but for thousands of jobs it starts to impact the performance of the scheduler.

Also, the jobs further down the list will have a much reduced priority, which diminishes further the more jobs you submit. The first few hundred jobs may be fine, but if the cluster is busy the jobs at the end will take longer to run because their priority will be lower relative to other users' jobs.
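As a concrete illustration, the naive approach looks something like the sketch below. The script name simulation.job is hypothetical, and echo stands in for the real qsub so the sketch can be run anywhere:

```shell
# Naive approach (don't do this): submit one separate job per parameter.
# 'echo' stands in for the real qsub; simulation.job is a made-up name.
submit_all() {
  for i in $(seq 1 "$1"); do
    echo "qsub simulation.job $i"
  done
}

# Submitting 1,000 parameter values this way creates 1,000 independent
# jobs, each with its own job ID, priority and qstat entry.
submit_all 1000 | tail -2
```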

The correct way to submit multiple jobs to Grid Engine is to use an array job. An array job is designed for exactly this scenario - you have a single job script, but multiple sub-jobs (called tasks).

Creating a simple Array Job

In the same directory as the example cited on the Batch Jobs page is another job script, called fibonacci_array.job.

# Copy over the cluster job_templates directory if you haven't
$ cp -r /cm/shared/examples/job_templates $HOME/
$ cd $HOME/job_templates/batch_serial
$ cat fibonacci_array.job
######################################################################
# Options for the batch system

... *SNIP*

module load python
# **************************
# Here we have a new environment variable that is only set for array jobs - $SGE_TASK_ID
# This is the value of the task ID for each array job, so if we asked for an
# array job with 10 tasks, then SGE_TASK_ID will range from 1 to 10.
# We are using this here to compute the fibonacci sequence of length == the task ID
# **************************
./fibonacci.py $SGE_TASK_ID
echo "Finished job script"

This is otherwise just like the simple batch job example - the fibonacci.py script takes a single integer argument, the length of the sequence to print.
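For reference, a minimal sketch of what such a script might look like - the actual fibonacci.py shipped in job_templates may differ:

```python
#!/usr/bin/env python
# Illustrative sketch only - the real fibonacci.py in job_templates may differ.
import sys

def fib(n):
    """Return the first n numbers of the Fibonacci sequence."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

if __name__ == "__main__":
    # The job script passes $SGE_TASK_ID as the single argument.
    print(fib(int(sys.argv[1])))
```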

The key difference with an array job is the new environment variable highlighted, $SGE_TASK_ID. An array job gets a single job ID, and all the tasks get an additional task ID to differentiate them. $SGE_TASK_ID contains the value of this task ID - if you request 100 tasks, then the task IDs will range from 1 to 100.

Let’s see how this is used,

$ qsub -t 1-10 -q trial.q fibonacci_array.job
Your job-array 3985186.1-10:1 ("fibonacci_array.job") has been submitted

$ qstat -u $USER
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
   3985234 0.00000 fibonacci_ mb325        qw    10/21/2014 15:04:16                                                                   1 1-10:1

# Waiting a few moments for the jobs to start running

$ qstat -u $USER
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 1
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 2
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 3
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 4
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 5
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 6
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 7
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 8
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 9
   3985231 0.50000 fibonacci_ mb325        r     10/21/2014 15:04:04 trial.q@node011.cm.cluster                                        1 10

So you request an array job by passing the -t parameter to qsub and specifying the range of task IDs. You can also specify a step-size for this range of IDs, which works like so,

$ qsub -t 1-10:2 -q trial.q fibonacci_array.job
$ qstat -u mb325
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
   3985327 0.50000 fibonacci_ mb325        qw    10/21/2014 15:12:00                                                                   1 1-9:2

# and then...
job-ID     prior   name       user         state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
   3985324 0.50000 fibonacci_ mb325        r     10/21/2014 15:11:59 trial.q@node011.cm.cluster                                        1 1
   3985324 0.50000 fibonacci_ mb325        r     10/21/2014 15:11:59 trial.q@node011.cm.cluster                                        1 3
   3985324 0.50000 fibonacci_ mb325        r     10/21/2014 15:11:59 trial.q@node011.cm.cluster                                        1 5
   3985324 0.50000 fibonacci_ mb325        r     10/21/2014 15:11:59 trial.q@node011.cm.cluster                                        1 7
   3985324 0.50000 fibonacci_ mb325        r     10/21/2014 15:11:59 trial.q@node011.cm.cluster                                        1 9

So in this case the step-size was 2, the starting task ID was 1, and the maximum task ID was 10; therefore we only have 5 tasks, with IDs 1, 3, 5, 7 and 9. Note that the additional environment variables $SGE_TASK_FIRST, $SGE_TASK_LAST and $SGE_TASK_STEPSIZE are also available to use in your job scripts.
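One common use of a step-size is to have each task process a chunk of consecutive work items rather than a single one. A hedged sketch - the defaults below simulate the variables SGE would set, so the snippet runs outside the scheduler too:

```shell
# Sketch: each task handles $SGE_TASK_STEPSIZE consecutive work items.
# Outside the scheduler these variables are unset, so we provide
# illustrative defaults; under SGE they are set automatically.
SGE_TASK_ID=${SGE_TASK_ID:-1}
SGE_TASK_STEPSIZE=${SGE_TASK_STEPSIZE:-2}

FIRST=$SGE_TASK_ID
LAST=$((SGE_TASK_ID + SGE_TASK_STEPSIZE - 1))

for i in $(seq "$FIRST" "$LAST"); do
  echo "task $SGE_TASK_ID processes work item $i"
done
```

With `qsub -t 1-10:2` this gives tasks 1, 3, 5, 7 and 9, handling items 1-2, 3-4, 5-6, 7-8 and 9-10 respectively.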

Note

To get the complement of the above range - all the even numbers up to 10 - you would do:

qsub -t 2-10:2 -q trial.q fibonacci_array.job

Note also that in the qstat output all the tasks have the same job-ID on the left, but their task-IDs are visible on the right. You will also get separate stdout/stderr log files for each task, with the task-ID as a suffix.
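By default (i.e. without -o/-e options to qsub) the logs are named <jobname>.o<jobid>.<taskid> for stdout and <jobname>.e<jobid>.<taskid> for stderr. The sketch below just constructs the names you would expect for the job above; the job ID is illustrative:

```shell
# Construct the default SGE stdout log names for an array job.
# The job ID is illustrative; the naming changes if you pass -o/-e.
JOB_NAME=fibonacci_array.job
JOB_ID=3985186
for t in 1 2 3; do
  echo "${JOB_NAME}.o${JOB_ID}.${t}"
done
```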

Example: Array Task with varying Input Files

The above example is simplistic and contrived, as you will rarely pass $SGE_TASK_ID directly as a parameter. A more common scenario is that you have various input files to a program, and each task should open a different input file for processing and produce its own output file. This is probably still simplistic, but hopefully serves as a slightly more realistic example of how to use array tasks. The important point is that you can get creative here - anything is possible, and if you are unsure how to adapt your program to use job arrays, ask us!

The following example exists in the job_templates directory under batch_array.

$ cd $HOME/job_templates/batch_array
$ ls
input.1   input.2  input.4  input.6  input.8  reverse.job
input.10  input.3  input.5  input.7  input.9  reverse.py

$ cat reverse.job
######################################################################
# Options for the batch system

... *SNIP*

# Here we have a new environment variable that is only set for array jobs - $SGE_TASK_ID
# This is the value of the task ID for each array job, so if we asked for an
# array job with 10 tasks, then SGE_TASK_ID will range from 1 to 10
#
# We are using it here to select which input file each task processes
./reverse.py input.$SGE_TASK_ID
echo "Finished job script"

$ cat reverse.py
#!/usr/bin/python
# Open an input file, reverse the order of the lines and write to an output file
import sys
import os

try:
  taskid = os.environ['SGE_TASK_ID']
  input_filename = sys.argv[1]
  output_filename = "output.{0}".format(taskid)

  with open(output_filename,'w') as output_file:
    for line in reversed(open(input_filename).readlines()):
      output_file.write(line)

except KeyError:
  print("Error: could not read SGE_TASK_ID from environment")
  sys.exit(1)


$ qsub -t 1-10 -q trial.q reverse.job

This example has 10 input files with some contents, and a simple Python script which accepts a filename as an argument, reverses the order of the lines in the file, and writes the result to an output file. The above shows how the $SGE_TASK_ID variable is used in both the job script and the Python program to determine the input and output files.

Note

Another way this could be achieved is to list all the input files in another text file - called index.txt, say - in which each line contains an input filename, and the line number corresponds to a task. The job script then reads this index file, and each task uses the line whose number matches its $SGE_TASK_ID as its input file. That way you don't have to number your input files specially to match up with the task IDs.

Example: Array Task reading Inputs from a File

Another way of accomplishing the example above that is perhaps even more flexible and useful, is to specify the inputs for each array task through a special index file.

The example below shows how you can create a simple text file, here called index, that contains an input filename on each line. The array job script then reads a single line of the file for each array task, so task 1 takes its input from the 1st line of the file, task 2 reads the 2nd line, and so on...

This example script is in the job_templates directory, also under batch_array like the above example. The array job file is called reverse_from_index.job and a snippet of this is shown below,

# Obtain the job_templates folder as shown above
$ cd batch_array
$ cat index
input.8
input.4
special_input.6
input.9
input.5
other_input.9
other_input.1

# So index simply lists a number of input files, but obviously in this
# example we aren't limiting ourselves to special names for the input
# files, they can be anything. The ordering in the index is all that
# matters.

$ cat reverse_from_index.job
######################################################################
# Options for the batch system

... *SNIP* ...

######################################################################
# Job Script
# Here we are writing in bash (as we set bash as our shell above). In here you
# should set up the environment for your program, copy around any data that
# needs to be copied, and then execute the program
######################################################################
# Here we execute usual shell commands like any other shell script. The
# output here will be sent to the job's standard out
echo "Running job script"

# ****
# Here we have a new environment variable that is only set for array
# jobs - $SGE_TASK_ID. This is the value of the task ID for each array
# job, so if we asked for an array job with 10 tasks, then SGE_TASK_ID
# will range from 1 to 10

# We are using the SGE_TASK_ID variable to read a particular line from
# the index file

INPUT_FILE=$(sed "$SGE_TASK_ID"'q;d' index)
./reverse.py $INPUT_FILE
sleep 10
echo "Finished job script"
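The sed expression in the job script is a common one-liner for printing just line N of a file and stopping. This standalone sketch demonstrates it with a fake task ID and a small index file built from the entries shown above:

```shell
# Demonstrate the sed 'Nq;d' idiom: print line N of a file, then quit.
# SGE_TASK_ID is faked here; under the scheduler it is set per task.
SGE_TASK_ID=2
printf 'input.8\ninput.4\nspecial_input.6\n' > demo_index

INPUT_FILE=$(sed "$SGE_TASK_ID"'q;d' demo_index)
echo "$INPUT_FILE"   # prints input.4, the 2nd line of the index
rm -f demo_index
```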

Querying / Managing Array Jobs

You can use all the standard job control utilities (qdel, qstat, etc.) on array jobs - the advantage is that you can operate either on the whole array job or on particular tasks. So if you need to kill all 10,000 tasks of an array job, you can do so with a single qdel JOB_ID.

To kill specific array tasks, you can do:

# Kill array tasks 4 to 10
qdel JOB_ID -t 4-10