Batch Jobs

Batch System

As of Sept 21st 2014, we are running Univa Grid Engine 8.2.0

Batch jobs are the opposite of interactive jobs on the cluster. Essentially, you provide a script containing all the necessary environment setup and instructions to run your job, and the cluster batch system then finds free compute resources to run it on in the background. Job output gets written to log files that you can inspect at any time to see how your job is progressing. This way jobs can run for hours, days or even weeks, allowing you to log off and do other things while you wait for the results.

There are two types of batch jobs,

  • Serial : These use only one core for the process.
  • Parallel : These can be of various types, but can utilise multiple cores on a single machine, or even multiple cores on multiple machines.

The management of the jobs is controlled by Univa Grid Engine.

Basics of Batch submission

The core command you use to submit a job to the batch system is qsub. If you don’t have this command in your $PATH then you need to run module load sge to get it – and you should add that to your ~/.bashrc so you have it on every login.
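
For example, a quick way to add it (assuming the line is not already in your ~/.bashrc) is,

# Make the sge module load automatically on every login
$ echo "module load sge" >> ~/.bashrc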

qsub expects as its primary argument a job script, which is a special shell script that can contain both arguments to qsub, and also all the necessary steps to set up the environment for your job and execute it.

To try out submitting a job, make a copy of the cluster’s job templates folder into your home directory and enter the batch_serial directory,

$ cp -r /cm/shared/examples/job_templates $HOME/
$ cd $HOME/job_templates/batch_serial
$ ls -1
fibonacci_array.job
fibonacci.job
fibonacci.py

This folder contains 3 files: a Python script, fibonacci.py, that accepts a single argument giving the number of digits of the Fibonacci sequence to generate; a job script for this script called fibonacci.job; and a third job script, fibonacci_array.job, which will be discussed in Array Jobs.

The contents of fibonacci.job are,

######################################################################
# Options for the batch system
# These options are not executed by the script, but are instead read by the
# batch system before submitting the job. Each option is preceded by '#$' to
# signify that it is for grid engine.
#
# All of these options are the same as flags you can pass to qsub on the
# command line and can be **overridden** on the command line. See man qsub for
# all the details
######################################################################
# -- The shell used to interpret this script
#$ -S /bin/bash
# -- Execute this job from the current working directory.
#$ -cwd
# -- Job output to stderr will be merged into standard out. Remove this line if
# -- you want to have separate stderr and stdout log files
#$ -j y
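# -- Write the job's log files into the output/ directory. Note this directory
# -- must already exist before submitting, otherwise the job will sit in the
# -- Eqw error state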
#$ -o output/
# -- Send email when the job exits, is aborted or suspended
# #$ -m eas
# #$ -M YOUR_USERNAME@sussex.ac.uk

######################################################################
# Job Script
# Here we are writing in bash (as we set bash as our shell above). In here you
# should set up the environment for your program, copy around any data that
# needs to be copied, and then execute the program
######################################################################
# We can pass arguments to this script like normal. Here we read in $1 if it is
# available, otherwise, set the default value, ARG_DIGITS=10
ARG_DIGITS="${1:-10}"
# Here we execute usual shell commands like any other shell script. The
# output here will be sent to the job's standard out
echo "Running job script"
# We can set up an environment variable that can be seen by the script
export MY_ENV_VAR="This variable is exported into the process's environment"
# We need to ensure we set up the complete environment needed by our job,
# in this case, just loading the python module
module load python
# Finally we run our executable. Here we are passing the command line argument
# above to the script
#/cm/shared/examples/job_templates/batch_serial/fibonacci.py $ARG_DIGITS
./fibonacci.py $ARG_DIGITS
echo "Finished job script"

The first section of the file, Options for the batch system, contains a few lines similar to: #$ -S /bin/bash. This is a command to the batch system: the special prefix #$ tells UGE to interpret the rest of the line as an option. All the standard options available to qsub can be used here; the above contains some of the most basic ones.
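
As the script comments note, any of these options can be overridden on the command line, and any extra arguments given after the script name are passed to the job script as $1, $2, and so on. For example (the job name fib500 and the value 500 are purely illustrative),

# Override the job name and pass 500 as the first argument ($1) to the script
$ qsub -N fib500 fibonacci.job 500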

Note

If you want your jobs to send you email when they finish, or start, or are aborted, then uncomment the following lines in the above example script:

# -- Send email when the job exits, is aborted or suspended
#$ -m eas
#$ -M YOUR_USERNAME@sussex.ac.uk

And change YOUR_USERNAME to your ITS username. The line #$ -m eas specifies when to send email. e means on job exit, s means on suspension, a means when aborted.
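
The same options can also be given directly on the qsub command line instead of editing the job script, for example,

# Request email on exit, abort and suspension without editing the script
$ qsub -m eas -M YOUR_USERNAME@sussex.ac.uk fibonacci.job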

Take a look in man qsub for more options that can be specified or ask the support team if you want to know how to do something.

To submit this script simply do the following,

$ qsub fibonacci.job
Your job 3978336 ("fibonacci.job") has been submitted

After submitting the job you will get output like that shown above, which contains your job ID; in this case it is 3978336.
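
qsub also has a -terse option which prints only the numeric job ID – handy if you want to capture it in a shell script (the jobid variable below is just illustrative),

# Capture only the job ID for later use with qstat, qdel, etc.
$ jobid=$(qsub -terse fibonacci.job)
$ echo "Submitted job $jobid"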

At this stage your job has joined the batch system queue – all jobs wait in the queue until the batch system scheduler has found free resources on the cluster that match what they are requesting.

Important

One of the most important options to qsub is -q queue1.q,queue2.q,..., which specifies which batch system queues you want your job to run in. As shown above, you can specify multiple queues (comma-separated) for your job, and the scheduler will consider all of them when trying to place your job on the cluster. Particularly when the cluster is busy, it makes sense to maximise your chance of running by specifying all the queues you could happily run in. Be careful, however, if your job requires specific environments or hardware, as the queues address different sub-sets of hardware. See queues for the details.
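
For example, to let the job run in whichever of two queues has free resources (the queue names here are just ones that appear elsewhere on this page; substitute the queues appropriate for your job),

# Allow the scheduler to place the job in either serial.q or mps.q
$ qsub -q serial.q,mps.q fibonacci.job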

You can see the state of the queue by running another command, qstat,

$ qstat
job-ID   prior   name       user           state submit/start at     queue                          jclass                         slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
 3812057 0.50224 BB_Na      user343        r     09/22/2014 12:38:59 parallel.q@node017.cm.cluster                                     4
 3812058 0.50013 QLOGIN     user149        r     09/22/2014 12:39:25 mps.q@node120.cm.cluster                                          1
 3812059 0.50357 BB_Na      user343        r     09/22/2014 12:40:10 parallel.q@node001.cm.cluster                                     6
 3812062 0.50013 QLOGIN     user789        r     09/22/2014 12:43:39 mps.q@node201.cm.cluster                                          1
...
 3960079 0.50023 job_777003 user429        qw    10/20/2014 11:37:10                                                                   1
 3959845 0.50023 job_777000 user429        qw    10/20/2014 11:28:28                                                                   1
 3959846 0.50023 job_777001 user429        qw    10/20/2014 11:28:28                                                                   1
 3959967 0.50019 job_777004 user429        qw    10/20/2014 11:32:46                                                                   1
 3959851 0.50005 job_777006 user429        qw    10/20/2014 11:28:28                                                                   1
 3886523 0.50002 odin.sh    user714        qw    10/13/2014 11:32:26                                                                   1 2-20:1

Running qstat like this shows all running and queued jobs together, so the output can be long; the listing above has been truncated in the middle.

If you just want to see the queued jobs, or just the running jobs, run,

# All pending jobs
$ qstat -s p
# All running jobs
$ qstat -s r

To break down the above output a bit by column:

  • job-ID: job ID
  • prior : job priority. Queued jobs are ordered by priority, and the scheduler attempts to run higher-priority jobs first.
  • name : job name. This defaults to the name of your job script but can be specified to qsub with the option: -N your_custom_name
  • user : user name
  • state : the state that the job is in. See Viewing your Job’s Evolution for details.
  • submit/start at : the submission or start time and date of the job
  • queue : only for running jobs, the queue name and machine name where the job is running.
  • jclass : what job class the job uses
  • slots : how many slots the job requested/uses
  • ja-task-ID : the job array task IDs. See Array Jobs for more details on this.

If you just want to see your own jobs, run,

# Just your jobs
$ qstat -u $USER

See man qstat for more details.
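
These filters can be combined, for example to list only your pending jobs,

# Just your jobs that are still waiting in the queue
$ qstat -u $USER -s p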

Tip

Sometimes you want to watch what happens as your jobs progress quickly through the various states (queued -> running -> completed). You can achieve this by doing:

watch -d "qstat -u $USER"

This gives a rolling update of the status every 2 seconds; press CTRL-C to exit.

Viewing your Job’s Evolution

There are a number of states that a job goes through to completion (or not). The rough lifecycle is the following:

queued/waiting (qw) ---> transferring (t) ---> running (r) ---> complete
                    |                     |
                    |                     |
                    ---> error (Eqw)      ---> suspended (s)

The codes in brackets show how the states appear in qstat output.

When the job is submitted it will be in the queued/waiting state. If there is an error when the batch system tries to start the job, then the job will be in the Eqw state. To figure out what went wrong, take a look at the qstat output for the job,

$ qstat -j 3960079
==============================================================
job_number:                 3960079
jclass:                     NONE
exec_file:                  job_scripts/3960079
submission_time:            10/20/2014 11:37:10.138
owner:                      user429
...
error reason    1:          10/20/2014 11:37:10 [3799:24236]: can't stat() "/mnt/lustre/admin/output.log" as stdout_path: Permission denied KRB5CCNAME=none uid=3799 gid=3799 3799 55760 60166 60176 60317 60497 60526 161009 1048576
scheduling info:            (Collecting of scheduler job information is turned off)

The error reason field shows what the error was, which you can either interpret yourself or provide to the support team.
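
For a long qstat -j listing you can pull out just that field, for example (the job ID is simply the one from the listing above),

# Show only the error reason for a stuck job
$ qstat -j 3960079 | grep "error reason"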

The transferring state covers the brief period while the job script, and any data that needs staging on the compute nodes, is copied over. The running state is self-explanatory.

When a job is suspended then the job is sent the SIGSTOP signal and is suspended on the node where it is executing. It will stay that way until it is either resumed, or it is killed. This can happen for a number of reasons. An administrator may suspend a job if it is causing problems on a node for other jobs. Or a job may be suspended if it is running in a subordinate queue, and other higher-priority work needs to run in that queue.

Jobs that have completed, or have been aborted, do not appear in qstat. To see the final details of the job, including running/cpu time and other resources used, you can use the qacct tool to see the job's accounting logs,

$ qacct -j 3978336
==============================================================
qname        serial.q
hostname     node052.cm.cluster
group        mb325_g
owner        mb325
project      NONE
department   defaultdepartment
jobname      fibonacci.job
jobnumber    3978336
taskid       undefined
account      sge
priority     0
cwd          /home/m/mb/mb325/projects/hpc/job_templates/batch_serial
submit_host  feynman.cm.cluster
submit_cmd   qsub fibonacci.job
qsub_time    10/21/2014 11:38:31.275
start_time   10/21/2014 11:38:31.562
end_time     10/21/2014 11:38:31.687
granted_pe   NONE
slots        1
failed       0
deleted_by   NONE
exit_status  0
ru_wallclock 0.125
ru_utime     0.037
ru_stime     0.037
ru_maxrss    5336
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    9080
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   192
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     207
ru_nivcsw    14
cpu          0.074
mem          0.000
io           0.000
iow          0.000
maxvmem      0.000
maxrss       0.000
maxpss       0.000
arid         undefined
jc_name      NONE

Note the ru_wallclock here reports the wall time (in seconds) your job ran for – useful if you want to have an idea of how long your job is taking. There are also stats here for memory, cpu and other resources used, which can be really handy for profiling what your job needs.
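
If you only care about a few of these fields, you can filter the qacct output, for example,

# Pull out the wall time and peak memory figures for a finished job
$ qacct -j 3978336 | grep -E 'ru_wallclock|maxvmem|maxrss'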

Killing your Job

You can kill your job at any point in its lifecycle with the command,

# To kill a single job
qdel job-ID

# To kill all of your jobs
qdel -u $USER

You can of course use this with other utilities to build a more selective command,

# Use qstat and other utilities to select out your jobs in the 'Eqw' state and kill them.
qstat -u $USER | grep 'Eqw' | awk '{print $1}' | xargs -I {} qdel {}

Sometimes when you try to kill your job it doesn't disappear from qstat and instead remains in a state such as dr. This is usually because the sge_execd daemon on the compute node where your job is running is not responding (maybe the node died). You can force the deletion by doing:

qdel -f job-ID

and that should clear it.

Resubmitting your Job

Sometimes instead of killing your job, you just want to restart it. Perhaps your job failed to start because the job output directory didn't exist, leaving you with lots of jobs in the Eqw state, and you have now fixed the problem. You can use the qresub tool to do this.

Modifying the example given above,

# Use qstat and other utilities to select out your jobs in the 'Eqw' state and resubmit them.
qstat -u $USER | grep 'Eqw' | awk '{print $1}' | xargs -I {} qresub {}

This will put the jobs back into the cluster queue. The resubmitted jobs get new, unique job IDs rather than keeping the old ones, so you will likely also want to qdel the old jobs afterwards.
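
A sketch of doing both steps in one pass (resubmit each Eqw job, then delete the original) might look like the following; adapt the selection to your own needs,

# Resubmit every one of your jobs stuck in 'Eqw', then remove the original copy
for id in $(qstat -u $USER | grep 'Eqw' | awk '{print $1}'); do
    qresub $id
    qdel $id
done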

Modifying your Job

The utility qalter allows you to modify batch system options supplied to your job. Usually you would want to do this while the job is in the queued/waiting state, but a select few options can also work on running jobs too. It is best to read the man page, or ask the support team if you want to use this.

For example, you can change the queue your job was submitted to like so,

# Change job's queue to the serial.q
qalter -q serial.q job-ID
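
Other batch system options can be changed in the same way while the job is still queued, for example (the directory name new_output/ is purely illustrative),

# Redirect the job's output log to a different directory while it is still queued
qalter -o new_output/ job-ID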