Requesting Resources

Multiple Threads

If you wish to run a multi-threaded job, i.e. your job needs to use more than one CPU core, you can request more slots like this:

qsub -pe openmp 4 myjob.sh

where 4 is the number of cores/slots you want for your job.
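
Inside the job script you can read the number of slots actually granted from the NSLOTS environment variable, which Grid Engine sets for parallel jobs; this is useful for telling an OpenMP program how many threads to start. A minimal sketch, assuming a hypothetical executable called ./my_program:

#!/bin/bash
# myjob.sh - run an OpenMP program on the slots granted by -pe openmp
# (./my_program is a placeholder for your own executable)
export OMP_NUM_THREADS=$NSLOTS
./my_program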

Increase Memory

Each job is allocated 4GB of RAM by default. If you know how much RAM your job needs, it’s a good idea to specify this on the qsub line:

qsub -l m_mem_free=8G myjob.sh

which claims 8GB of RAM for the job. This creates a hard limit to ensure that the job doesn’t use more than 8GB of RAM, so if you know that the maximum amount of memory your job needs is 8GB, allow a little extra headroom by requesting 8.5GB or 9GB.
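
For example, if you expect a peak usage of around 8GB, you might submit with a little headroom as suggested above:

# Request 9GB to leave headroom above an expected 8GB peak
qsub -l m_mem_free=9G myjob.sh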

Job Classes

Job classes are created by the cluster admins and contain preset default parameters for your job, exactly as you would pass them to qsub on the command line. For example, a job class might specify a default queue (-q serial.q), a runtime (-l h_rt=1:00:00), and so on.

When you submit your job with a job class specified, it inherits all of that job class’ default parameters. If you also specify any of the same parameters on the command line, they override the job class’ defaults.

# This job will use the mps.short job class
qsub -jc mps.short -q mps.q ...
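
The same options can also be embedded in the job script itself as #$ directives instead of being given on the command line. A minimal sketch, reusing the mps.short job class and mps.q queue from the example above (./my_program is a placeholder):

#!/bin/bash
# Submit with: qsub myjob.sh
#$ -jc mps.short
#$ -q mps.q
./my_program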

Job History

To determine how much memory or time you need to request, it can help to look at a past job you have submitted and see how many resources it used on the cluster. You can do this with the cluster’s accounting tool, qacct.

# Here the tool is called giving the job ID as an argument
qacct -j 6444841
==============================================================
qname        admin.q
hostname     node108.cm.cluster
group        root
owner        root
project      NONE
department   defaultdepartment
jobname      hepspec
jobnumber    6444841
taskid       undefined
account      sge
priority     0
cwd          /lustre/scratch/sysadmin/hepspec
submit_host  feynman.cm.cluster
submit_cmd   qsub -q admin.q@node108 hepspec.job
qsub_time    08/13/2015 13:20:56.645
start_time   08/13/2015 13:20:57.074
end_time     08/13/2015 19:17:04.598
granted_pe   NONE
slots        1
failed       0
deleted_by   NONE
exit_status  255
ru_wallclock 21367.524
ru_utime     1265488.522
ru_stime     4358.245
ru_maxrss    494720
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    543962721
ru_majflt    6458
ru_nswap     0
ru_inblock   38280
ru_oublock   122482440
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     5256562
ru_nivcsw    14159484
cpu          1269846.767
mem          153438.316
io           824.185
iow          0.000
maxvmem      42.640G
maxrss       24.257G
maxpss       23.980G
arid         undefined
jc_name      NONE

So we can see that the job ran for nearly 6 hours (21367.524 seconds, as measured by ru_wallclock) and used 24.257G of memory at its peak (maxrss).
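
If you only need these headline figures, you can filter the qacct output with standard tools; for example, a sketch using grep on the job above:

# Show just the wallclock time, peak memory and exit status
qacct -j 6444841 | grep -E '^(ru_wallclock|maxrss|maxvmem|exit_status)'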