Requesting Resources

Slots

Single core jobs

By default, if you do not request a particular number of slots for your job, you will be allocated just 1 slot and your job will be automatically bound to it:

[login node] $ qsub worker.sh 120 1
job 2615 ("worker.sh") has been submitted
[login node] $ ssh node052
[node052] $ top -u myuser
PID   USER       %CPU COMMAND
18768 myuser     100  work

Tip

Beware that even if your job is multi-threaded it will be restricted to just 1 slot. For example, if you attempt to run 2 threads but forgot to request 2 slots:

login node$ qsub worker.sh 120 2
job 2614 ("worker.sh") has been submitted

login node$ qstat -u myuser
job-ID name       user   state queue                    slots
2614   worker.sh  myuser r     my.q@node054.cm.cluster  1

execution_node$ top -u myuser
PID   USER       %CPU
18769 myuser     50.0
18770 myuser     50.0

Only 1 slot is allocated, so both threads are restricted to the same slot and each runs at roughly 50% CPU.

Array of single-threaded tasks

Analogously, if you submit an array job you will be allocated just 1 slot per task.

login node$ qsub -t 1-2 -l h=node052 worker.sh 120 1
job 2614 ("worker.sh") has been submitted

login_node $ qstat -u myuser
job-ID name     user   state queue                   slots ja-task-ID
2614 worker.sh  myuser r     my.q@node052.cm.cluster 1     1
2614 worker.sh  myuser r     my.q@node052.cm.cluster 1     2

login_node $ ssh node052
node052 $ top -u myuser
PID   USER       %CPU COMMAND
18767 myuser     100  work
18768 myuser     100  work

Each task will be bound to 1 slot. If a task tries to run 2 or more threads, all of that task's threads will be confined to its single slot.
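For instance, a sketch of what happens if each array task starts 2 threads without requesting extra slots (worker.sh and its runtime/threads arguments are the same example script used above):

# Each of the 2 tasks is granted a single slot; the 2 threads started by each
# task then share that slot, so top would show them at roughly 50% CPU each.
qsub -t 1-2 -l h=node052 worker.sh 120 2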

Tip

To find out which slots your tasks are bound to:

[login_node ]$ qstat -j 2614 | grep binding
binding:                    set linear:1
binding               1:    node052.cm.cluster=0,0
binding               2:    node052.cm.cluster=1,0

which reads: task 1 is bound to node052, socket 0, core 0; task 2 to node052, socket 1, core 0; and so on.
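You can also double-check the binding from the execution node itself, using the PIDs reported by top above (these are standard Linux tools, not part of the scheduler):

node052 $ taskset -cp 18767                       # print the CPU affinity list of the process
node052 $ grep Cpus_allowed_list /proc/18767/status   # the same information straight from the kernel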

Multi-threaded jobs

In order to actually be allocated multiple slots, you must request them through a parallel environment (-pe).

If your job is multi-threaded, or more generally a shared-memory parallel application, you can use the openmp parallel environment to request multiple slots:

qsub -pe openmp 4 myjob.sh

where 4 is the number of slots you need for your job.
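The same request can also be embedded in the submission script as #$ directives, so it does not have to be repeated on every command line. A minimal sketch (the script contents and program name are illustrative):

#!/bin/bash
#$ -pe openmp 4                   # request 4 slots in the openmp parallel environment
#$ -cwd                           # run from the directory the job was submitted from
export OMP_NUM_THREADS=$NSLOTS    # NSLOTS is set by the scheduler to the number of granted slots
./my_threaded_program             # hypothetical multi-threaded application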

For example, to request 2 slots for a multi-threaded job:

[login_node] $  qsub -pe openmp 2 -l h=node052 worker.sh 1200 2
[login_node] $ qstat -u myuser
job-ID  name       user    state queue                          slots
2618    worker.sh  myuser  r     parallel.q@node052.cm.cluster  2
[login_node] $ qstat -j 2618 | grep binding
binding:                    set linear:2
binding               1:    node052.cm.cluster=0,0:0,1

The job is now bound to 2 slots, as requested.

Tip

If the job tries to run 3 or more threads, all of them will be bound to the 2 slots already granted.
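As a sketch, using the same hypothetical worker.sh script (runtime followed by thread count) as in the previous examples:

# 2 slots granted, 3 threads started: the 3 threads share the 2 granted cores,
# so top would show each of them at roughly 66% CPU rather than 100%.
qsub -pe openmp 2 -l h=node052 worker.sh 1200 3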

Array of multi-threaded tasks

Sometimes you might want to submit an array of parallel tasks. In that case the requested slots are allocated on a per-task basis too.

Suppose you want to request 2 slots per task on a 2-task array job:

[login_node] $ qsub -t 1-2 -pe openmp 2 -l h=node052 worker.sh 1200 2
[login_node] $ qstat -u myuser
job-ID  name       user    state queue                         slots ja-task-ID
2619    worker.sh  myuser  r     parallel.q@node052.cm.cluster     2 1
2619    worker.sh  myuser  r     parallel.q@node052.cm.cluster     2 2
[login node] $ qstat -j 2619 | grep binding
binding:                    set linear:2
binding               1:    node052.cm.cluster=1,0:1,1
binding               2:    node052.cm.cluster=0,2:0,3

The first task is thus running on node052, socket 1, cores 0 and 1, whereas the second task is running on node052, socket 0, cores 2 and 3.

Memory Requests

Memory requests can be made at 2 levels (see also the note just after this list):

1. Main memory level: by using the m_mem_free resource request, which controls usage of the physical memory

2. Virtual memory level: by using the h_vmem resource request, which controls the total amount of memory usage (m_mem_free + swap space)
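Both levels are requested with the -l switch, like any other resource. If you are unsure which memory-related resources are defined on your cluster, one way to check is to list the scheduler's complex configuration with qconf (the exact resource names can differ between sites):

# List all configured complexes and keep the memory-related ones
qconf -sc | grep -i mem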

Default behaviour

By default, i.e. if not requested otherwise, each slot is allocated 2GB of main memory and 2.5GB of virtual memory. If the process running on 1 slot goes beyond 2GB it will be swapped out, and if it goes beyond 2.5GB it will simply be killed and the job or task will fail.

In other words, virtual memory, h_vmem, is always a hard limit: jobs will be killed when they try to use more than the amount granted.
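A quick way to see which memory limits were attached to a particular job is to look at its hard resource list with qstat (only limits that were explicitly requested, or injected as defaults by the site configuration, appear there; replace <job-ID> accordingly):

# Show the hard resource limits (h_vmem, m_mem_free, ...) granted to a job
qstat -j <job-ID> | grep "hard resource_list"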

Requesting more memory

You can request more memory by explicitly defining m_mem_free either within your submission script or on the submission line:

qsub -l m_mem_free=4G myjob.sh

In this case virtual memory h_vmem will be automatically limited to h_vmem=1.25*m_mem_free=5G. If the process exceeds 4G then it will be swapped out, whereas if the process exceeds 5G it will be killed.
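Equivalently, a sketch of making the same request from inside the submission script (script contents and program name are illustrative):

#!/bin/bash
#$ -l m_mem_free=4G    # 4GB of main memory per slot; h_vmem then follows as 1.25 * 4G = 5G
./my_program           # hypothetical application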

Tip

Always give units with your memory request: you can use [k,m,g] for kB, MB, GB (powers of 1000) or [K,M,G] for KiB, MiB, GiB (powers of 1024). If no units are given, the job will be rejected.

Preventing swapping

If you specify a particular amount of main memory m_mem_free as explained above, the system will always allocate 25% extra grace space in swap, regardless of the virtual memory requested.

Using swap has performance implications though: using the hard disk as RAM is so slow that the CPUs will spend most of their time waiting for I/O. If you want to avoid this behaviour you can request virtual memory exclusively:

qsub -l h_vmem=4G myjob.sh

That way, the system will automatically limit the main memory to m_mem_free=h_vmem. The job won't be swapped out; instead it will be killed as soon as its main memory goes beyond 4GB.

Array jobs

In general, memory requests are applied on a per-slot basis, both for multi-threaded jobs and for array jobs.

If you want to request 25G of main memory per task on a 3-task array job then you can do the following:

qsub -t 1-3 -l m_mem_free=25G  myjob.sh

All the memory-limit rules described above for single-threaded jobs also apply here, but on a per-task basis.

Multi-threaded jobs

Again, memory requests are taken on a per slot basis.

For example, say you want to submit a 2-slot parallel job and you know each thread will need at least 8GB of RAM, so you request -l m_mem_free=8G, but inadvertently each thread ends up using 9GB of RAM:

[login_node] qsub -l m_mem_free=8G -pe openmp 2 -l h=node052 mem_pe.sh 9g
[login_node] ssh node052
[node052] top -u myuser
PID   USER    VIRT    RES    %CPU %MEM COMMAND
16365 myuser  9445732 8.232g  2.7 13.1 memhog
16364 myuser  9445732 8.089g  2.3 12.9 memhog

This job will be bound to 2 slots, and each slot will have a physical memory limit of 8GB. The total memory limit, however, will be 10GB per slot (remember that h_vmem=1.25*m_mem_free by default). The job will then try to fill up 9GB of RAM per slot: 8GB will stay in main memory (see RES) and 1GB will be swapped out per thread.

Tip

RES in the top output stands for Resident Set Size and accounts for the main memory actually in use. VIRT, on the other hand, accounts for all the memory the process has reserved, including memory it has not touched yet (not to be confused with h_vmem, which is the total amount of memory granted by the scheduler).

The following job, however, would fail:

[login_node] qsub -l h_vmem=8G -pe openmp 2 -l h=node052 mem_pe.sh 9g
Your job 2627 ("mem_pe.sh") has been submitted

[login_node ] $ qacct -j 2627 | grep exit
exit_status  137

Here the hard memory limit per slot is m_mem_free=h_vmem=8G, and the job tries to fill up 9GB per thread. The scheduler therefore kills the job by sending signal 9 (SIGKILL), which is reported as exit status 137 (exit code = 128 + signal number).

Job Classes

Job classes are created by the cluster admins and contain preset default parameters for your job, exactly like the options you would pass to qsub on the command line, e.g. a job class might specify a default queue (-q serial.q), a runtime (-l h_rt=1:00:00), and so on.

By specifying a job class when you submit, your job inherits all of the job class' default parameters. If you specify any of the same parameters on the command line as well, they will override the job class' defaults.

# This job will use the mps.short job class, overriding its default queue with mps.q
qsub -jc mps.short -q mps.q ...
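Depending on the Grid Engine flavour and version installed on your cluster, you may also be able to list the job classes defined by the admins and inspect their presets with qconf; a sketch:

# List the names of all defined job classes
qconf -sjcl
# Show the presets of a particular job class, e.g. mps.short
qconf -sjc mps.short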

Job History

To determine how much memory or time you need to request, it can sometimes help to look at a past job you have submitted and see how many resources it actually used on the cluster. You can do this with the cluster's accounting tool, qacct.

# Here the tool is called giving the job ID as an argument
qacct -j 6444841
==============================================================
qname        admin.q
hostname     node108.cm.cluster
group        root
owner        root
project      NONE
department   defaultdepartment
jobname      hepspec
jobnumber    6444841
taskid       undefined
account      sge
priority     0
cwd          /lustre/scratch/sysadmin/hepspec
submit_host  feynman.cm.cluster
submit_cmd   qsub -q admin.q@node108 hepspec.job
qsub_time    08/13/2015 13:20:56.645
start_time   08/13/2015 13:20:57.074
end_time     08/13/2015 19:17:04.598
granted_pe   NONE
slots        1
failed       0
deleted_by   NONE
exit_status  255
ru_wallclock 21367.524
ru_utime     1265488.522
ru_stime     4358.245
ru_maxrss    494720
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    543962721
ru_majflt    6458
ru_nswap     0
ru_inblock   38280
ru_oublock   122482440
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     5256562
ru_nivcsw    14159484
cpu          1269846.767
mem          153438.316
io           824.185
iow          0.000
maxvmem      42.640G
maxrss       24.257G
maxpss       23.980G
arid         undefined
jc_name      NONE

So we can see that the job ran for nearly 6 hours (21367.524 seconds, as measured by ru_wallclock) and used 24.257G of memory at its peak (maxrss).
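Those figures can then be fed back into the next submission. For instance, for this job, which peaked at about 24GB of resident memory and ran for about 6 hours, a resubmission might request a little headroom on both (the exact values here are illustrative):

# ~25% headroom over the observed peak RSS, plus an 8-hour runtime limit
qsub -l m_mem_free=30G -l h_rt=08:00:00 hepspec.job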