Queues and Policies
Basic Structure of Queues
Queues are where your jobs run. A queue usually addresses some set of the available hardware in the cluster, and also defines policies on what kinds of jobs can run in it. Policies might include:
- How long your job can run for
- What users can run in the queue
- How much memory or slots your jobs can request
- Whether you can run interactive or batch jobs in the queue
- Whether the queue supports parallel jobs, and which parallel environments it allows
1) To list the available queues, use the command qconf -sql; to get information about a particular queue, e.g. queue.q, use qconf -sq queue.q.
2) Things to look for in the queue configuration are fields like qtype, pe_list and user_lists, which you can inspect with commands like:
qconf -sq queue.q | grep qtype
The result may contain the words BATCH and/or INTERACTIVE, which mean that the queue can be used for batch jobs (submitted with qsub) as well as interactive sessions (started with qlogin).
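The same pattern works for the other fields; a short sketch, assuming a queue named queue.q exists on your cluster (substitute a name reported by qconf -sql):

```shell
qconf -sql                           # list all queues
qconf -sq queue.q | grep qtype       # batch and/or interactive?
qconf -sq queue.q | grep pe_list     # which parallel environments are allowed
qconf -sq queue.q | grep user_lists  # which users/access lists may use it
```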
3) Another important piece of information is the set of hosts on which the queue runs its jobs. The field to look for is hostlist; you will get answers like:
hostlist @HP-16c-64G-HighBW @HP-16c-64G-3rdBW \
A @ in front of a name indicates a host group, i.e. a collection of compute hosts. To get the list of compute nodes associated with a host group, use qconf -shgrp @hostgroup.
You may then get output like:
group_name @HP-16c-64G-HighBW
hostlist node001.cm.cluster node002.cm.cluster node003.cm.cluster \
node004.cm.cluster node008.cm.cluster node009.cm.cluster \
node010.cm.cluster node011.cm.cluster node012.cm.cluster \
node013.cm.cluster node014.cm.cluster node015.cm.cluster \
node016.cm.cluster node017.cm.cluster node018.cm.cluster \
node019.cm.cluster node020.cm.cluster node021.cm.cluster \
node022.cm.cluster node023.cm.cluster node024.cm.cluster \
node025.cm.cluster node026.cm.cluster node027.cm.cluster \
node028.cm.cluster node029.cm.cluster node030.cm.cluster \
node031.cm.cluster node032.cm.cluster node033.cm.cluster \
node034.cm.cluster node035.cm.cluster node036.cm.cluster \
node037.cm.cluster node038.cm.cluster node039.cm.cluster \
node040.cm.cluster node042.cm.cluster node043.cm.cluster \
node044.cm.cluster node045.cm.cluster node046.cm.cluster \
The core, open-access queues are described below. There are also a number of private queues that address specific hardware subsets owned by certain departments or researchers.
- max length of jobs: 48 Hrs
- max number of jobs per user: 4
- total available slots: 240
- openmp available for multithreaded jobs
- no MPI
- max length of jobs: 10 minutes
- max number of jobs per user: 12
- total available slots: 1228
- parallel (openmp and MPI) available
- max length of jobs: 7 Days
- total available slots: 128
- max number of jobs per user: 48
- no parallel environments available
- max length of jobs: No Limit
- total available slots: 252
- max number of jobs per user: 72
- all parallel environments available
Addresses only nodes that have GPUs in them (see hardware-gpu-nodes).
- max length of jobs: 2 days
- total available slots: 80
- max number of slots per user: 80
- parallel environments available
The below description outlines the new policy for mps.q that is to be implemented in September 2015.
All users of mps.q need to prepare themselves for these changes by doing three things:
- Review your job’s memory usage, particularly if you think you may need more than 2GiB per slot, which will be the default allocation
- Review your job’s running time and determine which job class (see below) you fall into
- If you have a job that runs longer than 5 days, look into how you can checkpoint and restart your job.
mps.q is only available to users in MPS. If you are in this school but unable to submit to the queue please email email@example.com.
- max length of jobs: 5 days
- total available slots: 1344
- parallel environments available
mps.q is the first queue to move to mandatory resource requests.
This means that if you do not specify the resources you need (see: Requesting Resource), you will be given a default allocation. Jobs that exceed this allocation will be killed.
Default Resource Allocation
- CPU cores equal to number of slots requested
- 2GiB of RAM per slot requested
You also have to specify a maximum wallclock run time for your job. If you do not specify one on submission you will get an error message and your job will not be queued.
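A sketch of what a submission under this policy might look like. The walltime resource h_rt is standard Grid Engine, but the memory resource name and the parallel environment name below are assumptions; check the Requesting Resource documentation for the exact names on your cluster:

```shell
# 4 slots in the "openmp" parallel environment (PE name is an assumption),
# 4 hours of walltime, 4 GiB of RAM per slot (memory resource name is an
# assumption -- sites variously use m_mem_free, h_vmem, or similar):
qsub -pe openmp 4 -l h_rt=04:00:00 -l m_mem_free=4G myjob.sh
```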
To make this easier, mps.q has a number of job classes available that put your job into a time category: mps.short, mps.medium, mps.long. Currently the maximum run-time that these classes correspond to is listed below.
Each job class also has a different number of slots that can run simultaneously, with short jobs having the most and long jobs having the least.
mps.short
- max walltime: 2 hours
- max simultaneously running slots per user: approx. 50% of available slots (~600 slots)
mps.medium
- max walltime: 8 hours
- max simultaneously running slots per user: approx. 20% of available slots (~256 slots)
mps.long
- max walltime: 5 days
- max simultaneously running slots per user: approx. 5% of available slots (~64 slots)
This is just the maximum that you can have running at any one time, but you can of course queue more jobs than this.
You specify a job class either in your job script or on the command line with the -jc parameter, as shown in Job Classes.
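For example, to submit into the medium job class (myjob.sh is a hypothetical job script name):

```shell
# On the command line:
qsub -jc mps.medium myjob.sh

# Or equivalently as a directive inside the job script itself:
#$ -jc mps.medium
```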
If you have a large job that needs more slots than your job class allows here (such as a long running, large MPI job) then please contact firstname.lastname@example.org explaining what you want to do and we can add you to a special job class that can allow this.
If your job takes longer than the maximum length of time allowed by the queue, then you need to look at checkpointing your job. Checkpointing means writing the state of your computation out to disk, either at regular points throughout your job's execution, or when your job receives a signal from the batch system saying that it is about to be killed.
Very long jobs were tolerated in the past because we had spare capacity, but this was not good practice, either for the rest of the cluster or for the people who got into the habit of running jobs lasting multiple weeks or even months. The machines in the cluster need regular maintenance, and sometimes crash, losing all of your work if you are not checkpointing regularly. We have therefore set a maximum run-time to encourage everyone with long jobs to begin checkpointing.
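The mechanics are easiest to see in a minimal, self-contained sketch that is not specific to any batch system: the loop below periodically saves its progress to a file, resumes from that file on restart, and also saves on receipt of a warning signal. Which signal the batch system actually sends before a kill is site-configurable, so treat the USR1/USR2 names and the checkpoint file name as assumptions:

```shell
#!/bin/bash
# Minimal checkpoint/restart sketch. The checkpoint file name and the
# warning signals (USR1/USR2) are assumptions; consult your batch system's
# documentation for what it actually sends before killing a job.
CKPT=checkpoint.dat

save_and_exit() {
    echo "$i" > "$CKPT"          # persist the loop counter and stop
    exit 0
}
trap save_and_exit USR1 USR2

# Resume from the last checkpoint if one exists, otherwise start fresh.
start=0
[ -f "$CKPT" ] && start=$(cat "$CKPT")

for (( i=start; i<100000; i++ )); do
    :                            # real computation goes here
    if (( i % 10000 == 0 )); then
        echo "$i" > "$CKPT"      # periodic checkpoint
    fi
done

rm -f "$CKPT"                    # finished cleanly; no restart needed
echo "done"
```

On a restart the script picks up from the last saved counter instead of recomputing from zero, which is what lets a long computation be split across several runs that each fit within the queue's maximum run-time.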