GPU Computing

Note

I’m not an expert in CUDA programming, so if any users of the cluster would like to contribute more detail to the section below, or write up any documentation that they think would help other users, please send us an email and take a look at Contributing.

The cluster has a number of Nvidia GPUs attached to particular nodes. These can be used for parallel computing jobs that utilise a GPGPU framework such as Nvidia’s CUDA. (See hardware-gpu-nodes for the hardware details of which GPUs we have.)

The GPUs are integrated into the cluster’s batch system as a resource that can be requested. The batch system manages them so that a GPU in use by one job cannot be assigned to another job, ensuring we don’t overcommit resources.

To request a GPU you need to use an additional parameter when submitting a job to the cluster:

-l gpu=1

When you request a GPU, the batch system will look for an unused GPU on the cluster and assign it to your job.

This selection of GPU is then communicated to your program through an environment variable:

$SGE_HGR_gpu

This variable will be an integer that represents the GPU that your job should use.

Your job needs to bind itself to this GPU by way of the particular GPGPU SDK you are using.

Sample CUDA Program to Bind a Specific GPU

For those familiar with CUDA, the relevant API call is:

// gpu_id here is an integer that must be obtained from $SGE_HGR_gpu
cudaSetDevice(gpu_id);
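
As an illustration, a minimal sketch of reading the assigned GPU ID from the environment and binding to it might look like the following. This is not the cluster’s sample code; the error handling and messages are just placeholders:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main()
{
    // Read the GPU ID assigned to this job by the batch system.
    const char *hgr = getenv("SGE_HGR_gpu");
    if (hgr == NULL) {
        fprintf(stderr, "SGE_HGR_gpu not set - did you request a GPU with -l gpu=1?\n");
        return 1;
    }

    int gpu_id = atoi(hgr);

    // Bind this process to the assigned GPU.
    cudaError_t err = cudaSetDevice(gpu_id);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", gpu_id, cudaGetErrorString(err));
        return 1;
    }

    printf("Bound to GPU %d\n", gpu_id);
    // ... launch kernels as normal ...
    return 0;
}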

To see this in action there is a sample CUDA program we have in the job templates repository:

$ cp -r /cm/shared/examples/job_templates $HOME/
$ cd job_templates/cuda_selectgpu/
$ make
$ qsub -t 1-4 -q gpu.q -l gpu=1 cuda_selectgpu.job 100

The above will submit a job array, containing 4 tasks, to the cluster’s gpu queue: gpu.q. Each task requests 1 GPU, and will run the cuda_selectgpu program.

You can view the source of the program in cuda_selectgpu.cpp – it doesn’t do anything interesting, just shows how to bind your CUDA program to a specific GPU.

Requesting Multiple GPUs

Continuing with the example above, if you request more than one GPU:

$ qsub -q gpu.q -l gpu=2 cuda_selectgpu.job 100

Then the $SGE_HGR_gpu variable will contain a space-separated list of GPU IDs that your program can then bind to. In the above example, the cuda_selectgpu job script will output:

Running on node152
0 1
Binding to GPU: 0 1
Sleeping for 1000...
./cuda_selectgpu Starting...
Attempt to select GPU 0...
Detected 2 CUDA Capable device(s)

Selecting Device 0: "Tesla K20m"
  Device PCI Bus ID / PCI location ID:           5 / 0
  (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
  GPU Clock rate:                                706 MHz (0.71 GHz)

Sleeping for 1 seconds...

The test program isn’t written to handle multiple GPUs, but you can see what the output looks like and adapt your own program to handle this if you need to.
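
As a rough sketch (again, not part of the sample program), parsing a space-separated list of GPU IDs from $SGE_HGR_gpu might look like the following. How you then distribute work across the devices is up to your application; a common pattern is one host thread or process per device:

#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <vector>
#include <cuda_runtime.h>

int main()
{
    const char *hgr = getenv("SGE_HGR_gpu");
    if (hgr == NULL) {
        fprintf(stderr, "SGE_HGR_gpu not set\n");
        return 1;
    }

    // $SGE_HGR_gpu holds one or more space-separated GPU IDs, e.g. "0 1".
    std::vector<int> gpu_ids;
    std::istringstream iss(hgr);
    int id;
    while (iss >> id)
        gpu_ids.push_back(id);

    // Example: query each assigned device in turn.
    for (size_t i = 0; i < gpu_ids.size(); ++i) {
        cudaSetDevice(gpu_ids[i]);
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, gpu_ids[i]);
        printf("Using GPU %d: %s\n", gpu_ids[i], prop.name);
    }
    return 0;
}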

Requesting Specific GPU Hardware

We have different models of GPU on the cluster, which can be seen here: Hardware. To request a specific type of GPU, you can use the following hostgroups:

$ qconf -shgrpl
...
@gpu
@gpu_k20m
@gpu_k40m
...

In the following way:

$ qsub -q gpu.q@@gpu_k40m -l gpu=1 ...

This will only run on the Nvidia K40m GPU nodes (these have considerably more GPU memory, for instance, which may suit certain types of application).

Note that certain nodes may have Access Control added to them if they are reserved for a particular department’s exclusive usage. Please see the queues page for details, or contact support with any questions about this.

Requesting a Whole GPU Node

If you aren’t sure whether your code will correctly bind to a specific device when there are several present on a machine, then you should request a whole GPU node instead, so that you don’t interfere with another user’s GPU. Do this in a similar way to the above:

$ qsub -q gpu.q@@gpu_k20m -l gpu=2 ...

Here we request the nodes with the Nvidia K20m cards, as these nodes only have 2 GPUs each, and we request both of them. This way you will have exclusive use of the node.