Running your HPC Job

We use the PBS Pro (PBS Professional) batch queueing system, driven by the qsub command. This is a standard across major clusters in the HPC community. The NCI National Facility uses a customized variant called ANU PBS, which is based on the open source implementation.

The most important thing to remember is not to run large computations on the login node.

The login node is there so you can log in, edit your code, compile it, and perhaps run tests using a small test data set. Your real computational work needs to be run under a PBS submission script so that it can be distributed to one of the dedicated compute nodes.
This page explains how you can do this.

Summary of Running a Job

  • Determine the resources required for your job.
  • Create a job script. This wraps your job in a shell script that tells PBS your requirements.
  • Submit the job using qsub.
  • Monitor the job using qstat.
  • Delete the job, if required, using qdel.
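
Putting these steps together, a typical session looks roughly like the sketch below (the script name and job ID are the ones used in the examples later on this page):

$ qsub run_job.sh          # submit the job script to PBS
11153.hpcnode1
$ qstat 11153              # check on the job while it is queued or running
$ qdel 11153               # only if you need to cancel it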

Never run your programs on the cluster's head node directly; use PBS to schedule your job into the queue. This ensures efficient allocation of resources for everyone. If you need to test a script, run a smaller set of test data via PBS instead.

Determine your Resource Requirements

To make effective use of the PBS queueing system, you will need to know what resources your job will use. When your job starts, PBS will make sure that appropriate resources are available for it to run, up to the maximums you have specified.

The resources can be specified by:

  • CPU cores - If your application is multi-threaded and can make use of more than one core, you will need to specify the number of cores your job will use. The maximums currently available are 48 cores on the AMD cluster and 16 cores on the Intel/GPGPU cluster.

  • Memory - This is how much memory your application will use. For a new piece of software or a new dataset, you might not know how much will be consumed. In that case, start with a generous number and tune downwards. The more accurate you are, the more likely your job is to be scheduled during busy periods when only small amounts of memory are available.

  • Walltime - This is how much time your job takes to run. You will need to estimate the job completion time; again, for new and unknown jobs, perform test runs, estimate a generous walltime based on the numbers, and tune downwards. The smaller the walltime, the more likely the job is to be scheduled during busy periods. (A way to measure this on a test run is sketched below.)
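
One approach is to run a small test case through PBS with the program wrapped in GNU time, which reports elapsed time and peak memory. This is only a sketch; my_program and the path are the placeholders from the example job script below:

# Inside a test job script: /usr/bin/time -v prints
# "Elapsed (wall clock) time" and "Maximum resident set size" (peak memory in kB)
# to the job's stderr file when the program finishes.
/usr/bin/time -v /shared/homes/999777/my_program

You can then compare these figures with the resources_used values that qstat -f reports for the job (see later on this page) and trim your requests accordingly.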

Create a Job Script

Your job script sets up the HPC resources you want PBS to reserve for your job. It should contain the following:

  • Your resource requirements for PBS to schedule your job - this needs to be at the top of your script for PBS to read it, before the first executable line in the script.
  • Any copying of data, setup of working directories and other pre-job administration that needs to take place
  • The job itself
  • Cleaning up temporary data, copying data to a longer term directory and other post-job administration

An example of a shell script called run_job.sh wrapping a task can be found below. Here the user’s UTS staff number is 999777 and their home directory is /shared/homes/999777/.
This example job requires 4 cores and up to 30 minutes to complete, so we have specified a walltime of 40 minutes to ensure it will finish within the walltime.

#!/bin/bash

#PBS -l ncpus=4
#PBS -l mem=20gb
#PBS -l walltime=00:40:00
# Set email address -- UTS email addresses only
#PBS -M 999777@uts.edu.au
# Send an email when job begins (b), gets aborted (a) and ends (e)
#PBS -m abe

# Create a working directory for input and output under /scratch/work/
# Copy your input data to there. 
mkdir /scratch/work/999777_$$
cp input.dat /scratch/work/999777_$$

# Change directory to the scratch directory and run your program.
# my_program reads input.dat and creates an output file called "output.dat"
cd /scratch/work/999777_$$
/shared/homes/999777/my_program

# Move the results back to the home directory
mv /scratch/work/999777_$$/output.dat /shared/homes/999777/

# Clean up 
rm /scratch/work/999777_$$/input.dat
rmdir /scratch/work/999777_$$
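
Inside a running job, PBS also sets environment variables such as PBS_JOBID (the assigned job ID) and PBS_O_WORKDIR (the directory you ran qsub from), which can be used in your script. As a minimal sketch, the scratch directory above could be named after the job ID instead of the shell's $$:

# Use the numeric part of the job ID (e.g. 11153 from 11153.hpcnode1)
# to make the scratch directory name unique to this job.
WORKDIR=/scratch/work/999777_${PBS_JOBID%%.*}
mkdir $WORKDIR
cp $PBS_O_WORKDIR/input.dat $WORKDIR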

There are also example scripts in /shared/eresearch/

Submit your Job

Here we submit our job to the queue. Type man qsub for the online manual pages.

$ qsub run_job.sh
11153.hpcnode1

Qsub will return the assigned job ID. This is typically a number, followed by the name of the server you submitted the job from. You can simply refer to the number in place of the full job ID.
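
The same -l resource options can also be given on the qsub command line, where they override the directives in the script, and the returned job ID can be captured in a shell variable if you want to script follow-up commands. A small sketch (the walltime value is only an example):

$ qsub -l walltime=02:00:00 run_job.sh    # override the script's walltime
$ JOBID=$(qsub run_job.sh)                # capture the job ID for later use
$ qstat $JOBID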

Monitor your Job Status and List Jobs

Below is an example of the output you will see. Type man qstat for the online manual pages.

$ qstat
Job id            Name             User            Time Use S Queue
----------------  ---------------- --------------  -------- - -----
211.hpcnode1      scaffold.build.  110234          570:36:5 R workq
235.hpcnode1      Histomonas.map.  100123                 0 Q workq
236.hpcnode1      run_job.sh       999777                 0 Q workq

Name is the name of your submitted script. User is your UTS staff user number. Time is the CPU time used. The S column indicates the job’s state as in the table below:

Q : Job is queued.
R : Job is running.
E : Job is exiting after having run.
F : Job is finished.
H : Job is held.
S : Job is suspended.

The Queue will be workq unless you have specified another queue to use in your job submission script.

More information can be listed by using command line options to qstat, such as -n1, which shows the node that the program is executing on.

$ qstat -n1 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
69580.hpcnode1  111111   workq    bigprimes   22234   1   8    5gb 120:0 R 23:47 hpcnode6/2*8
69581.hpcnode1  111111   workq    bigprimes   22698   1   8    5gb 120:0 R 23:47 hpcnode6/3*8
$ 
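
On a busy cluster it is often easier to list only your own jobs; qstat accepts a -u option for this. For example, using the staff number from the job script above:

$ qstat -u 999777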

To list your finished jobs use -x (for expired). For instance:

$ qstat -x 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1152.hpcnode1   999777     workq  bigprimes   56678   1   1  250gb   --  F 00:09

To Delete or Cancel your Job

To delete your job from the queue, use the qdel command:

$ qdel job_id

e.g. "qdel 1152.hpcnode1"
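
If you need to remove several jobs at once, the qselect command can list your job IDs so they can be piped to qdel. A sketch, using the same staff number as above:

$ qselect -u 999777 | xargs qdel    # delete all of your queued and running jobs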

To Get Detailed Information on your Job

To show details for a specific job use qstat -f job_id. For instance, for job 1152.hpcnode1 use:

$ qstat -f 1152.hpcnode1
Job Id: 1152.hpcnode1
    Job_Name = bigprimes
    Job_Owner = 999777@hpcnode1
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 3220kb
    resources_used.ncpus = 1
    resources_used.vmem = 315200kb
    resources_used.walltime = 00:02:59
    job_state = R
    queue = workq
    Error_Path = hpcnode1:/shared/homes/999777/jobs/primes/bigprimes.e1152
    exec_host = hpc2/0
    Mail_Points = abe
    Mail_Users = 999777@uts.edu.au
    Output_Path = hpcnode1:/shared/homes/999777/jobs/primes/bigprimes.o1152
    Rerunable = True
    Resource_List.mem = 250gb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.select = 1:mem=250gb:ncpus=1:vmem=250gb
    Resource_List.vmem = 250gb
    stime = Wed Apr 10 15:25:50 2013
    jobdir = /shared/homes/999777
    Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
    PBS_O_HOME=/shared/homes/999777,PBS_O_LOGNAME=999777,
    PBS_O_WORKDIR=/shared/homes/999777/jobs/primes/bigprimes,
    PBS_O_LANG=en_US.UTF-8,
    PBS_O_PATH=/usr/local/bin:/bin:/usr/bin:/bin,
    PBS_O_MAIL=/var/spool/mail/999777,PBS_O_QUEUE=workq,PBS_O_HOST=hpcnode1
    comment = Job run at Wed Apr 10 at 15:25 on (hpc2:mem=262144000kb:ncpus=1)
    etime = Wed Apr 10 15:25:50 2013
    Submit_arguments = bigprimes
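
The full listing is long, so it can be convenient to pick out just the fields you are interested in, for example the resources the job has actually consumed:

$ qstat -f 1152.hpcnode1 | grep resources_used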

Finishing up

A copy of your PBS job's stdout and stderr streams is created in the directory you submitted the job from, as *.o and *.e named files with the job_id appended.

An example of what the program bigprimes and job number 1152 would produce is:

bigprimes.e1152 - this contains any errors your program may have produced, so it should normally be zero sized, i.e. empty.

bigprimes.o1152 - this will contain any screen output that your program would have produced.
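
If you would rather have a single log file, PBS can merge the stderr stream into the stdout file with a -j directive in your job script; a one-line sketch:

# Join stderr to stdout, so only bigprimes.o1152 would be produced
#PBS -j oe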

Obtaining Information on the Nodes Available

Use the pbsnodes -a command to query the status of the nodes, showing how much memory and how many CPUs each one has.

pbsnodes -a
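
The full listing is verbose, so it is sometimes handy to filter it; for example, the sketch below counts the nodes currently reported as free (the exact attribute text may vary with the PBS version installed):

$ pbsnodes -a | grep -c "state = free"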

Obtaining Information on the Queues Available

You can get an up-to-date list of queues by visiting the HPC Status page, or, while logged in, you can get a more detailed list with:

$ qstat -Q
$ qstat -Qf

The default queue is "workq". There is a smaller queue "smallq" and a few others.
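
To direct a job to a particular queue, add a -q directive to your job script, or pass it on the qsub command line. For example, to use smallq, in your job script:

#PBS -q smallq

or on the command line:

$ qsub -q smallq run_job.sh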

There are a few different job queues on the HPC; smallq and workq are two examples, and they have different resource limitations. To obtain a list of all the queues, run the command below. In this example you can see there are 28 jobs running in the smallq queue, 5 jobs running in workq and 3 jobs queued in workq.

$ qstat -Q 

Queue        Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
smallq         0    28 yes yes     0    28     0     0     0     0 Exec
expressq       0     0 yes yes     0     0     0     0     0     0 Exec
workq          0     8 yes yes     3     5     0     0     0     0 Exec
$ 

To obtain full information on all the queues including their maximum cpus, memory and wall times run the command below. This is the best way to obtain up-to-date information on the queues available as we may modify queue maximum limits to manage the resources.

$ qstat -Qf 

Queue: smallq
queue_type = Execution
total_jobs = 28
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:28 Exiting:0 Begun:0 
resources_max.mem = 32gb ⇐ The most memory you can request 
resources_max.ncpus = 2 ⇐ The most CPUs you can request 
resources_max.walltime = 200:00:00
resources_default.walltime = 12:00:00
resources_assigned.mem = 101711872kb
resources_assigned.ncpus = 56
resources_assigned.nodect = 28