We use the qsub batch queueing system PBS Pro (PBS Professional), a standard across major clusters in the HPC community. The NCI National Facility uses a customized variant called ANU PBS, which is based on the open source implementation.
The most important thing to remember is not to run large computations on the login node.
The login node is there so you can log in, edit your code, compile it, and perhaps run tests using a small test data set. Your real computational work needs to be run under a PBS submission script so that it can be distributed to one of the dedicated compute nodes.
This page explains how you can do this.
Summary of Running a Job
- Determine the resources required for your job.
- Create a Job Script: this wraps your job in a shell script that tells PBS your requirements.
- Submit the job using qsub.
- Monitor the job using qstat.
- Delete the job, if required, using qdel.
Never run your programs directly on the cluster's login node; use PBS to schedule your job into the queue. This ensures efficient allocation of resources for everyone. If you need to test a script, run a smaller set of test data via PBS instead.

Determine your resource requirements
To make effective use of the PBS queueing system, you will need to know what resources your job will use. When your job starts, PBS will make sure that the resources you requested are available for your job, up to the maximum you have specified.
The resources can be specified by:
CPU cores - If your application is multi-threaded and can make use of more than one core, you will need to specify the number of cores your job will use. The current maximums are 48 cores on the AMD cluster and 16 on the Intel/GPGPU cluster.
Memory - This is how much memory your application will use. For new software or a new dataset you might not know how much will be consumed; in that case, start with a generous number and tune downwards. The more accurate your estimate, the more likely your job is to be scheduled during busy periods when only small amounts of memory are available.
Walltime - This is how much time your job takes to run. You will need to estimate the job completion time; again, for new or unknown jobs, perform test runs, set a generous walltime based on the numbers, and tune downwards. The smaller the walltime, the more likely the job is to be scheduled during busy periods.
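One hedged way to estimate walltime is to time a small test run on the login node before submitting. This sketch uses `date` to measure elapsed seconds; `sleep 2` stands in for your program running on a reduced data set:

```shell
# Time a small test run to estimate walltime.
# "sleep 2" is a stand-in for your program on a small test data set.
start=$(date +%s)
sleep 2
end=$(date +%s)
echo "elapsed seconds: $((end - start))"
```

Scale the measured time up for the full data set, add a safety margin, and use that as your walltime request.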
Create a Job Script
Your job script describes the HPC resources you want PBS to reserve for your job and then runs the job. It should contain the following:
- Your resource requirements for PBS to schedule your job - this needs to be at the top of your script for PBS to read it, before the first executable line in the script.
- Any copying of data, setup of working directories and other pre-job administration that needs to take place
- The job itself
- Cleaning up temporary data, copying data to a longer term directory and other post-job administration
An example of a shell script called run_job.sh wrapping a task can be found below. Here the user's UTS staff number is 999777 and their home directory is /shared/homes/999777. This example job requires 4 cores and up to 30 minutes to complete, so we have specified a walltime of 40 minutes to ensure it will finish within the walltime.
#!/bin/bash
#PBS -l ncpus=4
#PBS -l mem=20gb
#PBS -l walltime=00:40:00

# Set email address -- UTS email addresses only
#PBS -M email@example.com

# Send an email when the job begins (b), gets aborted (a) and ends (e)
#PBS -m abe

# Create a working directory for input and output under /scratch/work/
# and copy your input data there.
mkdir /scratch/work/999777_$$
cp input.dat /scratch/work/999777_$$

# Change to the scratch directory and run your program.
# my_program reads input.dat and creates an output file called "output.dat"
cd /scratch/work/999777_$$
/shared/homes/999777/my_program

# Copy results back to your home directory
mv /scratch/work/999777_$$/output.dat /shared/homes/999777/

# Clean up
rm /scratch/work/999777_$$/input.dat
rmdir /scratch/work/999777_$$
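A common variation (a sketch, not a requirement) is to key the scratch directory on $PBS_JOBID, which PBS sets inside every job, so reruns of the same script never collide. The root directory and staff number below are illustrative; /tmp is used here so the sketch runs anywhere, where on the cluster the root would be /scratch/work:

```shell
#!/bin/bash
# Sketch: unique scratch directory keyed on the PBS job ID.
# $PBS_JOBID is set by PBS inside a job; outside PBS we fall back to
# the shell PID ($$). /tmp stands in for /scratch/work here so the
# sketch runs anywhere; 999777 is an illustrative staff number.
SCRATCH_ROOT="${TMPDIR:-/tmp}/work"
SCRATCH="$SCRATCH_ROOT/999777_${PBS_JOBID:-$$}"
mkdir -p "$SCRATCH"
echo "working in $SCRATCH"
# ... copy input, run your program, copy output back, as above ...
rm -rf "$SCRATCH"   # clean up afterwards
```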
There are also example scripts in
Submit your Job
Here we submit our job to the queue. Type man qsub for the online manual pages.
$ qsub run_job.sh
11153.hpcnode1
qsub will return the assigned job ID. This is typically a number followed by the name of the server you submitted the job from. You can refer to just the number in place of the full job ID.
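Since the number alone is accepted, you can strip it from qsub's output for use in later commands. In this sketch the job ID string is hard-coded in place of a real qsub call:

```shell
# Extract the numeric part of a PBS job ID.
# In practice you would use: jobid=$(qsub run_job.sh | cut -d. -f1)
jobid=$(echo "11153.hpcnode1" | cut -d. -f1)
echo "$jobid"
```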
Monitor your Job Status and List Jobs
Below is an example of the output you will see. Type man qstat for the online manual pages.
$ qstat
Job id            Name             User           Time Use S Queue
----------------  ---------------- -------------- -------- - -----
211.hpcnode1      scaffold.build.  110234         570:36:5 R workq
235.hpcnode1      Histomonas.map.  100123         0        Q workq
236.hpcnode1      run_job.sh       999777         0        Q workq
Name is the name of your submitted script. User is your UTS staff number. Time is the CPU time used. The S column indicates the job's state, as shown below:
Q : Job is queued.
R : Job is running.
E : Job is exiting after having run.
F : Job is finished.
H : Job is held.
S : Job is suspended.
The Queue will be workq unless you have specified another queue to use in your job submission script.
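As a sketch, you can tally your jobs by the state column with awk. The sample text below stands in for real qstat output; in practice you would pipe qstat itself:

```shell
# Tally jobs by state (the 5th field of qstat's job lines).
# The sample lines stand in for real `qstat` output here.
sample='211.hpcnode1 scaffold.build. 110234 570:36:5 R workq
235.hpcnode1 Histomonas.map. 100123 0 Q workq
236.hpcnode1 run_job.sh 999777 0 Q workq'
echo "$sample" | awk '{count[$5]++} END {for (s in count) print s, count[s]}' | sort
```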
More information can be listed by using command-line options to qstat, such as -n1, which shows the node that the program is executing on.
$ qstat -n1
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
69580.hpcnode1  111111   workq    bigprimes  22234   1   8    5gb  120:0 R 23:47 hpcnode6/2*8
69581.hpcnode1  111111   workq    bigprimes  22698   1   8    5gb  120:0 R 23:47 hpcnode6/3*8
To list your finished jobs use -x (for expired). For instance:
$ qstat -x
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1152.hpcnode1   999777   workq    bigprimes  56678   1   1  250gb    --  F 00:09
To Delete or Cancel your Job
To delete your job from the queue, use the qdel command:
$ qdel job_id

For example: qdel 1152.hpcnode1
To Get Detailed Information on your Job
To show details for a specific job use qstat -f job_id. For instance, for the job "1152.hpcnode1" use:
$ qstat -f 1152.hpcnode1
Job Id: 1152.hpcnode1
    Job_Name = bigprimes
    Job_Owner = 999777@hpcnode1
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 3220kb
    resources_used.ncpus = 1
    resources_used.vmem = 315200kb
    resources_used.walltime = 00:02:59
    job_state = R
    queue = workq
    Error_Path = hpcnode1:/shared/homes/999777/jobs/primes/bigprimes.e1152
    exec_host = hpc2/0
    Mail_Points = abe
    Mail_Users = firstname.lastname@example.org
    Output_Path = hpcnode1:/shared/homes/999777/jobs/primes/bigprimes.o1152
    Rerunable = True
    Resource_List.mem = 250gb
    Resource_List.ncpus = 1
    Resource_List.nodect = 1
    Resource_List.place = pack
    Resource_List.select = 1:mem=250gb:ncpus=1:vmem=250gb
    Resource_List.vmem = 250gb
    stime = Wed Apr 10 15:25:50 2013
    jobdir = /shared/homes/999777
    Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
        PBS_O_HOME=/shared/homes/999777,PBS_O_LOGNAME=999777,
        PBS_O_WORKDIR=/shared/homes/999777/jobs/primes/bigprimes,
        PBS_O_LANG=en_US.UTF-8,
        PBS_O_PATH=/usr/local/bin:/bin:/usr/bin:/bin,
        PBS_O_MAIL=/var/spool/mail/999777,PBS_O_QUEUE=workq,PBS_O_HOST=hpcnode1
    comment = Job run at Wed Apr 10 at 15:25 on (hpc2:mem=262144000kb:ncpus=1)
    etime = Wed Apr 10 15:25:50 2013
    Submit_arguments = bigprimes
A copy of your PBS job's stdout and stderr streams is written to the directory you called qsub from, as files named with the job name plus .o and .e suffixes followed by the job number.
An example of what the program bigprimes running as job number 1152 would produce is:
bigprimes.e1152 - this contains any errors your program produced; ideally it should be zero sized, i.e. empty.
bigprimes.o1152 - this will contain any screen output that your program would have produced.
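A quick way to check whether a finished job wrote anything to stderr is to test whether its .e file is non-empty. This sketch creates an empty stand-in file so it runs anywhere; the file name is illustrative:

```shell
# Check whether a job's stderr file is empty (no errors).
# An empty stand-in file is created here; use your real .e file in practice.
touch bigprimes.e1152
if [ -s bigprimes.e1152 ]; then
    status="errors found - inspect bigprimes.e1152"
else
    status="no errors"
fi
echo "$status"
rm bigprimes.e1152
```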
Obtaining Information on the Nodes Available
Use the pbsnodes -a command to query the status of the nodes, showing how much memory and how many CPUs each has.
Obtaining Information on the Queues Available
You can get an up-to-date list of queues by visiting the HPC Status page, or, while logged in, you can get a more detailed list with:
$ qstat -Q
$ qstat -Qf
The default queue is "workq". There is a smaller queue "smallq" and a few others.
There are a few different job queues on the HPC, such as smallq and workq, and they have different resource limitations. To obtain a list of all the queues, run the command below. In this example you can see there are 28 jobs running in the smallq queue, 5 jobs running in workq and 3 jobs queued in workq.
$ qstat -Q
Queue      Max   Tot   Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
smallq         0    28 yes yes     0    28     0     0     0     0 Exec
expressq       0     0 yes yes     0     0     0     0     0     0 Exec
workq          0     8 yes yes     3     5     0     0     0     0 Exec
To obtain full information on all the queues including their maximum cpus, memory and wall times run the command below. This is the best way to obtain up-to-date information on the queues available as we may modify queue maximum limits to manage the resources.
$ qstat -Qf
Queue: smallq
    queue_type = Execution
    total_jobs = 28
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:28 Exiting:0 Begun:0
    resources_max.mem = 32gb            ⇐ The most memory you can request
    resources_max.ncpus = 2             ⇐ The most CPUs you can request
    resources_max.walltime = 200:00:00
    resources_default.walltime = 12:00:00
    resources_assigned.mem = 101711872kb
    resources_assigned.ncpus = 56
    resources_assigned.nodect = 28
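To pull out just the per-queue maximums from this verbose listing, you can filter it with awk. The sample text below stands in for real qstat -Qf output:

```shell
# Print only the resources_max limits for each queue.
# In practice: qstat -Qf | awk '/^Queue:/ {q=$2} /resources_max/ {print q, $1, $3}'
sample='Queue: smallq
    resources_max.mem = 32gb
    resources_max.ncpus = 2
    resources_max.walltime = 200:00:00'
echo "$sample" | awk '/^Queue:/ {q=$2} /resources_max/ {print q, $1, $3}'
```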