Cambridge CSD3 cheat sheet
The Cambridge CSD3 system is documented extensively on its own web pages. These are internal notes for quick start for our project.
Login
Log in with your username to the CPU system (DIRAC username):
(host) $ ssh <username>@login-cpu.hpc.cam.ac.uk
If you have set up an authorized key, you may use this to bypass the password prompt:
(host) $ ssh -i ~/.ssh/dirac_rsa <username>@login.hpc.cam.ac.uk
Work area
The project shared disk space is /rds/project/rds-bRdYdViqoGA/.
Make yourself a working directory there.
Jobs
The system uses Slurm (though for those familiar with Torque, qsub
and qstat commands work more or less as you would expect on the
login nodes).
There are three main types of compute nodes on the cluster, Skylake,
Cascade Lake and Ice Lake (from earlier to later generation). Skylake
nodes are 32-core nodes, Cascade Lake 56 core, and Ice Lake 76 core
nodes. All are Intel-based. These show up as partitions (like Torque
queues) on the cluster, i.e. you are explicitly selecting the type of
machine you will get when you submit to the cluster using the
partitions skylake, cclake or icelake. There are also
himem (high-memory) version of each partition.
To get a single core for testing purposes on one of these machines do e.g.
(CSD3) $ sintr -p icelake
This drops you in a screen session on a compute node. You can
specify wall time, numbers of cores etc on the command line.
To run a batch job do e.g.
(CSD3) $ sbatch example_slurm_script
You can find template job scripts in
/usr/local/Cluster-Docs/SLURM. A useful option in batch mode is
#SBATCH --mail-type=ALL which ensures that you will be e-mailed on
start, end and error. Job output appears in the directory where you
ran the sbatch command.
To view the queue in native Slurm mode use the squeue command.
Note that if (and only if) you have a Slurm job running on a compute
node you are allowed to ssh into it to check on the job, run
htop etc.
Modules
Modules work as on the Hertfordshire system, e.g. module load
singularity gets you access to the singularity command. There are
many more modules available though.