Parallel Processing
This tutorial describes how to set up a slurm script to simultaneously process multiple VLA images. This tutorial processes two images taken of the wide angle tail 3C 465. One image was taken in A configuration and the other was taken in B configuration. Both images were taken in the L-band. The data used in this tutorial can be downloaded from the VLA archive by entering the project code 12A-195 and choosing the B-configuration data taken in the 28th May 2012 and the A-configuration data taken on the 31st October 2012.
This script makes use of the parallel module to manage resources and enable parallel processing. The parallel module utilises all available processors, so that several processes can be run in parallel with each other. If there are more processes called than there are available processors, the parallel module will wait for a process to complete before starting the next process. For example, if there are three tasks (A, B and C) and only two processors parallel will initially execute tasks A and B. It will then only start running task C once either task A or B has completed.
Getting Started
After downloading the data untar it using the command
(host) $ tar -xvf 12A-195.sb12378614.eb13812032.56231.2797265625.ms.tar (host) $ tar -xvf 12A-195.sb7351094.eb10491731.56075.655723206015.ms.tar
This will unpack the two measurement sets:
12A-195.sb12378614.eb13812032.56231.2797265625.ms
12A-195.sb7351094.eb10491731.56075.655723206015.ms
Both observations contain the following fields:
Field
Field Name
Comments
0
3C48
Setup scan
1
3C48
Primary Calibrator
2
J2340+2641
Phase Calibrator
3
3C 465
Target
4
J0313+4120
Phase Calibrator (unused in this tutorial)
5
3C83.1B
Target (unused in this tutorial)
The A configuration observation was taken on the 31st October 2012 between 06:45 and 09:15. The B configuration image was taken on the 28th May 2012 between 15:44 and 18:13. The operators log for both observations can be downloaded from here.
Load the singularity module by entering:
(host) $ module load singularity
Download the singularity image created in Use of Singularity. The following command downloads the most up to date image which is 1.3 GB and so this may take some time!
(host) $ singularity pull library://mhardcastle/default/casa
Create the slurm script
The slurm script used in this tutorial is called
VLA_Parallel_Processing.slurmand contains the following lines of code:#!/bin/bash #SBATCH -J VLA-Parallel-Processing #SBATCH -A DIRAC-TP001-CPU #SBATCH -p icelake #SBATCH --nodes=1 #SBATCH --ntasks=2 #SBATCH --time=03:00:00 #SBATCH --mail-type=ALL #SBATCH --no-requeue #! Enter the script to run here . /etc/profile.d/modules.sh module load rhe18/default-icl module load singularity module load parallel # Describe the measurement sets to be processed INPUTS=("12A-195.sb12378614.eb13812032.56231.2797265625.ms A L" "12A-195.sb7351094.eb10491731.56075.655723206015.ms B L") # Set up the srun command # The -N1 -n1 options allocate a single core to each task srun="srun --exclusive -N1 -n1" # Set up the parallel command # The delay of 0.2 prevents overloading the controlling node # -j is the number of tasks to run simultaneously # --joblog and --resume combine to create a task log that can be used to monitor progress parallel="parallel --delay 0.2 -j $SLURM_NTASKS --joblog runtask.log --resume" # Run the command $parallel "$srun singularity exec casa_latest.sif casa -c VLA_Process_3C465_Images.py {1} ::: "${INPUTS[@]}"
Note the following points about the slurm script:
The command
#SBATCH -J VLA-Parallel-Processingnames the job VLA-Processing-Multiple-ImagesThe command
#SBATCH -A DIRAC-TP001-CPUis the name of the project under which time has been allocatedThe command
#SBATCH -p icelakeensures we are using the icelake clusterBy default slurm allocates one cpu per task and so the commands
#SBATCH --nodes=1and#SBATCH --ntasks=2combine to ask for two CPUs on a single node. On icelake each node has 76 CPUs with all CPUs on the same node sharing memory resources. Changing the nodes variable to 2 would have the effect of asking for two CPUs on different machines. Since the CPUs would no longer be sharing memory each task will run slightly quicker however the job is likely to take longer to schedule and is an inefficient use of resources.The command
#SBATCH --time=03:00:00is requesting 3 (wall-clock) hours of processing time.The command
#SBATCH --mail-type=ALLmeans email messages will be sent at the start and end of the job or (if applicable) when an error occurs. To disable this set the option toNONE.The command
#SBATCH --no-requeuemeans that if this job is interrupted by a node failure/system downtime it will not be automatically rescheduled.The command
. /etc/profile.d/modules.shenables the module commandThe command
module load rhe18/default-iclloads the basic environment needed by icelakeThe two module load commands load the singularity and parallel modules
The INPUTS command defines a list of parameters that will be passed to the casa script. In this example the list is typed directly into the script but this could be altered to read the parameters from a file.
The
srun --exclusive -n1 -N1allocates exclusive use of a single core to each taskThe
parallel --delay 0.2 -f $SLURM_NTASKStells the parallel process that we are runningntasksparallel processes. In this case ntasks=2, so we are running two parallel processes.The
$parallel...command iterates through theINPUTSlist calling thesruncommand for each element in the list. For each call ofsrun, parallel replaces the placeholder {1} with the list element. The commandsrunuses the casa_latest.sif singularity to call the VLA_Process_3C465_Images.py script within casa, sending it the parameters within the {1} placeholder.
Create the CASA script
The casa script used in this tutorial is called
VLA_Process_3C465_Images.pyand is based on the code used in the Getting Started tutorial. The download file contains the full script with a summarised version given below:from sys import argv params = argv[1].split() vis = params[0] config = params[1] band = params[2] smoothed_vis = vis[:-3]+'-smoothed.ms' primary_calibrator = '1' phase_calibrator = '2' target_field='3' refant = 'ea21' caltable_antpos = smoothed_vis[:-3]+".antpos" listobs(vis=vis, verbose=True, listfile=vis[:-3]+'.listobs') # Standard casa data flagging and calibration commands go here # Set up the variables used in imaging. The values depend upon the configuration if config=='A': cell=['0.25arcsec','0.25arcsec'] imsize=[11250,11250] scales=[0,10,26] elif config=='B': cell=['1arcsec','1arcsec'] imsize=[3072,3072] scales=[0,9,22] elif config=='C': cell=['3arcsec','3arcsec'] imsize=[1024,1024] scales=[0,9,23] elif config=='D': cell=['10arcsec','10arcsec'] imsize=[320,320] scales=[0,9,23] # Extract data used for imaging from the measurement set rms = stats['rms'][0] tclean(vis=smoothed_vis, field=target_field, imagename=smoothed_vis[:-3]+'-Clean', cell=cell, imsize=imsize, niter=20000, threshold=str(rms*5)+'Jy', stokes='I', deconvolver='multiscale', scales=scales, smallscalebias=0.9, weighting='briggs', robust=0.5, pbcor=True)
Note the following points about the casa script:
The
params = argv[1].split()command imports the parameter string that was supplied by the call to parallel in the slurm script and splits it into its components. The next few lines populate the variables used throughout the script. In this example the name of the measurement set as well as the VLA configuration and band of the measurement set are all supplied. This could be expanded to include any additional information desired.The
listobsandtcleancommands give a simple example of how the variables can be used within the scriptThe nested
ifblock is an example of how to use the data to set up the variables used during imaging. This script only uses theconfigvariable but this could easily be expanded to include additional variables such asband.
Running the scripts
Log on to the Cambridge CSD3 system as described in Login.
If necessary download the casa singularity as described in Getting Started.
Run the slurm script by entering
(host) $ sbatch VLA_Parallel_Processing.slurm
Check the casa .log and runtask.log files for any errors. An exit value of 1 in the runtask.log file indicates a terminal error occurred and the process was terminated prematurely.