UA Research Computing

HPC Examples

Array Jobs and Parallel Tasks

About

Finding an effective way to submit large numbers of jobs can be frustrating. The example scripts below demonstrate different techniques for running jobs as arrays or in parallel.

Each script is designed to print commands and information specific to that job for better visualization/demonstration purposes.

HPC users should be able to run each script with minor modifications (e.g., adding their PI’s group name) to experiment.

Array Jobs

⚠️ Be careful using custom output filenames (e.g., -o myjob.out) in array jobs. If the specified filename doesn’t include a unique descriptor (e.g., the array ID), every subjob will write to and overwrite the same file, meaning you’ll only capture the output from a single subjob. See our documentation on filename patterns for help.
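As a rough illustration of a pattern that keeps subjob output separate, the directives below use Slurm's filename replacement symbols `%x` (job name), `%A` (array job ID), and `%a` (array index). The job name, array range, and time limit are placeholders chosen for this sketch, not values from the example scripts.

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
#SBATCH --array=1-3
# One output file per subjob (myjob-<arrayjobid>_1.out, myjob-<arrayjobid>_2.out, ...)
# instead of a single shared myjob.out.
#SBATCH --output=%x-%A_%a.out

# Outside Slurm these variables are unset, so fall back to demo values.
subjob_label="${SLURM_ARRAY_JOB_ID:-demo}_${SLURM_ARRAY_TASK_ID:-0}"
echo "Output for subjob ${subjob_label} goes to its own file"
```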

An array job is an easy way to submit multiple jobs using the same SLURM script with a single sbatch command. When an array job is submitted, it creates multiple subjobs, each using the same SLURM directives and the same commands as written in the main script.

Basic Array Job

If you have never used array jobs before, start here. Submit multiple jobs using the same script with only one sbatch command.
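A minimal sketch of what such a script might look like is shown below. It is not the example script itself: the job name, array range, and `YOUR_GROUP` account are placeholders, and your site may require additional directives such as a partition.

```shell
#!/bin/bash
#SBATCH --job-name=basic_array
# Placeholder: replace YOUR_GROUP with your PI's group name.
#SBATCH --account=YOUR_GROUP
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
# Create five subjobs with array indices 1 through 5.
#SBATCH --array=1-5

# Slurm sets SLURM_ARRAY_TASK_ID in each subjob; default to 1 so the
# script can also be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Use the index to pick the input for this subjob (1.txt, 2.txt, ...).
input_file="${task_id}.txt"
echo "Subjob ${task_id} would process ${input_file}"
```

Submitting this once with `sbatch` would queue all five subjobs, each printing a different input filename.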

Array Jobs With Text Filenames

Run multiple jobs where each opens a different file, but the naming scheme doesn’t lend itself to simple array indices as used in Basic Array Job (e.g., 1.txt, 2.txt,…).
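One common way to handle this, sketched here under assumptions, is to list the filenames in a text file and let each subjob read the line matching its array index. The helper `nth_line` and the filenames are hypothetical; the list is written to a temporary file only so the sketch is self-contained.

```shell
#!/bin/bash
#SBATCH --job-name=array_filenames
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
# One subjob per line in the list.
#SBATCH --array=1-3

# Print line N of a file: $1 = list file, $2 = line number.
nth_line() {
    sed -n "${2}p" "$1"
}

# Demo list in a temporary file; a real job would keep a fixed list
# next to the script instead of regenerating it.
list_file="$(mktemp)"
printf '%s\n' "liver_sample.txt" "control_A.txt" "patient-07.txt" > "$list_file"

# Default to 1 so the script can also be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
input_file="$(nth_line "$list_file" "$task_id")"
echo "Subjob ${task_id} would process ${input_file}"

rm -f "$list_file"
```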

Sample Array With Input Parameters

Run multiple jobs where each uses a distinct combination of input parameters. Very similar to reading text filenames above.
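A possible sketch of this idea: store one parameter combination per line and let each subjob split its own line into variables. The function `load_params`, the parameter names `rate` and `seed`, and the values are all invented for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=array_params
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
# One subjob per parameter combination.
#SBATCH --array=1-3

# Read line N ($2) of a whitespace-separated file ($1) and store its
# first two fields in the variables rate and seed.
load_params() {
    line="$(sed -n "${2}p" "$1")"
    # Intentional word splitting: field 1 becomes $1, field 2 becomes $2.
    set -- $line
    rate="$1"
    seed="$2"
}

# Demo parameter file; a real job would keep this next to the script.
params_file="$(mktemp)"
printf '%s\n' "0.1 42" "0.5 7" "0.9 13" > "$params_file"

# Default to 1 so the script can also be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
load_params "$params_file" "$task_id"
echo "Subjob ${task_id} would run with rate=${rate} seed=${seed}"

rm -f "$params_file"
```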

Software-Specific Array Examples

Array Job in R

Run multiple simulations using the same R script and the same batch submission script while saving the output to unique filenames. Redirects to R examples.


Parallel Jobs

Basic GNU Parallel Job

Run multiple tasks on a single node without using an array job. Tasks that cannot run immediately because all allocated CPUs are busy are queued and executed automatically as CPUs become available. The parallelization is handled by GNU Parallel.
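The queueing behaviour described above might be sketched as follows. The task count and resource requests are placeholders, and the sketch assumes GNU Parallel is already on your PATH (some sites require loading a module first).

```shell
#!/bin/bash
#SBATCH --job-name=parallel_demo
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:05:00

# Run one task per allocated slot; default to 2 outside of a job.
slots="${SLURM_NTASKS:-2}"
total_tasks=10

if command -v parallel >/dev/null 2>&1; then
    # parallel keeps $slots tasks running at once and starts the next
    # queued task as soon as one finishes. {} is replaced by each input.
    seq 1 "$total_tasks" | parallel -j "$slots" 'echo "task {} running on $(hostname)"'
else
    echo "GNU Parallel not found on PATH; load it first (site-specific)"
fi
```

With 10 tasks and 4 slots, the first four tasks start immediately and the remaining six wait their turn.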

Parallel Processing in Python with Multiprocessing

The Python multiprocessing package is a popular way to spread a workflow over multiple CPUs. This example demonstrates how to assign multiple CPUs to a SLURM job and then use them in a Python multiprocessing job.
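On the batch-script side, a sketch of the CPU assignment could look like the following. The script name `my_analysis.py` and its `--workers` option are hypothetical stand-ins; inside Python, that number would typically be used to size something like `multiprocessing.Pool(processes=...)`. The command is only printed here, in keeping with the demonstration style of these examples.

```shell
#!/bin/bash
#SBATCH --job-name=mp_demo
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Reserve four CPUs for the single Python process to spread work over.
#SBATCH --cpus-per-task=4
#SBATCH --time=00:05:00

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script is tried outside of a job.
workers="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical script and option: pass the CPU count to Python so the
# worker pool matches the allocation instead of the whole node.
cmd="python3 my_analysis.py --workers ${workers}"
echo "Would run: ${cmd}"
```

Sizing the pool from the allocation, rather than from the node's total core count, avoids oversubscribing the CPUs the job was actually given.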


Combining Parallel and Array Jobs

Array Job With GNU Parallel

This script is roughly an extension of Sample Array Job, with the additional step of parallelizing tasks within each subjob using GNU Parallel. Because the total task count is the number of subjobs multiplied by the tasks per subjob, this approach allows a very large number of tasks to be run.
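One way to combine the two, sketched under assumptions, is to give each subjob a contiguous block of task numbers and hand that block to GNU Parallel. The helpers `first_task` and `last_task`, the block size, and the array range are invented for this illustration, and GNU Parallel is assumed to be on your PATH.

```shell
#!/bin/bash
#SBATCH --job-name=array_parallel
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:05:00
# Three subjobs, each running its own block of tasks.
#SBATCH --array=1-3

tasks_per_subjob=10

# First and last task number for a subjob: $1 = array index, $2 = block size.
first_task() { echo $(( ($1 - 1) * $2 + 1 )); }
last_task()  { echo $(( $1 * $2 )); }

# Defaults allow the script to be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"
slots="${SLURM_NTASKS:-2}"

start="$(first_task "$task_id" "$tasks_per_subjob")"
end="$(last_task "$task_id" "$tasks_per_subjob")"
echo "Subjob ${task_id} handles tasks ${start} through ${end}"

if command -v parallel >/dev/null 2>&1; then
    # Within this subjob, run up to $slots tasks at a time.
    seq "$start" "$end" | parallel -j "$slots" 'echo "task {} running on $(hostname)"'
else
    echo "GNU Parallel not found on PATH; load it first (site-specific)"
fi
```

With these placeholder values, subjob 1 covers tasks 1 to 10, subjob 2 covers 11 to 20, and subjob 3 covers 21 to 30, for 30 tasks from a single submission.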