UA Research Computing

HPC Examples

SLURM Job Dependencies Example

Overview

Sometimes projects need to be split up into multiple parts where each step is dependent on the step (or several steps) that came before. SLURM dependencies are a way to automate this process.

In this example, we’ll create a number of three-dimensional plots using Python and will combine them into a gif as the last step. A job dependency is a good solution in this case since the job that creates the gif is dependent on all the images being present.

Data structure

We’ll try to keep things in order by partitioning our data, output, and images in distinct directories. These directories and files can be downloaded by clicking the button at the top of the page.

(elgato) [user@wentletrap volcano]$ tree
.
├── create_gif.slurm
├── data
│   └── volcano.csv
├── generate_frames.slurm
├── images
├── output
│   ├── archives
│   └── slurm_files
├── submit-gif-job
└── volcano.py

Scripts

Python script

The Python example script was pulled and modified from the Python graph gallery and the CSV file used to generate the image was downloaded from: https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/volcano.csv

Below you’ll notice one modification to the original script: n = int(sys.argv[1]). We’re going to execute this in an array job and will be importing the array indices (integers n where 1 ≤ n ≤ 360) into this script as arguments so that we can manipulate the viewing angle (ax.view_init(30, 45 + n)). Each frame will be slightly different and, when combined into a gif, will allow us to execute a full rotation of the 3D volcano plot.

#!/usr/bin/env python3
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import sys

n = int(sys.argv[1]) # <- We'll import an array index to rotate our image

# Original example from: https://www.python-graph-gallery.com/3d/
# CSV available from   : https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/volcano.csv
data = pd.read_csv("data/volcano.csv")
 
# Transform it to a long format
df=data.unstack().reset_index()
df.columns=["X","Y","Z"]
 
# And transform the old column name in something numeric
df['X']=pd.Categorical(df['X'])
df['X']=df['X'].cat.codes
 
# Make the plot
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.set_axis_off()
ax.plot_trisurf(df['Y'], df['X'], df['Z'], cmap=plt.cm.viridis, linewidth=0.2)
ax.view_init(30, 45 + n)
plt.savefig('images/image%s.png'%n,format='png',transparent=False)

SLURM script to generate images

This is the job where we will generate all of our images. In each step, we will pass our array index to our python script to determine the viewing angle of our plot.

Script: generate_frames.slurm

#!/bin/bash

#SBATCH --account=hpcteam
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --job-name=generate_frames
#SBATCH -o output/slurm_files/%x-%A.out
#SBATCH -e output/slurm_files/%x-%A.err
#SBATCH --open-mode=append
#SBATCH --array=1-360

python3 volcano.py $SLURM_ARRAY_TASK_ID

SLURM script to combine frames into gif

Once each frame has been generated, we’ll use ffmpeg to combine our images into a gif and will clean up our workspace. The bash script shown in the next section is what will ensure that this script isn’t run until the array has completed.

Script: create_gif.slurm

#!/bin/bash

#SBATCH --account=hpcteam
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --job-name=make_gif
#SBATCH -o output/slurm_files/%x-%j.out
#SBATCH -e output/slurm_files/%x-%j.err

module load ffmpeg 
ffmpeg -framerate 25 -i $PWD/images/image%d.png -r 30  -b 5000k volcano.mp4
ffmpeg -i volcano.mp4 -loop 0 -vf scale=400:240 volcano.gif
rm volcano.mp4
cd images
DATE_FORMAT=$(date +%m-%d-%Y.%H:%M:%S)
tar czvf volcano-images-${DATE_FORMAT}.tar.gz image*.png
mv *tar.gz ../output/archives
rm -rf ./*.png

Script to automate job submissions

This simple bash script is what implements the SLURM job dependency magic. Each step is described in detail below.

Script: submit-gif-job

#!/bin/bash

printf "Submitting job to generate images\n"
jobid=$(sbatch --parsable generate_frames.slurm)

printf "Job submitted with ID: $jobid\n\n"

printf "Submitting job dependency. Combines images into a gif\n"
sbatch --dependency=afterany:$jobid create_gif.slurm 

Step-by-step:

1) jobid=$(sbatch --parsable generate_frames.slurm)

In this case, we’re capturing the job ID output from our array submission. Typically, when you submit a SLURM job without arguments, you get back something that looks like:

(elgato) [user@wentletrap ~]$ sbatch example.slurm 
Submitted batch job 448243

The parsable option is what reduces this to simply the job ID and allows us to easily capture it:

(elgato) [user@wentletrap ~]$ sbatch --parsable example.slurm 
448244

As a general comment, when you run something like:

VAR=$(command)

You are running command and setting the variable VAR to the output. In the specifc case of our bash script, we’ve set the bash variable jobid to the output of our sbatch --parsable command.

2) sbatch --dependency=afterany:$jobid create_gif.slurm

Now that we have the Job ID, we’ll submit the next job with a dependency flag: --dependency=afterany:$jobid.

The dependency option tells the scheduler that this job should not be run until the job with Job ID $jobid has completed. The afterany specifies that the exit status of the previous job does not matter. Other options are afterok (meaning only execute the dependent job if the previous job ended successfully) or afternotok (meaning only execute if the previous job terminated abnormally, e.g. was cancelled or failed). You might consider setting up multiple job dependencies that depend on the previous job’s exit status.


Submitting the jobs

Once we’ve gotten everything set up, it’s time to execute our workflow. We can check our jobs once we’ve run our bash script. In this case, while the array job used to generate the different image frames is running, the make_gif job will sit in queue with the reason (Dependency) indicating that it is waiting to run until its dependency has been satisfied.

(elgato) [user@wentletrap volcano]$ bash submit-gif-job 
Submitting job to generate images
Job submitted with ID: 447878

Submitting job dependency. Combines images into a gif file
Submitted batch job 447879

(elgato) [user@wentletrap volcano]$ squeue --user user
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    447878_[9-360]  standard generate     user PD       0:00      1 (None)
            447879  standard make_gif     user PD       0:00      1 (Dependency)
          447878_1  standard generate     user  R       0:02      1 cpu16
          447878_2  standard generate     user  R       0:02      1 cpu37
          447878_3  standard generate     user  R       0:02      1 cpu37
          447878_4  standard generate     user  R       0:02      1 cpu37
          447878_5  standard generate     user  R       0:02      1 cpu37
          447878_6  standard generate     user  R       0:02      1 cpu37
          447878_7  standard generate     user  R       0:02      1 cpu37
          447878_8  standard generate     user  R       0:02      1 cpu37

Once the job has completed, you should see something that looks like the following structure with output files:

(elgato) [user@wentletrap volcano]$ tree
.
├── create_gif.slurm
├── data
│   └── volcano.csv
├── generate_frames.slurm
├── images
├── output
│   ├── archives
│   │   └── volcano-images-10-25-2022.12:52:19.tar.gz
│   ├── gifs
│   │   └── volcano.gif
│   └── slurm_files
│       ├── generate_frames-447878.err
│       ├── generate_frames-447878.out
│       ├── make_gif-447879.err
│       └── make_gif-447879.out
├── submit-gif-job
└── volcano.py

6 directories, 11 files

Output

If everything is successful, there should be a gif of a rotating volcano under ./output/gifs/volcano.gif