UA Research Computing

HPC Examples

Array Job With Text Filenames

What problem does this help fix?

Use this approach when you want to run multiple jobs that each open a different file to analyze, but the filenames don’t follow a scheme that simple array indices can generate (e.g., 1.txt, 2.txt, …).

Example

Submission Script

#!/bin/bash
#SBATCH --job-name=Array-Read-Filenames
#SBATCH --ntasks=1
#SBATCH --nodes=1             
#SBATCH --time=00:01:00   
#SBATCH --partition=standard
#SBATCH --account=YOUR_GROUP
#SBATCH --array=1-4

CurrentFile="$( sed "${SLURM_ARRAY_TASK_ID}q;d" InputFiles )"
echo "JOB NAME: $SLURM_JOB_NAME, JOB ID: $SLURM_JOB_ID, EXAMPLE COMMAND: ./executable -o output${SLURM_ARRAY_TASK_ID} ${CurrentFile}"

Input File

For this example, you’ll want to have a file called InputFiles in your working directory. This will contain one filename per line. Contents:

SRR2309587.fastq
SRR3050489.fastq
SRR305356.fastq
SRR305p0982.fastq

Script Breakdown

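For each of the four subjobs, we’ll use SLURM_ARRAY_TASK_ID (values 1 to 4) as a line number and pull the matching line from InputFiles. The sed expression deletes every line before the target, then prints the target line and quits: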

CurrentFile="$( sed "${SLURM_ARRAY_TASK_ID}q;d" InputFiles )"
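The line lookup can be tried outside of Slurm. The following is a minimal sketch, assuming a scratch directory created with mktemp and a task ID set by hand (Slurm sets SLURM_ARRAY_TASK_ID itself inside a real array job). SameFile is a hypothetical variable used only to show an equivalent sed form.

```shell
# Recreate the example InputFiles in a scratch directory (assumed location).
workdir="$(mktemp -d)"
printf '%s\n' SRR2309587.fastq SRR3050489.fastq SRR305356.fastq SRR305p0982.fastq > "$workdir/InputFiles"

# Set by hand for this demonstration; Slurm provides it in a real array job.
SLURM_ARRAY_TASK_ID=3

# "Nq;d": delete every line before N, then print line N and quit.
CurrentFile="$( sed "${SLURM_ARRAY_TASK_ID}q;d" "$workdir/InputFiles" )"

# Equivalent form: suppress default output (-n) and print only line N.
SameFile="$( sed -n "${SLURM_ARRAY_TASK_ID}p" "$workdir/InputFiles" )"

echo "Task ${SLURM_ARRAY_TASK_ID} -> ${CurrentFile}"
# prints: Task 3 -> SRR305356.fastq
```

The q form stops reading at the requested line, which may matter for very long lists; for a handful of filenames either form should behave the same.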

For demonstration purposes, we print a sample command that includes the filename to verify that everything is working as expected:

echo "JOB NAME: $SLURM_JOB_NAME, JOB ID: $SLURM_JOB_ID, EXAMPLE COMMAND: ./executable -o output${SLURM_ARRAY_TASK_ID} ${CurrentFile}"
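In a real job, the echo would be replaced by the actual command. If the array range is larger than the number of lines in the list, sed returns nothing and the command would run with an empty filename. One way to guard against that is sketched below; pick_input is a hypothetical helper, not part of the original script, and the two-line list is made up for the demonstration.

```shell
# pick_input (hypothetical helper): print line $1 of list file $2,
# and fail if that line is missing or empty.
pick_input() {
    pick_name="$( sed "${1}q;d" "$2" )"
    if [ -z "$pick_name" ]; then
        echo "No filename on line ${1} of ${2}" >&2
        return 1
    fi
    printf '%s\n' "$pick_name"
}

# Demonstration with a two-line list in a temporary file.
list="$(mktemp)"
printf '%s\n' SRR2309587.fastq SRR3050489.fastq > "$list"

pick_input 2 "$list"                            # prints SRR3050489.fastq
pick_input 5 "$list" || echo "task 5 skipped"   # line 5 does not exist
```

In a submission script, the same check could wrap the CurrentFile assignment so that a subjob with no matching line exits early instead of calling the executable.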

To generate your own InputFiles, you can either add your filenames manually or automate the process. For example, if all your files are in a single location:

$ ls *fastq > InputFiles
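When the list is generated automatically, the array range should match the number of lines in it. The sketch below assumes three placeholder files with made-up names in a scratch directory, counts the lines with wc, and prints (rather than runs) a submission command so it works on a machine without Slurm. NumFiles is a hypothetical variable name.

```shell
# Work in a scratch directory with three empty placeholder files (made-up names).
workdir="$(mktemp -d)"
cd "$workdir" || exit 1
touch sampleA.fastq sampleB.fastq sampleC.fastq

# One filename per line; the glob expands in sorted order.
printf '%s\n' *.fastq > InputFiles

# Count the lines so the array range matches the list.
NumFiles="$( wc -l < InputFiles )"
NumFiles="$(( NumFiles ))"   # drops any padding that some wc versions add

# Print the submission command instead of running it.
echo "sbatch --array=1-${NumFiles} Array-Read-Filenames.slurm"
# prints: sbatch --array=1-3 Array-Read-Filenames.slurm
```

An --array option given on the sbatch command line takes precedence over the #SBATCH --array directive in the script, so the script itself would not need editing each time the list changes.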

Script Submission Command:

(puma) [netid@junonia ~]$ sbatch Array-Read-Filenames.slurm 
Submitted batch job 1694071

Output Files

Each of the subjobs in the array will output its own file of the form slurm-<job_id>_<array_id>.out as seen below:

(puma) [netid@junonia ~]$ ls *.out
slurm-1694071_1.out  slurm-1694071_2.out  slurm-1694071_3.out
slurm-1694071_4.out

File Contents:

(puma) [netid@junonia ~]$ cat *.out | grep fastq
JOB NAME: Array-Read-Filenames, JOB ID: 1694072, EXAMPLE COMMAND: ./executable -o output1 SRR2309587.fastq
JOB NAME: Array-Read-Filenames, JOB ID: 1694073, EXAMPLE COMMAND: ./executable -o output2 SRR3050489.fastq
JOB NAME: Array-Read-Filenames, JOB ID: 1694074, EXAMPLE COMMAND: ./executable -o output3 SRR305356.fastq
JOB NAME: Array-Read-Filenames, JOB ID: 1694071, EXAMPLE COMMAND: ./executable -o output4 SRR305p0982.fastq