Async DAGs in Bash

Recently I needed to run a bunch of steps with interwoven dependencies in a bash script for a CI pipeline.

A simple approach would be a linear flow like:

Set up Kubernetes cluster.
Deploy dependencies to Kubernetes.
Build image 1.
Build image 2.
Run tests.

This could work, but it's slow; we don't take advantage of the fact that we could, for instance, build our images while the cluster is setting up.

What we want is to run a bunch of steps, declare their dependencies, and have each step run as soon as its dependencies are met. This is essentially executing a DAG (directed acyclic graph).

I was surprised that I couldn't find anything doing this, and more surprised that LLMs were terrible at building it (until this ends up in the training data??), so I wanted to share what we ended up with.

The code

In this example we will set up 5 steps with the following dependencies:

graph TD
  A[step_a] --> C[step_c]
  A --> D[step_d]
  B[step_b] --> D
  C --> E[step_e]
  D --> E

We need a few pieces to make this work.

First, bash can execute something in the background with &, and $! holds the PID of the most recently started background job. So we can do something like:

echo 'step 1' & PID_1=$!
wait $PID_1 && echo 'step 2'

However, wait blocks the script until that one PID exits, so we cannot start whichever task becomes ready first when there are multiple possibilities.
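To make the blocking concrete, here is a minimal sketch with two independent background jobs. The function name blocking_demo and the sleep durations are made up for illustration. Because the script waits on the slow job first, the follow-up to the fast job cannot start until the slow job is done, even though its own dependency finished much earlier.

```shell
#!/usr/bin/env bash
# Illustrative sketch: blocking_demo and the durations are hypothetical.
function blocking_demo() {
  sleep 1   & local pid_slow=$!
  sleep 0.1 & local pid_fast=$!

  # Blocks for about 1s, even though pid_fast exits after about 0.1s.
  wait "$pid_slow" && echo 'after slow'
  # Only reached once the slow job is done, so this follow-up starts late.
  wait "$pid_fast" && echo 'after fast'
}

blocking_demo
```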

The next step would be to spawn step 2 in the background as well:

echo 'step 1' & PID_1=$!
(wait $PID_1 && echo 'step 2') &

This would be nice if it worked, but it does not, because the subshell is a separate process and wait only works on children of the current shell:

wait: pid 1499320 is not a child of this shell

Fortunately, we can abuse tail to do this! With GNU tail's --pid flag, tail -f exits once the specified process dies, and it works for any PID, not just children of the current shell. That gives us this wait replacement:

function await() {
  for pid in "$@"; do
    tail --pid="$pid" -f /dev/null
  done
}
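The --pid flag is specific to GNU tail, so this await will not work where tail is a different implementation. As a hedged alternative, the sketch below polls with kill -0, which sends no signal and only checks whether the process still exists. The name await_poll and the 0.1 second interval are assumptions for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to the tail-based await: poll with `kill -0`.
# `kill -0` sends no signal; it succeeds while the process exists
# (and we are allowed to signal it).
function await_poll() {
  local pid
  for pid in "$@"; do
    while kill -0 "$pid" 2>/dev/null; do
      sleep 0.1
    done
  done
}

sleep 0.3 & PID_1=$!
# Unlike `wait`, this works from a subshell that is not the parent of PID_1.
(await_poll "$PID_1" && echo 'step 2') &
wait
```

Like the tail version, this only observes that the process is gone. It does not see the exit status, and it could be fooled if the PID is reused by an unrelated process.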

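Putting this all together, we can build our full DAG. This version of await also passes -s .1, so tail checks the process every 0.1 seconds instead of its default of one second: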

function await() {
  for pid in "$@"; do
    tail --pid="$pid" -f /dev/null -s .1
  done
}

function step_a() {
  echo "step A starting"
  sleep 1
  echo "step A completed"
}

function step_b() {
  echo "step B starting"
  sleep 1.5
  echo "step B completed"
}

function step_c() {
  echo "step C starting"
  sleep 2
  echo "step C completed"
}

function step_d() {
  echo "step D starting"
  sleep 0.1
  echo "step D completed"
}

function step_e() {
  echo "step E starting"
  sleep 0.1
  echo "step E completed"
}

function main() {
  step_a & PID_A=$!
  step_b & PID_B=$!
  (await $PID_A && step_c) & PID_C=$!
  (await $PID_A $PID_B && step_d) & PID_D=$!
  (await $PID_C $PID_D && step_e) & PID_E=$!
  # Wait on each job individually: a bare `wait` always returns 0,
  # whereas `wait <pid>` returns that job's exit status.
  for pid in $(jobs -p); do
    wait "$pid" || return 1
  done
}

main "$@"

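Running this (with added timestamps) shows that the steps run in the order we expect. Note the gaps of about one second between a step completing and its dependents starting in this trace; that matches tail's default polling interval of one second, and passing -s .1 as above should shrink them to about 0.1 seconds: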

$ ./example.sh |& ts '%.T'
16:21:08.192112 step A starting
16:21:08.192148 step B starting
16:21:09.182676 step A completed
16:21:09.682766 step B completed
16:21:10.182644 step C starting
16:21:10.183401 step D starting
16:21:10.284096 step D completed
16:21:12.183604 step C completed
16:21:13.183983 step E starting
16:21:13.284668 step E completed

The raw output is a bit hard to understand, though. Combined with the shell tracing that I have discussed before, we can get a nice visualization of the DAG execution!
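One limitation worth noting: await only notices that a process has exited, not whether it succeeded, so step_d would still run if step_b failed. Below is a minimal sketch of one way to propagate failures, assuming each step records its exit status in a file. The names run_step, await_ok, PIDS and STATUS_DIR are invented here and are not part of the original script; it also assumes bash 4+ (for associative arrays) and GNU tail.

```shell
#!/usr/bin/env bash
# Sketch only: run_step, await_ok, PIDS and STATUS_DIR are hypothetical
# names, not part of the original script. Needs bash 4+ and GNU tail.
declare -A PIDS               # step name -> PID of its background subshell
STATUS_DIR=$(mktemp -d)       # one file per step, holding its exit status

# Wait for each named step to exit, then fail unless it recorded status 0.
function await_ok() {
  local name
  for name in "$@"; do
    tail --pid="${PIDS[$name]}" -f /dev/null -s .1
    [[ "$(cat "$STATUS_DIR/$name" 2>/dev/null)" == 0 ]] || return 1
  done
}

# run_step NAME [DEP...]: run function NAME once every DEP has succeeded.
function run_step() {
  local name=$1; shift
  (
    status=1                   # stays 1 if a dependency failed
    if await_ok "$@"; then
      "$name"
      status=$?
    else
      echo "$name skipped: a dependency failed"
    fi
    echo "$status" > "$STATUS_DIR/$name"
    exit "$status"
  ) &
  PIDS[$name]=$!
}

function step_a() { echo "step A ok"; }
function step_b() { echo "step B failing"; return 1; }
function step_c() { echo "step C ok"; }
function step_d() { echo "step D ok"; }

function main() {
  run_step step_a
  run_step step_b
  run_step step_c step_a           # runs: step_a succeeded
  run_step step_d step_a step_b    # skipped: step_b failed
  local name failed=0
  for name in "${!PIDS[@]}"; do
    wait "${PIDS[$name]}" || failed=1
  done
  return "$failed"
}

main "$@" || echo "pipeline failed"
```

The design choice here is to keep tail for the waiting and use the status files only to decide whether a dependent step should run, so the structure stays close to the original script.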
