As I ramped up my coding agent usage, I found myself wanting to share the Rust build cache across multiple copies of the same project (via git worktrees) to avoid multi-minute cold builds. This was harder than expected, but I was able to get something working.

First, the failed attempts

Attempt 1: just copy the target directory

This seemed like a good idea, but the copy took about 2 minutes, which was almost as bad as doing a cold build in the first place. That won't fly.
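
For reference, the naive version is essentially a recursive copy of the whole directory into the new worktree; a minimal sketch in Python, with hypothetical worktree paths:

import shutil

# Copy every build artifact from the main checkout into the new worktree.
# Correct, but it moves the full ~127GB of artifacts, hence the ~2 minutes.
shutil.copytree("../main/target", "target", dirs_exist_ok=True)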

Attempt 2: CARGO_TARGET_DIR and locking

The obvious answer here is to use a single CARGO_TARGET_DIR shared across all copies. Aside from meaning ./target/ no longer holds build artifacts (which is a bummer), this runs into Cargo's locking: Cargo takes an exclusive lock on the target directory, so only one build can run at a time, and the last thing I want is some background agent blocking my builds.
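
For illustration, here is a sketch of what sharing looks like (the shared path is arbitrary); every copy of the project builds into the same directory:

import os, subprocess

# Point all checkouts at one shared target directory. Cargo's lock on that
# directory then serializes their builds.
env = dict(os.environ, CARGO_TARGET_DIR=os.path.expanduser("~/.cache/shared-target"))
subprocess.run(["cargo", "build"], env=env, check=True)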

This led me to sccache.

Attempt 3: sccache and working directories

sccache seems perfect for this use case: it is designed to act as a shared compilation cache (locally, or even backed by remote storage).
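
Hooking it up amounts to pointing RUSTC_WRAPPER at the sccache binary; a minimal sketch of a wrapped build:

import os, subprocess

# Route each rustc invocation through sccache so compilations can be served
# from (and stored to) the shared cache.
env = dict(os.environ, RUSTC_WRAPPER="sccache")
subprocess.run(["cargo", "build"], env=env, check=True)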

However, when I set it up across multiple copies of my project, I got no cache hits at all.

With some debugging, I found that the project's working directory is included in the cache key, and of course that directory differs between the copies.

I attempted to work around this with --remap-path-prefix, which allows rewriting the paths embedded in build output (so I could remap each copy of the project to a static /build prefix). However, sccache doesn't handle this argument, so each copy's distinct remap-path-prefix flag itself causes the cache keys to differ, and thus no cache hits.

I attempted a quick patch to sccache to ignore the remap-path-prefix flag, which seemed to work (on the surface, at least; I admittedly have no clue whether it causes cache hits when it shouldn't).

However, the ergonomics of this were pretty annoying. I can do export RUSTFLAGS="--remap-path-prefix $(pwd)=/build/", but that means setting it in every terminal and IDE. I did try wiring this up with direnv, but between the sccache patch and the environment variable the overall solution felt fragile, so I kept looking.

A future --trim-paths feature may make this a better solution.

The solution: hardlinks

One thing I realized while hacking on sccache was that for most of the build time (building dependencies), the artifacts are identical across every copy of the project and immutable. That makes them perfect candidates for hardlinks, which are basically free to create and take no additional disk space.

However, I had to make sure not to hardlink the mutable build artifacts (those built from the current workspace's own crates).
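
To see why hardlinks are so cheap, here is a self-contained sketch (the artifact name is made up): creating the link is a single metadata operation, and both paths end up sharing one inode, so no data is copied.

import os, tempfile

# Demonstrate that a hardlink costs no extra space: both names point at the
# same inode and share the same data blocks.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "libserde-1a2b3c.rlib")  # stand-in for a dep artifact
    with open(src, "wb") as out:
        out.write(b"x" * 1024)
    dst = os.path.join(d, "copy.rlib")
    os.link(src, dst)                    # single metadata operation
    assert os.path.samefile(src, dst)    # same inode
    assert os.stat(src).st_nlink == 2    # two names, one set of data blocks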

Additionally, for cargo to get a cache hit, a lot of things need to line up: we need all the artifacts, with the same metadata (timestamps, permissions, etc.), as well as the "fingerprint" files cargo uses to decide whether a dependency needs to be rebuilt.

Ultimately I arrived at this Python script:

#!/usr/bin/env python3
# Usage: ./cargo-registry-deps.py <src-target-dir> <dest-target-dir>
import json, os, re, subprocess, sys

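# Note: only the default debug profile is handled; other profiles live in
# sibling directories such as release/.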
src, dst = sys.argv[1] + "/debug", sys.argv[2] + "/debug"

meta = json.loads(
    subprocess.check_output(["cargo", "metadata", "--format-version", "1"])
)

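# Collect the normalized names of every registry/git dependency; only these
# immutable third-party artifacts are safe to share between checkouts.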
registry = set()
for pkg in meta["packages"]:
    s = pkg.get("source") or ""
    if s.startswith("registry+") or s.startswith("git+"):
        registry.add(pkg["name"].replace("-", "_"))
        for target in pkg["targets"]:
            registry.add(target["name"].replace("-", "_"))

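# Artifact filenames look like [lib]<name>-<hash>[.<ext>]; strip each part
# to recover the crate name.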
re_lib = re.compile(r"^lib")
re_ext = re.compile(r"\.[^.]*$")
re_hash = re.compile(r"-[^-]+$")


def pkg_name(f):
    return re_hash.sub("", re_ext.sub("", re_lib.sub("", f)))


# Create the destination directory if needed.
def makedirs(dst_dir):
    os.makedirs(dst_dir, exist_ok=True)


# Copy src_dir's access/modification times onto dst_dir.
def syncdirtime(src_dir, dst_dir):
    st = os.stat(src_dir)
    os.utime(dst_dir, ns=(st.st_atime_ns, st.st_mtime_ns))


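# Hardlink src_path to dst_path, replacing any file left over from an
# earlier sync.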
def link(src_path, dst_path):
    try:
        os.link(src_path, dst_path)
    except FileExistsError:
        os.unlink(dst_path)
        os.link(src_path, dst_path)


# Create the destination layout up front.
makedirs(f"{dst}/deps")
makedirs(f"{dst}/.fingerprint")

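# Hardlink the compiled dependency artifacts themselves.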
for f in os.scandir(f"{src}/deps"):
    if pkg_name(f.name) in registry:
        link(f.path, f"{dst}/deps/{f.name}")

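# Hardlink the fingerprint directories and restore their mtimes; cargo
# consults these to decide whether a dependency is fresh.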
for d in os.scandir(f"{src}/.fingerprint"):
    if re_hash.sub("", d.name).replace("-", "_") in registry:
        makedirs(f"{dst}/.fingerprint/{d.name}")
        for f in os.scandir(d.path):
            link(f.path, f"{dst}/.fingerprint/{d.name}/{f.name}")
        syncdirtime(d.path, f"{dst}/.fingerprint/{d.name}")

# Build-script outputs: the build/ dir may be absent if no dependency has a
# build script.
if os.path.isdir(f"{src}/build"):
    for d in os.scandir(f"{src}/build"):
        if re_hash.sub("", d.name).replace("-", "_") in registry:
            # Walk bottom-up so a directory's mtime is synced after its
            # children are created (creating entries would bump it again).
            for root, dirs, files in os.walk(d.path, topdown=False):
                rel = os.path.relpath(root, f"{src}/build")
                makedirs(f"{dst}/build/{rel}")
                for f in files:
                    link(f"{root}/{f}", f"{dst}/build/{rel}/{f}")
                syncdirtime(root, f"{dst}/build/{rel}")

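To warm up a fresh worktree, I run the script once against the main checkout's target directory before the first build (paths here are illustrative): ./cargo-registry-deps.py ../main/target ./target. After that, cargo build sees the dependency artifacts as fresh and only compiles the workspace's own crates.
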
Results

Here are the results comparing the methods on my target project (a moderately sized Rust workspace):

Method                    Time    Storage Overhead
Plain Copy                2m19s   127GB
Selective Copy (Python)   19s     18GB
Hardlink (Python)         <1s     0GB

With this, I can quickly spin up a copy of my project with a warm build cache, which is a huge quality-of-life improvement when using agents to work on the same codebase.