Files
tuwunel/docker
Jason Volk 71748c3db9 docker: Tweak cargo build verbosities.
docker: Fix unused instructions; format/tabs.

Signed-off-by: Jason Volk <jason@zemos.net>
2025-06-03 06:03:41 +00:00
..
2025-05-01 07:03:51 +00:00

Docker Builder

All Docker images for the project are built here. All images are Docker Bake targets. All targets are leaves and branches of a unified tree leading to a single root. It is a combinatorial matrix of images from shared intermediate layers with a huge ever-growing pulsating cache.

The result is the ability to run continuous integration for a number of build variations while only paying the cost of rebuilding the last layer for each one.

Layout

This directory is made up of three types of files:

  • Shell scripts are the user interface. Use this system through one of the shell scripts. The bake files can still be docker'ed directly but it's recomended to run the script.

  • The .hcl files specify the targets of the tree. This is all standard docker bake. All targets are ordered where each depends on one or more below it. The root of the tree is at the bottom. At the time of this writing there is only one bake.hcl file but this might be broken up; in any case there will always be a single unified tree.

  • The Dockerfile.* files are like "library functions" and provide definition for targets. These are written generically in the style of "template functions" with many variables allowing many targets to create many variations using the same Dockerfile.

Getting started

  1. You will need to install docker buildx/buildkit and maybe a couple other related things. I would appreciate if you could contribute exact commands for your platform when you perform this step.

  2. You will need to create a builder. There are a few complications that must be explained here so please be patient.

    1. Caches are being evicted in ways that I didn't expect, for example, rust is installed in a cache mount which might have been a bad idea. I have disabled GC because an unlucky eviction has massive repercussions. This is my buildkitd config in ~/.config/buildkit/buildkitd.toml
    [worker.oci]
    enabled = true
    rootless = true
    gc = false
    
    [system]
    platformsCacheMaxAge = "504h"
    
    1. Some unsavory options are required for some targets. It might be possible to omit these if you're not building the full tree. Otherwise I've included them in the create command below.

      • To run the complement compliance suite we need the --allow-insecure-entitlement network.host. This requirement is probably a defect in Complement.

    Finally create

    BKD_FLAGS="--allow-insecure-entitlement netwok.host"
    docker buildx create \
    	--name owo \
    	--bootstrap \
    	--driver docker-container \
    	--buildkitd-config ~/.config/buildkit/buildkitd.toml \
    	--buildkitd-flags "$BKD_FLAGS"
    
  3. Build something simple. The usage is ./bake.sh [target] which defaults to building all elements for one vector of the full matrix. You can start smaller though by running docker/bake.sh system which is the root target. You can browse the bake.hcl from the bottom and progressively build targets, or build one or more leaf targets directly. For example try to run a smoketest: ./bake.sh tests-smoke.

  4. Build something more complicated. Set environment variables or just edit the default vectors near the top of in the bake.sh with multiple elements (they are JSON arrays). You can take cues from the primary user of this system, the GitHub CI

  5. Defeat the final boss by building and running complement to completion. This will involve building the targets for complement-tester and complement-testee using bake.sh and then invoking complement.sh. You can take cues again from another user of this in the GitHub CI.

Notes

  • For CI our builders are more persistent rather than being created and destroyed for each invocation. The builder is meant to be safely reused across operations, but for concurrent operations this is tricky, see the next point.

  • For CI our builders are isolated only by actor/user. This will probably change to actor/repo/branch as it's easy to cause issues with concurrent builds right now. However we can't accept destroying the builder after each use, so we'll likely choose actor/repo/branch with the expectation of one build at a time under those constraints. Some external caching might need to be contrived between builders for deduplication but with care such that malicious actors cannot poison data used by other actors, otherwise it defeats the purpose of builder isolation.

On Target Caches

The challenge here is to get all aspects of the target directory perfectly optimal within the many constraints including cargo issues and our goals. This is highly complex because we have to provide each image being built with a safe environment yet share as much as possible between builds. This includes maximum reuse of prior builds but without unnecessary dirtying or more serious unexpected conflicts.

We first create a hausdorff space based on builds which could never benefit from sharing and would always be unsound (even silently) if they came into contact; so all cache id's are prefixed by the matrix components: ${sys_name}/${sys_version}/${rust_target}/${rust_toolchain}/${cargo_profile}. This is important because we can simplify the mount path inside the image which is important for the absolute paths generated by fingerprints and dependency files. We need to keep those the same if any builds expect to share them.

The top-level of target directory is immediately partitioned by cargo into different profiles. Note that we already imposed separation based on profile but we still have to deal with that subdirectory in the mount, which has special-cases for the dev, test, and bench profiles. These directories at their top-level are the final artifact area which is not concurrency safe and there are open issues in cargo for explicit artifact directories. Within these unsafe directories are sub-directories which contain hash-sharded components making them safe again for shared caching, so long as the path to them remains consistent for all images mounting.

As you can see this is already getting very complicated. If this is done wrong lots of different bad things can happen such as broken builds from bad conflicts, constant rebuilds from modest conflicts, or over-use of resources from too much separation.

TODO