- docker-compose.yml: run peer-a and peer-b with TS_USERSPACE=false +
/dev/net/tun device + cap_add. Pin peer-a's WG listen port to 41641
via TS_TAILSCALED_EXTRA_ARGS and publish it to the host so direct
UDP from outside docker has somewhere to land.
- run.sh: use an ephemeral pre-auth key for the test client so
Headscale auto-deletes the test node when its map stream drops
(instead of accumulating hundreds of stale entries that eventually
slow netmap propagation to a crawl). Disable shields-up on both
peers so the kernel firewall doesn't drop inbound tailnet TCP. Tweak
the JSON key extraction to handle pretty-printed output.
- integration.rs: add `test_e2e_tcp_through_tunnel` that brings up
the daemon, dials peer-a's echo server through the proxy, and
asserts the echo body comes back. Currently `#[ignore]`d — the
docker stack runs Headscale over plain HTTP, but Tailscale's client
unconditionally tries TLS to DERP relays ("tls: first record does
not look like a TLS handshake"), so peer-a can never receive
packets we forward via the relay. Unblocking needs either TLS
termination on the docker DERP or running the test inside the same
docker network as peer-a. Test stays in the tree because everything
it tests up to the read timeout is real verified behavior.
A pile of correctness bugs that all stopped real Tailscale peers from
being able to send WireGuard packets back to us. Found while building
out the e2e test against the docker-compose stack.
1. WireGuard static key was wrong (lifecycle.rs)
We were initializing the WgTunnel with `keys.wg_private`, a separate
x25519 key from the one Tailscale advertises in netmaps. Peers know
us by `node_public` and compute mac1 against it; signing handshakes
with a different private key meant every init we sent was silently
dropped. Use `keys.node_private` instead — node_key IS the WG static
key in Tailscale.
2. DERP relay couldn't route packets to us (derp/client.rs)
Our DerpClient was sealing the ClientInfo frame with a fresh
ephemeral NaCl keypair and putting the ephemeral public in the frame
prefix. Tailscale's protocol expects the *long-term* node public key
in the prefix — that's how the relay knows where to forward packets
addressed to our node_key. With the ephemeral key, the relay
accepted the connection but never delivered our peers' responses.
Now seal with the long-term node key.
3. Headscale never persisted our DiscoKey (proto/types.rs, control/*)
The streaming /machine/map handler in Headscale ≥ capVer 68 doesn't
update DiscoKey on the node record — only the "Lite endpoint update"
path does, gated on Stream:false + OmitPeers:true + ReadOnly:false.
Without DiscoKey our nodes appeared in `headscale nodes list` with
`discokey:000…` and never propagated into peer netmaps. Add the
DiscoKey field to RegisterRequest, add OmitPeers/ReadOnly fields to
MapRequest, and call a new `lite_update` between register and the
streaming map. Also add `post_json_no_response` for endpoints that
reply with an empty body.
4. EncapAction is now a struct instead of an enum (wg/tunnel.rs)
Routing was forced to either UDP or DERP. With a peer whose
advertised UDP endpoint is on an unreachable RFC1918 network (e.g.
docker bridge IPs), we'd send via UDP, get nothing, and never fall
back. Send over every available transport — receivers dedupe via
the WireGuard replay window — and let dispatch_encap forward each
populated arm to its respective channel.
5. Drop the dead PacketRouter (wg/router.rs)
Skeleton from an earlier design that never got wired up; it's been
accumulating dead-code warnings.
DERP works for everything but adds relay latency. Add a parallel UDP
transport so peers with reachable endpoints can talk directly:
- wg/tunnel: track each peer's local boringtun index in PeerTunnel and
expose find_peer_by_local_index / find_peer_by_endpoint lookups
- daemon/lifecycle: bind a UdpSocket on 0.0.0.0:0 alongside DERP, run
the recv loop on a clone of an Arc<UdpSocket> so send and recv can
proceed concurrently
- run_wg_loop: new udp_in_rx select arm. For inbound UDP we identify
the source peer by parsing the WireGuard receiver_index out of the
packet header (msg types 2/3/4) and falling back to source-address
matching for type-1 handshake initiations
- dispatch_encap: SendUdp now actually forwards via the UDP channel
UDP failure is non-fatal — DERP can carry traffic alone if the bind
fails or packets are dropped.
Spins up Headscale 0.23 (with embedded DERP) plus two Tailscale peers
in docker compose, generates pre-auth keys, and runs three integration
tests behind the `integration` feature:
- test_register_and_receive_netmap: full TS2021 → register → first
netmap fetch
- test_proxy_listener_accepts: starts the daemon and waits for it to
reach the Running state
- test_daemon_lifecycle: full lifecycle including DERP connect, then
clean shutdown via the DaemonHandle
Run with `sunbeam-net/tests/run.sh` (handles compose up/down + auth
key provisioning) or manually via cargo nextest with the env vars
SUNBEAM_NET_TEST_AUTH_KEY and SUNBEAM_NET_TEST_COORD_URL set.
The daemon orchestrates everything: it owns reconnection backoff, the
WireGuard tunnel, the smoltcp engine, the DERP relay loop, the local
TCP proxy, and a Unix-socket IPC server for status queries.
- daemon/state: DaemonStatus state machine + DaemonHandle for shutdown
signaling and live status access
- daemon/ipc: newline-delimited JSON Unix socket server (Status,
Disconnect, Peers requests)
- daemon/lifecycle: VpnDaemon::start spawns run_daemon_loop, which pins
a session future and selects against shutdown_rx so shutdown breaks
out cleanly. run_session brings up the full pipeline:
control client → register → map stream → wg tunnel → engine →
proxy listener → wg encap/decap loop → DERP relay → IPC server.
DERP transport: when the netmap doesn't surface a usable DERP endpoint
(Headscale's embedded relay returns host_name="headscale", port=0),
fall back to deriving host:port from coordination_url. WG packets to
SendDerp peers go via a dedicated derp_out channel; inbound DERP frames
flow back through derp_in into the decap arm, which forwards Packet
results to the engine and Response results back to derp_out for the
handshake exchange.
- proxy/engine: NetworkEngine that owns the smoltcp VirtualNetwork and
bridges async TCP streams to virtual sockets via a 5ms poll loop.
Each ProxyConnection holds the local TcpStream + smoltcp socket
handle and shuttles data between them with try_read/try_write so the
engine never blocks.
- proxy/tcp: skeleton TcpProxy listener (currently unused; the daemon
inlines its own listener that hands off to the engine via mpsc)
- control/client: TS2021 connection setup — TCP, HTTP CONNECT-style
upgrade to /ts2021, full Noise IK handshake via NoiseStream, then
HTTP/2 client handshake on top via the h2 crate
- control/register: POST /machine/register with pre-auth key, PascalCase
JSON serde matching Tailscale's wire format
- control/netmap: streaming MapStream that reads length-prefixed JSON
messages from POST /machine/map, classifies them into Full/Delta/
PeersChanged/PeersRemoved/KeepAlive, and transparently zstd-decodes
by detecting the 0x28 0xB5 0x2F 0xFD magic (Headscale only compresses
if the client opts in)
- wg/tunnel: per-peer boringtun Tunn management with peer table sync
from netmap (add/remove/update endpoints, allowed_ips, DERP region)
and encapsulate/decapsulate/tick that route to UDP or DERP
- wg/socket: smoltcp Interface backed by an mpsc-channel Device that
bridges sync poll-based smoltcp with async tokio mpsc channels
- wg/router: skeleton PacketRouter (currently unused; reserved for the
unified UDP/DERP ingress path)
DERP is Tailscale's TCP relay protocol for peers that can't establish a
direct UDP path. Add the standalone client:
- derp/framing: 5-byte frame codec (1-byte type + 4-byte BE length)
- derp/client: HTTP /derp upgrade, Tailscale's NaCl SealedBox handshake
(ServerKey → ClientInfo → ServerInfo → NotePreferred), and
send_packet/recv_packet for forwarding WireGuard datagrams
Includes the 8-byte DERP\xf0\x9f\x94\x91 magic prefix in the ServerKey
payload and reads the HTTP upgrade response one byte at a time so the
inline first frame isn't swallowed by a buffered reader.
Tailscale's TS2021 protocol layers HTTP/2 over an encrypted Noise IK
channel reached via HTTP CONNECT-style upgrade. Add the lower half:
- noise/handshake: hand-rolled Noise_IK_25519_ChaChaPoly_BLAKE2s
initiator with HKDF + ChaCha20-Poly1305 (no snow dependency)
- noise/framing: 3-byte frame codec (1-byte type + 2-byte BE length)
- noise/stream: NoiseStream implementing AsyncRead + AsyncWrite over
the framed channel so the h2 crate can sit on top
Add the workspace crate that will host a pure Rust Headscale/Tailscale-
compatible VPN client. This first commit lands the crate skeleton plus
the leaf modules that the rest of the stack builds on:
- error: thiserror Error enum + Result alias
- config: VpnConfig
- keys: Curve25519 node/disco/wg key types with on-disk persistence
- proto/types: PascalCase serde wire types matching Tailscale's JSON
Replace hand-rolled OpenBao HTTP client with vaultrs 0.8.0, which
has official OpenBao support. BaoClient remains the public API so
callers are unchanged. KV patch uses raw HTTP since vaultrs doesn't
expose it yet.
On a clean cluster, the OpenBao pod can't start because it mounts
the openbao-keys secret as a volume, but that secret doesn't exist
until init runs. Create a placeholder secret in WaitPodRunning so
the pod can mount it and start. InitOrUnsealOpenBao overwrites it
with real values during initialization.
The migration from ~/.sunbeam.json to ~/.sunbeam/config.json
copied but never removed the legacy file, which could cause
confusion with older binaries still writing to the old path.
WFE now populates execution pointer step_name from the workflow
definition, so print_summary shows actual step names instead of
"step-0", "step-1", etc.
Move ensure_opensearch_ml and inject_opensearch_model_id out of
cmd_apply post-hooks into dedicated WFE steps that run in a
parallel branch alongside rollout waits. The ML model download
(10+ min on first run) no longer blocks the rest of the pipeline.
The port-forward background task retried infinitely on 500 errors
when the target pod wasn't ready. Add a 30-attempt limit with 2s
backoff between retries so the step eventually fails instead of
spinning forever.
Dispatch `sunbeam up`, `sunbeam seed`, `sunbeam verify`, and
`sunbeam bootstrap` through WFE workflows instead of monolithic
functions. Steps communicate via JSON workflow data and each
workflow is persisted in a per-context SQLite database.
- `sunbeam auth token` prints JSON headers for MCP headersHelper:
{"Authorization": "Bearer <token>"}
- Add penpot to PG_USERS, pg_db_map, KV seed, and all_paths
- Add cert-manager to VSO auth role bound namespaces
Reuse any existing model version (including DEPLOY_FAILED) instead of
registering a new copy. Prevents accumulation of stale model chunks
in .plugins-ml-model when OpenSearch restarts between applies.
Add rand_alphanum() using OsRng for generating fixed-length
alphanumeric secrets. Seed secrets-cipher (32 chars) into the
kratos KV path for at-rest encryption of OIDC tokens.
- bao: replaced by `sunbeam vault` with proper JWT auth
- docs: La Suite Docs not ready for production
- people: La Suite People not ready for production
- DynamicBearer AuthMethod: La Suite clients resolve tokens fresh
per-request from cache file, surviving token expiry mid-session
- Retry with exponential backoff on all Drive API calls (create_child,
upload_ended) — up to 5 retries on 429/500/502/503
- Token refresh triggered on 500 before retry (handles expired SSO)
- S3 upload retry with backoff (up to 3 retries on 502/503)
- Connection pooling: reuse DriveClient HTTP client for S3 PUTs
- Folder/file dedup: skip existing items on re-upload
- Overall bar progress based on file count (was bytes, causing 50%
bar at low file count when large files uploaded first)
- Bandwidth computed manually from completed bytes / elapsed time
- Per-file bars show spinner + name only (no misleading 0 B counter)
- S3 upload retries up to 3x on 502/503 with backoff
- Folder dedup: list_children before create, reuse existing folders
- File dedup: skip files already present in target folder
- Connection pooling: reuse DriveClient's HTTP client for S3 PUTs
- Default parallel back to 8 (retries handle transient 502s)
- Parallel file uploads with --parallel flag (default 4)
- indicatif MultiProgress: overall bar with file count, speed, ETA
- Per-file spinner bars showing filename during upload
- Phase 1: walk tree + create folders sequentially
- Phase 2: upload files concurrently via semaphore
- Summary line on completion (files, bytes, time, speed)
- Fixed DriveFile/DriveFolder types to match actual API fields
- DriveClient now Clone for Arc sharing across tasks
os_api: resolve pod name by label instead of hardcoded opensearch-0.
added find_pod_by_label helper to kube.rs.
secrets.py: sol-agent policy (read/write sol-tokens/*) and k8s auth
role bound to matrix namespace default SA.
SunbeamClient accessors are now async and resolve auth per-client:
- SSO bearer (get_token) for admin APIs, Matrix, La Suite, OpenSearch
- Gitea PAT (get_gitea_token) for VCS
- None for Prometheus, Loki, S3, LiveKit
Fixes client URLs to match deployed routes: hydra→hydra.{domain},
matrix→messages.{domain}, grafana→metrics.{domain},
prometheus→systemmetrics.{domain}, loki→systemlogs.{domain}.
Removes all ad-hoc token helpers from CLI modules (matrix_with_token,
os_client, people_client, etc). Every dispatch just calls
client.service().await?.
Patches gitea admin credentials into secret/sol for Sol's Gitea
integration. Adds sol-agent vault policy with read/write access
to sol-tokens/* for user impersonation PATs, plus k8s auth role
bound to the matrix namespace.
Adds Verb variants: auth, vcs, chat, search, storage, media, mon,
vault, people, docs, meet, drive, mail, cal, find. Each delegates
to the corresponding SDK cli.rs dispatch function.
Removes the legacy `user` command (replaced by `auth identity`).
Renames Get's -o to --kubectl-output to avoid conflict with the
new global -o/--output flag. Enables all SDK features in binary.