Production Headscale terminates TLS for both the control plane (via the
TS2021 HTTP CONNECT upgrade endpoint) and the embedded DERP relay.
Without TLS, the daemon could only talk to plain-HTTP test stacks.
- New crate::tls module: shared TlsMode (Verify | InsecureSkipVerify)
and a tls_wrap helper. Verify mode uses the webpki roots;
InsecureSkipVerify (test-only) installs an explicit
ServerCertVerifier that accepts any cert.
- Cargo.toml: add tokio-rustls, webpki-roots, rustls-pemfile.
- noise/handshake: perform_handshake is now generic over the underlying
stream and takes an explicit `host_header` argument instead of using
`peer_addr`. Lets callers pass either a TcpStream or a TLS-wrapped
stream.
- noise/stream: NoiseStream<S> is generic over the underlying transport
with `S = TcpStream` as the default. The AsyncRead+AsyncWrite impls
forward to whatever S provides.
- control/client: ControlClient::connect detects `https://` in
coordination_url and TLS-wraps the TCP stream before the Noise
handshake. fetch_server_key now also TLS-wraps when needed. Both
honor the new derp_tls_insecure config flag (misnamed: it controls
all TLS verification, not just DERP).
- derp/client: DerpClient::connect_with_tls accepts a TlsMode and uses
the shared tls::tls_wrap helper instead of duplicating it. The
client struct's inner Framed now wraps a Box<dyn DerpTransport>
so it can hold either a plain or TLS-wrapped stream.
- daemon/lifecycle: derive the DERP URL scheme from coordination_url
(https → https) and pass derp_tls_insecure through.
- config.rs: new `derp_tls_insecure: bool` field on VpnConfig.
- src/vpn_cmds.rs: pass `derp_tls_insecure: false` for production.
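
The scheme and flag handling described above can be sketched roughly as follows. This is an illustration only: the helper names needs_tls, tls_mode and derp_scheme are hypothetical, and the plain-http fallback for the DERP URL is an assumption. Only TlsMode and its two variants come from this change.

```rust
/// Mirrors the two modes named above; the real type lives in crate::tls.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TlsMode {
    Verify,
    InsecureSkipVerify,
}

/// Hypothetical helper: TLS wrapping is only needed for `https://` URLs.
fn needs_tls(coordination_url: &str) -> bool {
    coordination_url.starts_with("https://")
}

/// Hypothetical helper: map the (misnamed) config flag to a TlsMode.
fn tls_mode(derp_tls_insecure: bool) -> TlsMode {
    if derp_tls_insecure {
        TlsMode::InsecureSkipVerify
    } else {
        TlsMode::Verify
    }
}

/// Hypothetical helper: the DERP URL reuses the coordination URL's scheme.
/// Falling back to "http" for anything else is an assumption.
fn derp_scheme(coordination_url: &str) -> &'static str {
    if needs_tls(coordination_url) {
        "https"
    } else {
        "http"
    }
}
```
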
Two bug fixes found while wiring this up:
- proxy/engine: bridge_connection used to set remote_done on any
smoltcp recv error, including the transient InvalidState that
smoltcp returns while a TCP socket is still in SynSent. That meant
the engine gave up on the connection before the WG handshake even
finished. Distinguish "not ready yet" (returns Ok(0)) from
"actually closed" (returns Err) inside tcp_recv, and only mark
remote_done on the latter.
- proxy/engine: the connection's "done" condition required
local_read_done, but most clients (curl, kubectl) keep their write
side open until they read EOF. The engine never closed its local
TCP, so clients sat in read_to_end forever. Drop the connection as
soon as the remote side has finished and we've drained its buffer
to the local socket — the local TcpStream drop closes the socket
and the client sees EOF.
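
A simplified model of the first fix, for illustration. FakeSocket and SocketState are hypothetical stand-ins for the smoltcp socket, and the error kind is an arbitrary choice. Only the Ok(0) versus Err contract is taken from the description above.

```rust
use std::io;

/// Hypothetical, reduced set of socket states relevant to the fix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SocketState {
    SynSent,     // handshake still in flight: "not ready yet"
    Established, // data may be available
    Closed,      // remote has gone away
}

/// Hypothetical stand-in for the smoltcp TCP socket.
struct FakeSocket {
    state: SocketState,
    rx: Vec<u8>,
}

/// Ok(0) means "nothing to read yet", Ok(n) means n bytes were copied,
/// Err means the connection is really closed.
fn tcp_recv(sock: &mut FakeSocket, buf: &mut [u8]) -> io::Result<usize> {
    match sock.state {
        // Transient: the socket exists but the handshake has not finished.
        SocketState::SynSent => Ok(0),
        // Closed and fully drained: report a real error.
        SocketState::Closed if sock.rx.is_empty() => Err(io::Error::new(
            io::ErrorKind::ConnectionAborted,
            "remote closed",
        )),
        // Established, or closed with bytes still buffered: hand them over.
        _ => {
            let n = sock.rx.len().min(buf.len());
            buf[..n].copy_from_slice(&sock.rx[..n]);
            sock.rx.drain(..n);
            Ok(n)
        }
    }
}

/// remote_done is only set when tcp_recv reports a real error.
fn remote_done_after(result: &io::Result<usize>) -> bool {
    result.is_err()
}
```
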
Adds an optional `cluster_api_host` field to VpnConfig. When set, the
daemon resolves it against the netmap's peer list once the first
netmap arrives and uses that peer's tailnet IP as the proxy backend,
overriding the static `cluster_api_addr`. Falls back to the static
addr if the hostname doesn't match any peer.
The resolver tries hostname first, then peer name (FQDN), then a
prefix match against name. Picks v4 over v6 from the peer's address
list.
- sunbeam-net/src/config.rs: new `cluster_api_host: Option<String>`
- sunbeam-net/src/daemon/lifecycle.rs: resolve_peer_ip helper +
resolution at proxy bind time
- sunbeam-net/tests/integration.rs: pass cluster_api_host: None in
the existing VpnConfig literals
- src/config.rs: new context field `vpn-cluster-host`
- src/vpn_cmds.rs: thread it from context → VpnConfig
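
The lookup order can be sketched as below. The Peer struct is a hypothetical, trimmed-down stand-in for the netmap peer type, and falling back to the first listed address when a peer has no v4 address is an assumption.

```rust
use std::net::IpAddr;

/// Hypothetical, trimmed-down peer record; the real netmap type has more fields.
struct Peer {
    hostname: String,        // short host name
    name: String,            // FQDN
    addresses: Vec<IpAddr>,  // tailnet addresses, v4 and/or v6
}

/// Try hostname, then FQDN, then a prefix match on the FQDN.
/// Prefer a v4 address; otherwise take the first address listed.
fn resolve_peer_ip(peers: &[Peer], host: &str) -> Option<IpAddr> {
    let peer = peers
        .iter()
        .find(|p| p.hostname == host)
        .or_else(|| peers.iter().find(|p| p.name == host))
        .or_else(|| peers.iter().find(|p| p.name.starts_with(host)))?;

    peer.addresses
        .iter()
        .copied()
        .find(|a| a.is_ipv4())
        .or_else(|| peer.addresses.first().copied())
}
```

A caller would then use something like `resolve_peer_ip(&peers, host)` and keep the static `cluster_api_addr` when the result is None.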
`sunbeam connect` now fork-execs itself with a hidden `__vpn-daemon`
subcommand instead of running the daemon in-process. The user-facing
command spawns the child detached (stdio → log file, setsid for no
controlling TTY), polls the IPC socket until the daemon reaches
Running, prints a one-line status, and exits. The user gets back to
their shell immediately.
- src/cli.rs: `Connect { foreground }` instead of a unit variant. Add
a hidden `__vpn-daemon` Verb that the spawned child runs.
- src/vpn_cmds.rs: split into spawn_background_daemon (default path)
and run_daemon_foreground (used by both `connect --foreground` and
`__vpn-daemon`). Detached child uses pre_exec(setsid) and inherits
--context from the parent so it resolves the same VPN config.
Refuses to start if a daemon is already running on the control
socket; cleans up stale socket files. Switches the proxy bind from
16443 (sienna's existing SSH tunnel uses it) to 16579.
- sunbeam-net/src/daemon/lifecycle: add a SocketGuard RAII type so the
IPC control socket is unlinked when the daemon exits, regardless of
shutdown path. Otherwise `vpn status` after a clean disconnect would
see a stale socket and report an error.
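
A minimal sketch of the RAII idea, assuming the guard only needs to own the socket path. The constructor and accessor shown here are illustrative and may not match the real SocketGuard.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Removes the socket file when dropped, whichever way the daemon exits
/// (normal return, early `?` return, or unwinding panic).
struct SocketGuard {
    path: PathBuf,
}

impl SocketGuard {
    fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for SocketGuard {
    fn drop(&mut self) {
        // Ignore the result: the file may already have been removed.
        let _ = fs::remove_file(&self.path);
    }
}
```

Note that Drop does not run if the process is killed by SIGKILL or calls std::process::exit, which is why the CLI side still cleans up stale socket files.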
End-to-end smoke test against the docker stack:
    $ sunbeam connect
    ==> VPN daemon spawned (pid 90072, ...)
    Connected (100.64.0.154, fd7a:115c:a1e0::9a) — 2 peers visible
    $ sunbeam vpn status
    VPN: running
    addresses: 100.64.0.154, fd7a:115c:a1e0::9a
    peers: 2
    derp home: region 0
    $ sunbeam disconnect
    ==> Asking VPN daemon to stop...
    Daemon acknowledged shutdown.
    $ sunbeam vpn status
    VPN: not running
DaemonHandle's shutdown_tx (oneshot) is replaced with a CancellationToken
shared between the daemon loop and the IPC server. The token is the
single source of truth for "should we shut down" — `DaemonHandle::shutdown`
cancels it, and an IPC `Stop` request also cancels it.
- daemon/state: store the CancellationToken on DaemonHandle and clone it
on Clone (so cached IPC handles can still trigger shutdown).
- daemon/ipc: IpcServer takes a daemon_shutdown token; `Stop` now cancels
it instead of returning Ok and doing nothing. Add IpcClient with
`request`, `status`, and `stop` methods so the CLI can drive a
backgrounded daemon over the Unix socket.
- daemon/lifecycle: thread the token through run_daemon_loop and
run_session, pass a clone to IpcServer::new.
- lib.rs: re-export IpcClient/IpcCommand/IpcResponse so callers don't
have to reach into the daemon module.
- src/vpn_cmds.rs: `sunbeam disconnect` now actually talks to the daemon
via IpcClient::stop, and `sunbeam vpn status` queries IpcClient::status
and prints addresses + peer count + DERP home.
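
The shared-token arrangement can be illustrated with a std-only stand-in. This sketch swaps the real CancellationToken for a hypothetical ShutdownToken built on Arc<AtomicBool>, and the struct fields and handle_stop method are assumptions. Unlike the real token, it has no async wait for cancellation and must be polled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical std-only stand-in for a cancellation token.
/// Clones share the same flag.
#[derive(Clone, Default)]
struct ShutdownToken(Arc<AtomicBool>);

impl ShutdownToken {
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Cloning the handle clones the token, so cached handles still work.
#[derive(Clone)]
struct DaemonHandle {
    shutdown: ShutdownToken,
}

impl DaemonHandle {
    fn shutdown(&self) {
        self.shutdown.cancel();
    }
}

/// The IPC server holds its own clone of the same token.
struct IpcServer {
    daemon_shutdown: ShutdownToken,
}

impl IpcServer {
    /// Hypothetical handler for an IPC `Stop` request.
    fn handle_stop(&self) {
        self.daemon_shutdown.cancel();
    }
}
```
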
Adds the foreground VPN client commands. The daemon runs in-process
inside the CLI for the lifetime of `sunbeam connect` — no separate
background daemon yet, that can come later if needed.
- Cargo.toml: add sunbeam-net as a workspace dep, plus hostname/whoami
for building a per-machine netmap label like "sienna@laptop"
- src/config.rs: new `vpn-url` and `vpn-auth-key` fields on Context
- src/cli.rs: `Connect`, `Disconnect`, and `Vpn { Status }` verbs
- src/vpn_cmds.rs: command handlers
  - cmd_connect reads VPN config from the active context, starts the
    daemon at ~/.sunbeam/vpn, polls for Running, then blocks on ^C
    before calling DaemonHandle::shutdown
  - cmd_disconnect / cmd_vpn_status are placeholders that report based
    on the control socket; actually talking to a backgrounded daemon
    needs an IPC client (not yet exposed from sunbeam-net)
- src/workflows/mod.rs: `..Default::default()` on Context literals so
the new fields don't break the existing tests
Replace hand-rolled OpenBao HTTP client with vaultrs 0.8.0, which
has official OpenBao support. BaoClient remains the public API so
callers are unchanged. KV patch uses raw HTTP since vaultrs doesn't
expose it yet.
On a clean cluster, the OpenBao pod can't start because it mounts
the openbao-keys secret as a volume, but that secret doesn't exist
until init runs. Create a placeholder secret in WaitPodRunning so
the pod can mount it and start. InitOrUnsealOpenBao overwrites it
with real values during initialization.
The migration from ~/.sunbeam.json to ~/.sunbeam/config.json
copied but never removed the legacy file, which could cause
confusion with older binaries still writing to the old path.
Move ensure_opensearch_ml and inject_opensearch_model_id out of
cmd_apply post-hooks into dedicated WFE steps that run in a
parallel branch alongside rollout waits. The ML model download
(10+ min on first run) no longer blocks the rest of the pipeline.
The port-forward background task retried infinitely on 500 errors
when the target pod wasn't ready. Add a 30-attempt limit with 2s
backoff between retries so the step eventually fails instead of
spinning forever.
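
The bounded retry can be sketched as below. This is a blocking, std-only illustration: the real background task is async, and the function name retry_with_limit and the constant names are hypothetical. The limit of 30 and the 2s delay are the values stated above.

```rust
use std::thread;
use std::time::Duration;

/// Values from the description above; the constant names are illustrative.
const MAX_ATTEMPTS: u32 = 30;
const RETRY_DELAY: Duration = Duration::from_secs(2);

/// Calls `op` until it succeeds or `max_attempts` calls have failed,
/// sleeping `delay` between attempts. `op` receives the 1-based attempt
/// number and always runs at least once. The last error is returned.
fn retry_with_limit<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the failure instead of spinning.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A caller would pass `MAX_ATTEMPTS` and `RETRY_DELAY` together with a closure that performs one connection attempt.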
Dispatch `sunbeam up`, `sunbeam seed`, `sunbeam verify`, and
`sunbeam bootstrap` through WFE workflows instead of monolithic
functions. Steps communicate via JSON workflow data and each
workflow is persisted in a per-context SQLite database.
- `sunbeam auth token` prints JSON headers for MCP headersHelper:
    {"Authorization": "Bearer <token>"}
- Add penpot to PG_USERS, pg_db_map, KV seed, and all_paths
- Add cert-manager to VSO auth role bound namespaces
os_api: resolve pod name by label instead of hardcoded opensearch-0.
Add find_pod_by_label helper to kube.rs.
secrets.py: add sol-agent policy (read/write sol-tokens/*) and k8s auth
role bound to the matrix namespace default SA.
forge_url() now checks active context domain first before falling back
to production_host. Bare IP addresses are skipped in the host heuristic.
Adds .cargo/config.toml for the sunbeam Gitea Cargo registry.
- Add --no-cache flag to sunbeam build (passes --no-cache to buildctl)
- Add Sol (virtual librarian) as a build target
- Wire no_cache through all build functions and dispatch
Onboarding now provisions app-level accounts:
- create_mailbox: Django ORM via kubectl exec into messages-backend
- setup_projects_user: knex.js via kubectl exec into projects pod
- Welcome email includes job title and department when provided
Offboarding cleans up:
- delete_mailbox: removes mailbox + Django user
- cleanup_projects_user: soft-deletes Planka user + memberships
All provisioning is best-effort (warns on failure, doesn't block).
Planka:
- Board discovery via GET /api/projects (no hardcoded IDs)
- String IDs (snowflake) throughout — TicketRef::Planka holds String
- Create auto-discovers first board/list, or matches --target by name
- Close finds "Done"/"Closed" list and moves card automatically
- Assign resolves users via search, supports "me" for self-assign
- Ticket IDs use p:/g: short prefixes
Gitea:
- Assign uses PATCH on the issue (not POST /assignees, which requires
collaborator access)
- Create requires --target org/repo
All pm subcommands tested against live Planka + Gitea instances.
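
The p:/g: prefix scheme might be parsed along these lines. The function name parse_ticket_id is hypothetical. TicketRef::Planka holding a String follows the note above, while keeping the Gitea part as an opaque String is an assumption, since its exact format is not spelled out here.

```rust
#[derive(Debug, PartialEq, Eq)]
enum TicketRef {
    /// Planka IDs are snowflake strings, so they stay as String.
    Planka(String),
    /// Assumption: the Gitea part is kept opaque in this sketch.
    Gitea(String),
}

/// Splits "p:<id>" / "g:<id>" into a TicketRef.
/// Returns None for unknown prefixes or an empty ID.
fn parse_ticket_id(id: &str) -> Option<TicketRef> {
    let (prefix, rest) = id.split_once(':')?;
    if rest.is_empty() {
        return None;
    }
    match prefix {
        "p" => Some(TicketRef::Planka(rest.to_string())),
        "g" => Some(TicketRef::Gitea(rest.to_string())),
        _ => None,
    }
}
```
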
Context resolution: --context flag > current-context from config > "local".
No more production/local distinction in the CLI flags — the context
determines everything (domain, kube-context, ssh-host, infra-dir).
Remove Env enum entirely. Production detection is now "context has ssh-host".
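
The precedence rule and the production check reduce to something like the sketch below. Function names are hypothetical, and modelling ssh-host as an Option<String> is an assumption about the Context type.

```rust
/// Hypothetical subset of a context; only the field used here.
struct Context {
    ssh_host: Option<String>,
}

/// --context flag > current-context from config > "local".
fn resolve_context_name<'a>(
    flag: Option<&'a str>,
    current_context: Option<&'a str>,
) -> &'a str {
    flag.or(current_context).unwrap_or("local")
}

/// "Production" now just means the context has an ssh-host.
fn is_production(ctx: &Context) -> bool {
    ctx.ssh_host.is_some()
}
```
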
Config now supports named contexts (like kubectl), each bundling
domain, kube-context, ssh-host, infra-dir, and acme-email. Legacy
flat config auto-migrates to a "production" context on load.
- sunbeam config set --domain sunbeam.pt --host user@server
- sunbeam config use-context production
- sunbeam config get (shows all contexts)
Auth tokens stored per-domain (~/.local/share/sunbeam/auth/{domain}.json)
so local and production don't clobber each other. pm and auth commands
read domain from active context instead of K8s cluster discovery.
- 5-minute timeout on callback wait (Ctrl+C now works)
- Skip K8s client_id lookup when no cluster configured (removes noisy ERROR)
- Center the success page HTML to match Sunbeam Studios branding
New src/pm.rs module with sunbeam pm subcommand:
- Planka client: cards, boards, lists, comments, assignments
  (via OIDC token exchange for Planka JWT)
- Gitea client: issues, comments, labels, milestones
  (via OAuth2 Bearer token)
- Unified Ticket type with p:/g: ID prefixes
- pm list: parallel fetch from both sources, merged display
- pm show/create/comment/close/assign across both systems
- Auth via crate::auth::get_token() (Hydra OAuth2)
- set-password reads from stdin when password arg omitted
- Port-forward proxy retries on pod restart instead of failing
- cmd_seed acquires PID-based advisory lockfile to prevent concurrent runs
Refactor s3_auth_headers into deterministic s3_auth_headers_at that
accepts a timestamp. Add test with AWS example credentials and fixed
date verifying canonical request, string-to-sign, and final signature.
Replace all blocking I/O with async equivalents:
- tokio::process::Command instead of std::process::Command
- tokio::time::sleep instead of std::thread::sleep
- reqwest::Client (async) instead of reqwest::blocking::Client
- All helper functions (api, find_identity, generate_recovery, etc.) now async
- PortForward::Drop uses start_kill() (sync SIGKILL) for cleanup
- send_welcome_email wrapped in spawn_blocking for lettre sync transport
- Check CNPG Cluster CRD status.phase instead of pod Running phase
- DKIM public key: use SPKI format (BEGIN PUBLIC KEY) matching Python
- Use kv_patch instead of kv_put for dirty paths (preserves external fields)
- Vault KV only written when password is newly generated
- Gitea exec passes container name Some("gitea")
- Fix openbao comment (400 not 409)
- Store SSH tunnel child in static Mutex (was dropped immediately)
- cmd_bao: use env(1) for VAULT_TOKEN instead of sh -c (no shell injection)
- Cache API discovery across kube_apply documents (was per-doc roundtrip)
- Replace blocking ToSocketAddrs with tokio::net::lookup_host
- Remove double YAML->JSON->string->JSON serialization in kube_apply
- ResultExt::ctx now preserves all SunbeamError variants
- New src/constants.rs: single source for MANAGED_NS (includes monitoring)
and GITEA_ADMIN_USER, imported by all modules that previously had copies
- Fix checks.rs reading wrong key names from gitea-admin-credentials secret
- Add VaultStaticSecret pruning in pre_apply_cleanup (H1)
- Fix cert_manager_present check (was always true after canonicalize)
- Add warnings for silent failures in pre_apply_cleanup
- Fix os_api dead variable assignment
- Set TLS private key permissions to 0600
- Redact Gitea admin password in print_urls
Full cmd_seed implementation using openbao::BaoClient:
- OpenBao init/unseal via HTTP API (no kubectl exec)
- KV v2 seeding with get_or_create pattern and dirty-path tracking
- Kubernetes auth method + VSO policy configuration
- Database secrets engine with vault PG user and static roles
- DKIM key generation via rsa + pkcs8 crates
- Kratos admin identity seeding via port-forward + reqwest
cmd_verify: VSO E2E test with test sentinel, sync poll, cleanup.
Replace anyhow::{bail, Context, Result} with crate::error::{Result,
SunbeamError, ResultExt} across all modules. Each module uses the
appropriate error variant (Kube, Secrets, Build, Identity, etc).
SunbeamError enum with typed variants (Kube, Config, Network, Secrets,
Build, Identity, ExternalTool, Io, Json, Yaml, Other) each mapping to
a process exit code. ResultExt trait replaces anyhow's .context().
main.rs initializes tracing-subscriber with RUST_LOG env filter and
routes all errors to exit codes via SunbeamError::exit_code().
Removes anyhow dependency.
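
A reduced sketch of the error design, for illustration. Only three of the variants are shown, the String payloads and the numeric exit codes are invented for the example, and the ctx signature is a guess at the shape of the real trait.

```rust
use std::fmt;

/// Three of the variants, with String payloads assumed for the sketch.
#[derive(Debug)]
enum SunbeamError {
    Kube(String),
    Config(String),
    Other(String),
}

impl SunbeamError {
    /// Each variant maps to a process exit code.
    /// The numbers here are placeholders, not the real mapping.
    fn exit_code(&self) -> i32 {
        match self {
            SunbeamError::Config(_) => 2,
            SunbeamError::Kube(_) => 3,
            SunbeamError::Other(_) => 1,
        }
    }
}

impl fmt::Display for SunbeamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SunbeamError::Kube(m) | SunbeamError::Config(m) | SunbeamError::Other(m) => {
                write!(f, "{}", m)
            }
        }
    }
}

impl std::error::Error for SunbeamError {}

/// Stand-in for anyhow's `.context()`: prefixes the message
/// while keeping the original variant (and so the exit code).
trait ResultExt<T> {
    fn ctx(self, msg: &str) -> Result<T, SunbeamError>;
}

impl<T> ResultExt<T> for Result<T, SunbeamError> {
    fn ctx(self, msg: &str) -> Result<T, SunbeamError> {
        self.map_err(|e| match e {
            SunbeamError::Kube(m) => SunbeamError::Kube(format!("{}: {}", msg, m)),
            SunbeamError::Config(m) => SunbeamError::Config(format!("{}: {}", msg, m)),
            SunbeamError::Other(m) => SunbeamError::Other(format!("{}: {}", msg, m)),
        })
    }
}
```

In main, the top-level error would then be turned into an exit status with something like `std::process::exit(err.exit_code())`.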
services.rs:
- Pod status with unicode icons, grouped by namespace
- VSO sync status (VaultStaticSecret/VaultDynamicSecret via kube-rs DynamicObject)
- Log streaming via kube-rs log_stream + futures::AsyncBufReadExt
- Pod get in YAML/JSON format
- Rollout restart with namespace/service filtering
checks.rs:
- 11 health check functions (gitea, postgres, valkey, openbao, seaweedfs, kratos, hydra, people, livekit)
- AWS4-HMAC-SHA256 S3 auth header generation using sha2 + hmac
- Concurrent execution via tokio JoinSet
- mkcert root CA trust for local TLS
secrets.rs:
- Stub with cmd_seed/cmd_verify (requires live cluster for full impl)
users.rs:
- All 10 Kratos identity operations via reqwest + kubectl port-forward
- Welcome email via lettre SMTP through port-forwarded postfix
- Employee onboarding with auto-assigned ID, HR metadata
- Offboarding with Kratos + Hydra session revocation
gitea.rs:
- Bootstrap without Lima VM: admin password, org creation, OIDC auth source
- Gitea API via kubectl exec curl
images.rs:
- BuildEnv detection, buildctl build + push via port-forward
- Per-service builders for all 17 build targets
- Deploy rollout, node image pull, uv Dockerfile patching
- Mirror scaffolding (containerd operations marked TODO)
cluster.rs:
- Pure K8s cmd_up: cert-manager, linkerd, rcgen TLS certs, core service wait
- No Lima VM operations
manifests.rs:
- Full cmd_apply: kustomize build, two-pass convergence, ConfigMap restart detection
- Pre-apply cleanup, webhook wait, mkcert CA, tuwunel OAuth2 redirect patch
Test coverage: 142 tests across 14 modules (44 in checks, 27 in cli, 13 in images, 12 in tools, 12 in services, 11 in users, 10 in manifests, 9 in kube, 9 in cluster, 7 in update, 6 in gitea, 4 in openbao, 3 in output, 2 in config).
Phase 0 of Python-to-Rust CLI rewrite:
- Cargo.toml with all dependencies (kube-rs, reqwest, russh, rcgen, lettre, etc.)
- build.rs: downloads kustomize v5.8.1 + helm v4.1.0 at compile time, embeds as bytes, sets SUNBEAM_COMMIT from git
- src/main.rs: tokio main with anyhow error formatting
- src/cli.rs: full clap derive struct tree matching all Python argparse subcommands
- src/config.rs: SunbeamConfig serde struct, load/save ~/.sunbeam.json
- src/output.rs: step/ok/warn/table with exact Python format strings
- src/tools.rs: embedded kustomize+helm extraction to cache dir
- src/kube.rs: parse_target, domain_replace, context management
- src/manifests.rs: filter_by_namespace with full test coverage
- Stub modules for all remaining features (cluster, secrets, images, services, checks, gitea, users, update)
23 tests pass, cargo check clean.