Production Headscale terminates TLS for both the control plane (via the
TS2021 HTTP CONNECT upgrade endpoint) and the embedded DERP relay.
Without TLS, the daemon could only talk to plain-HTTP test stacks.
- New crate::tls module: shared TlsMode (Verify | InsecureSkipVerify)
+ tls_wrap helper. Verify mode trusts the webpki root store;
InsecureSkipVerify installs a ServerCertVerifier that accepts any
certificate (test-only).
- Cargo.toml: add tokio-rustls, webpki-roots, rustls-pemfile.
- noise/handshake: perform_handshake is now generic over the underlying
stream and takes an explicit `host_header` argument instead of using
`peer_addr`. Lets callers pass either a TcpStream or a TLS-wrapped
stream.
- noise/stream: NoiseStream<S> is generic over the underlying transport
with `S = TcpStream` as the default. The AsyncRead+AsyncWrite impls
forward to whatever S provides.
- control/client: ControlClient::connect detects `https://` in
coordination_url and TLS-wraps the TCP stream before the Noise
handshake. fetch_server_key now also TLS-wraps when needed. Both
honor the new derp_tls_insecure config flag (despite its name, it
controls all TLS verification, not just DERP).
- derp/client: DerpClient::connect_with_tls accepts a TlsMode and uses
the shared tls::tls_wrap helper instead of duplicating it. The
client struct's inner Framed now wraps a Box<dyn DerpTransport>,
so it can hold either a plain or a TLS-wrapped stream.
- daemon/lifecycle: derive the DERP URL scheme from coordination_url
(https → https) and pass derp_tls_insecure through.
- config.rs: new `derp_tls_insecure: bool` field on VpnConfig.
- src/vpn_cmds.rs: pass `derp_tls_insecure: false` for production.
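The scheme-driven decisions in the list above (TLS-wrap or not, which TlsMode, what to pass as `host_header`, which DERP scheme to use) can be sketched with std alone. `DialPlan` and `plan_dial` are hypothetical names, not the crate's API; `TlsMode` is redeclared with the commit's variant names so the block compiles by itself, and the actual tokio-rustls wrapping is left out.

```rust
/// Certificate-verification policy; same variant names as the commit's
/// `crate::tls::TlsMode`, redeclared so this sketch stands alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TlsMode {
    Verify,
    InsecureSkipVerify,
}

/// Hypothetical summary of what the connect path needs before dialing.
#[derive(Debug, PartialEq, Eq)]
pub struct DialPlan {
    pub host: String,
    pub port: u16,
    /// `None` = plain TCP; `Some(mode)` = TLS-wrap with that policy.
    pub tls: Option<TlsMode>,
    /// Value handed to the Noise handshake as `host_header`.
    pub host_header: String,
    /// DERP URL scheme, mirrored from coordination_url.
    pub derp_scheme: &'static str,
}

/// Hypothetical helper. Does not handle bracketed IPv6 literals or userinfo.
pub fn plan_dial(coordination_url: &str, derp_tls_insecure: bool) -> Result<DialPlan, String> {
    let (https, rest) = if let Some(r) = coordination_url.strip_prefix("https://") {
        (true, r)
    } else if let Some(r) = coordination_url.strip_prefix("http://") {
        (false, r)
    } else {
        return Err(format!("unsupported scheme: {}", coordination_url));
    };
    // Keep only "host[:port]"; anything after the authority is ignored.
    let authority = rest.split('/').next().unwrap_or("");
    if authority.is_empty() {
        return Err("missing host".to_string());
    }
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().map_err(|e| e.to_string())?),
        None => (authority, if https { 443 } else { 80 }),
    };
    // The insecure flag only matters when TLS is in play at all.
    let tls = if !https {
        None
    } else if derp_tls_insecure {
        Some(TlsMode::InsecureSkipVerify)
    } else {
        Some(TlsMode::Verify)
    };
    Ok(DialPlan {
        host: host.to_string(),
        port,
        tls,
        host_header: authority.to_string(),
        derp_scheme: if https { "https" } else { "http" },
    })
}
```

Passing the authority through as the Host header, rather than deriving it from `peer_addr`, keeps the hostname that the TLS layer and the server's virtual-host routing expect.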
Two bug fixes found while wiring this up:
- proxy/engine: bridge_connection used to set remote_done on any
smoltcp recv error, including the transient InvalidState that
smoltcp returns while a TCP socket is still in SynSent. That meant
the engine gave up on the connection before the WG handshake even
finished. Distinguish "not ready yet" (returns Ok(0)) from
"actually closed" (returns Err) inside tcp_recv, and only mark
remote_done on the latter.
- proxy/engine: the connection's "done" condition required
local_read_done, but most clients (curl, kubectl) keep their write
side open until they read EOF. The engine never closed its local
TCP, so clients sat in read_to_end forever. Drop the connection as
soon as the remote side has finished and we've drained its buffer
to the local socket — the local TcpStream drop closes the socket
and the client sees EOF.
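Both fixes come down to two small rules: "no data yet" and "remote closed" must be distinct results, and the done check must ignore the local read side. Below is a std-only sketch. `TcpState`, `FakeSocket`, `RemoteClosed` and `Conn` are hypothetical stand-ins, not smoltcp's types; only the `tcp_recv` name and the Ok(0)/Err split come from the description above.

```rust
/// Hypothetical subset of TCP states; not smoltcp's `State` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TcpState {
    SynSent,
    Established,
    CloseWait,
    Closed,
}

/// Error meaning "the remote side is really finished".
#[derive(Debug, PartialEq, Eq)]
pub struct RemoteClosed;

/// Stand-in for a smoltcp TCP socket: a state plus received bytes.
pub struct FakeSocket {
    pub state: TcpState,
    pub rx: Vec<u8>,
}

/// Ok(0): nothing to read yet (e.g. still in SynSent).
/// Err(RemoteClosed): the remote has closed and everything was read.
pub fn tcp_recv(sock: &mut FakeSocket, buf: &mut [u8]) -> Result<usize, RemoteClosed> {
    match sock.state {
        TcpState::SynSent => Ok(0),
        TcpState::Closed => Err(RemoteClosed),
        TcpState::Established | TcpState::CloseWait => {
            if sock.rx.is_empty() {
                // In CloseWait the peer already sent FIN, so empty means final.
                return if sock.state == TcpState::CloseWait { Err(RemoteClosed) } else { Ok(0) };
            }
            let n = buf.len().min(sock.rx.len());
            buf[..n].copy_from_slice(&sock.rx[..n]);
            sock.rx.drain(..n);
            Ok(n)
        }
    }
}

/// Hypothetical per-connection bookkeeping in the bridge loop.
pub struct Conn {
    pub remote_done: bool,
    /// Bytes received from the remote but not yet written locally.
    pub pending_to_local: usize,
}

impl Conn {
    /// Only an Err from tcp_recv marks the remote side done.
    pub fn on_recv(&mut self, r: Result<usize, RemoteClosed>) {
        match r {
            Ok(n) => self.pending_to_local += n,
            Err(RemoteClosed) => self.remote_done = true,
        }
    }

    /// The local read side is deliberately not consulted: clients such as
    /// curl keep their write half open until they see EOF.
    pub fn done(&self) -> bool {
        self.remote_done && self.pending_to_local == 0
    }
}
```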
A pile of correctness bugs that all prevented real Tailscale peers
from sending WireGuard packets back to us. Found while building out
the e2e test against the docker-compose stack.
1. WireGuard static key was wrong (lifecycle.rs)
We were initializing the WgTunnel with `keys.wg_private`, a separate
x25519 key from the one Tailscale advertises in netmaps. Peers know
us by `node_public` and compute mac1 against it; signing handshakes
with a different private key meant every init we sent was silently
dropped. Use `keys.node_private` instead — node_key IS the WG static
key in Tailscale.
2. DERP relay couldn't route packets to us (derp/client.rs)
Our DerpClient was sealing the ClientInfo frame with a fresh
ephemeral NaCl keypair and putting the ephemeral public in the frame
prefix. Tailscale's protocol expects the *long-term* node public key
in the prefix — that's how the relay knows where to forward packets
addressed to our node_key. With the ephemeral key, the relay
accepted the connection but never delivered our peers' responses.
Now seal with the long-term node key.
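The fix for item 2 is mostly about which 32 bytes go first in the ClientInfo payload. Below is a std-only sketch of the frame layout. The type byte 0x02 and the "public key, 24-byte nonce, box" payload order reflect my reading of Tailscale's derp package and should be checked against upstream. The sealing step itself needs a NaCl crate and is represented only by the hypothetical `nonce_and_box` input.

```rust
/// DERP frame type for ClientInfo (assumed value; verify upstream).
pub const FRAME_CLIENT_INFO: u8 = 0x02;
/// NaCl box nonce length.
pub const NONCE_LEN: usize = 24;

/// 1-byte type + 4-byte big-endian length + payload.
pub fn encode_frame(frame_type: u8, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u32::MAX as usize, "frame too large");
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(frame_type);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// `nonce_and_box` is assumed to be nonce || ciphertext, produced by sealing
/// the ClientInfo JSON to the server key with the node's *long-term* private
/// key. The prefix must be the matching long-term public key: the relay uses
/// it to open the box and as the identity it routes packets to.
pub fn client_info_frame(node_public: &[u8; 32], nonce_and_box: &[u8]) -> Vec<u8> {
    assert!(nonce_and_box.len() > NONCE_LEN, "missing nonce or ciphertext");
    let mut payload = Vec::with_capacity(32 + nonce_and_box.len());
    payload.extend_from_slice(node_public);
    payload.extend_from_slice(nonce_and_box);
    encode_frame(FRAME_CLIENT_INFO, &payload)
}

/// What a relay reads first: the public key the client claims to be.
pub fn claimed_key(frame: &[u8]) -> Option<[u8; 32]> {
    if frame.len() < 5 + 32 || frame[0] != FRAME_CLIENT_INFO {
        return None;
    }
    let mut key = [0u8; 32];
    key.copy_from_slice(&frame[5..37]);
    Some(key)
}
```

With an ephemeral key in that prefix, `claimed_key` would return a key no peer has ever heard of, which matches the "accepted but never delivered" symptom.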
3. Headscale never persisted our DiscoKey (proto/types.rs, control/*)
The streaming /machine/map handler in Headscale ≥ capVer 68 doesn't
update DiscoKey on the node record — only the "Lite endpoint update"
path does, gated on Stream:false + OmitPeers:true + ReadOnly:false.
Without DiscoKey our nodes appeared in `headscale nodes list` with
`discokey:000…` and never propagated into peer netmaps. Add the
DiscoKey field to RegisterRequest, add OmitPeers/ReadOnly fields to
MapRequest, and call a new `lite_update` between register and the
streaming map. Also add `post_json_no_response` for endpoints that
reply with an empty body.
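The gate from item 3 and the two requests sent after register can be sketched as plain structs. `MapRequest` here is a hypothetical Rust mirror limited to the fields involved; the JSON names in the comments are, as I understand them, Tailscale's tailcfg field names. `is_lite_endpoint_update`, `lite_update` and `streaming_map` are illustrative helpers, not the crate's functions.

```rust
/// Hypothetical mirror of the MapRequest fields involved in the fix.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct MapRequest {
    pub version: u32,      // "Version" (capability version)
    pub node_key: String,  // "NodeKey"
    pub disco_key: String, // "DiscoKey"
    pub stream: bool,      // "Stream"
    pub omit_peers: bool,  // "OmitPeers"
    pub read_only: bool,   // "ReadOnly"
}

/// The condition described above: only this request shape reaches the
/// Headscale path that writes DiscoKey to the node record.
pub fn is_lite_endpoint_update(req: &MapRequest) -> bool {
    !req.stream && req.omit_peers && !req.read_only
}

/// One-shot lite update, sent between register and the streaming map.
pub fn lite_update(version: u32, node_key: &str, disco_key: &str) -> MapRequest {
    MapRequest {
        version,
        node_key: node_key.to_string(),
        disco_key: disco_key.to_string(),
        stream: false,
        omit_peers: true,
        read_only: false,
    }
}

/// The long-lived streaming map request that follows.
pub fn streaming_map(version: u32, node_key: &str, disco_key: &str) -> MapRequest {
    MapRequest { stream: true, omit_peers: false, ..lite_update(version, node_key, disco_key) }
}
```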
4. EncapAction is now a struct instead of an enum (wg/tunnel.rs)
Routing had to pick exactly one of UDP or DERP. With a peer whose
advertised UDP endpoint is on an unreachable RFC1918 network (e.g.
docker bridge IPs), we'd send via UDP, get nothing, and never fall
back. Now send over every available transport (receivers dedupe via
the WireGuard replay window) and let dispatch_encap forward each
populated field to its respective channel.
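The enum-to-struct change in item 4 can be sketched as follows. The field layout and the `encap` helper are hypothetical. `std::sync::mpsc` stands in for whatever channel type the daemon actually uses. Only the `EncapAction` and `dispatch_encap` names come from the text.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::Sender;

/// Hypothetical layout: each transport is optional and independent, so one
/// encapsulated datagram can go out over both.
#[derive(Debug, Default)]
pub struct EncapAction {
    /// (peer UDP endpoint, WireGuard datagram)
    pub udp: Option<(SocketAddr, Vec<u8>)>,
    /// (peer node public key for the DERP relay, WireGuard datagram)
    pub derp: Option<([u8; 32], Vec<u8>)>,
}

/// Populate every transport known for this peer.
pub fn encap(datagram: Vec<u8>, udp: Option<SocketAddr>, derp: Option<[u8; 32]>) -> EncapAction {
    let udp = udp.map(|addr| (addr, datagram.clone()));
    let derp = derp.map(|key| (key, datagram));
    EncapAction { udp, derp }
}

/// Forward each populated field to its own channel; returns how many sends
/// succeeded. A duplicate delivery is harmless because the receiver's
/// WireGuard replay window drops the second copy.
pub fn dispatch_encap(
    action: EncapAction,
    udp_tx: &Sender<(SocketAddr, Vec<u8>)>,
    derp_tx: &Sender<([u8; 32], Vec<u8>)>,
) -> usize {
    let mut sent = 0;
    if let Some(msg) = action.udp {
        if udp_tx.send(msg).is_ok() {
            sent += 1;
        }
    }
    if let Some(msg) = action.derp {
        if derp_tx.send(msg).is_ok() {
            sent += 1;
        }
    }
    sent
}
```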
5. Drop the dead PacketRouter (wg/router.rs)
Skeleton from an earlier design that never got wired up; it's been
accumulating dead-code warnings.
DERP is Tailscale's TCP relay protocol for peers that can't establish a
direct UDP path. Add the standalone client:
- derp/framing: 5-byte frame codec (1-byte type + 4-byte BE length)
- derp/client: HTTP /derp upgrade, Tailscale's NaCl SealedBox handshake
(ServerKey → ClientInfo → ServerInfo → NotePreferred), and
send_packet/recv_packet for forwarding WireGuard datagrams
Includes the 8-byte DERP\xf0\x9f\x94\x91 magic prefix in the ServerKey
payload and reads the HTTP upgrade response one byte at a time so the
inline first frame isn't swallowed by a buffered reader.
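The byte-at-a-time read and the 5-byte framing above fit together as below. This std-only sketch works over any `Read` and is not the crate's code. The ServerKey type byte 0x01 and the "magic, then 32-byte key" payload order are my reading of Tailscale's derp package, and `MAX_FRAME_LEN` is an assumed cap.

```rust
use std::io::{self, Read};

/// "DERP" followed by the UTF-8 bytes of U+1F511 (key emoji): 8 bytes.
pub const MAGIC: [u8; 8] = *b"DERP\xf0\x9f\x94\x91";
/// ServerKey frame type (assumed value; verify upstream).
pub const FRAME_SERVER_KEY: u8 = 0x01;
/// Assumed upper bound on a frame payload; match it to the server.
pub const MAX_FRAME_LEN: usize = 1 << 20;

fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

/// Read the HTTP response head one byte at a time, stopping exactly after
/// "\r\n\r\n" so no frame bytes are consumed from the stream.
pub fn read_http_head<R: Read>(r: &mut R) -> io::Result<String> {
    let mut head = Vec::new();
    let mut byte = [0u8; 1];
    while !head.ends_with(b"\r\n\r\n") {
        r.read_exact(&mut byte)?;
        head.push(byte[0]);
        if head.len() > 8192 {
            return Err(invalid("HTTP head too long"));
        }
    }
    String::from_utf8(head).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Read one frame: 1-byte type, 4-byte big-endian length, payload.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<(u8, Vec<u8>)> {
    let mut hdr = [0u8; 5];
    r.read_exact(&mut hdr)?;
    let len = u32::from_be_bytes([hdr[1], hdr[2], hdr[3], hdr[4]]) as usize;
    if len > MAX_FRAME_LEN {
        return Err(invalid("frame too large"));
    }
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    Ok((hdr[0], payload))
}

/// Check the magic prefix and extract the server's 32-byte public key.
pub fn parse_server_key(frame_type: u8, payload: &[u8]) -> io::Result<[u8; 32]> {
    if frame_type != FRAME_SERVER_KEY || payload.len() < 8 + 32 {
        return Err(invalid("not a ServerKey frame"));
    }
    if &payload[..8] != &MAGIC[..] {
        return Err(invalid("bad DERP magic"));
    }
    let mut key = [0u8; 32];
    key.copy_from_slice(&payload[8..40]);
    Ok(key)
}
```

A `BufReader` would typically pull the ServerKey frame into its internal buffer along with the headers, and those bytes would be lost if the reader were dropped before framing starts. Single-byte reads cost one syscall per header byte on a real socket, which is acceptable for a one-time handshake. Keeping the same buffered reader for the life of the connection is the other way to avoid the problem.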