3 Commits

Author SHA1 Message Date
2624a13952 feat(net): TLS support for HTTPS coordination URLs and DERP relays
Production Headscale terminates TLS for both the control plane (via the
TS2021 HTTP CONNECT upgrade endpoint) and the embedded DERP relay.
Without TLS, the daemon could only talk to plain-HTTP test stacks.

- New crate::tls module: shared TlsMode (Verify | InsecureSkipVerify)
  + tls_wrap helper. webpki roots in Verify mode; an explicit
  ServerCertVerifier that accepts any cert in InsecureSkipVerify
  (test-only).
- Cargo.toml: add tokio-rustls, webpki-roots, rustls-pemfile.
- noise/handshake: perform_handshake is now generic over the underlying
  stream and takes an explicit `host_header` argument instead of using
  `peer_addr`. Lets callers pass either a TcpStream or a TLS-wrapped
  stream.
- noise/stream: NoiseStream<S> is generic over the underlying transport
  with `S = TcpStream` as the default. The AsyncRead+AsyncWrite impls
  forward to whatever S provides.
- control/client: ControlClient::connect detects `https://` in
  coordination_url and TLS-wraps the TCP stream before the Noise
  handshake. fetch_server_key now also TLS-wraps when needed. Both
  honor the new derp_tls_insecure config flag (misnamed: it controls
  all TLS verification, not just DERP).
- derp/client: DerpClient::connect_with_tls accepts a TlsMode and uses
  the shared tls::tls_wrap helper instead of duplicating it. The
  client struct's inner Framed now wraps a Box<dyn DerpTransport> so
  it can hold either a plain or TLS-wrapped stream.
- daemon/lifecycle: derive the DERP URL scheme from coordination_url
  (https → https) and pass derp_tls_insecure through.
- config.rs: new `derp_tls_insecure: bool` field on VpnConfig.
- src/vpn_cmds.rs: pass `derp_tls_insecure: false` for production.
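
A minimal sketch of how the scheme detection and the insecure flag could
combine, assuming simple prefix matching on the URL. `TlsMode` and its
variants are named above; `tls_mode_for` and `derp_scheme_for` are
hypothetical helper names, not the crate's actual API.

```rust
/// How a TLS connection should validate the server certificate.
/// Mirrors the two variants named in this commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TlsMode {
    Verify,
    InsecureSkipVerify,
}

/// Hypothetical helper: `None` means "plain TCP, no TLS wrap";
/// `Some(mode)` means "TLS-wrap the stream with this verification mode".
pub fn tls_mode_for(coordination_url: &str, derp_tls_insecure: bool) -> Option<TlsMode> {
    if !coordination_url.starts_with("https://") {
        return None;
    }
    Some(if derp_tls_insecure {
        TlsMode::InsecureSkipVerify
    } else {
        TlsMode::Verify
    })
}

/// Hypothetical helper: the DERP URL scheme follows the coordination URL.
pub fn derp_scheme_for(coordination_url: &str) -> &'static str {
    if coordination_url.starts_with("https://") {
        "https"
    } else {
        "http"
    }
}
```

Returning an `Option` keeps the plain-HTTP test stacks on the same code
path: callers only wrap the stream when a mode is present.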

Two bug fixes found while wiring this up:

- proxy/engine: bridge_connection used to set remote_done on any
  smoltcp recv error, including the transient InvalidState that
  smoltcp returns while a TCP socket is still in SynSent. That meant
  the engine gave up on the connection before the WG handshake even
  finished. Distinguish "not ready yet" (returns Ok(0)) from
  "actually closed" (returns Err) inside tcp_recv, and only mark
  remote_done on the latter.
- proxy/engine: the connection's "done" condition required
  local_read_done, but most clients (curl, kubectl) keep their write
  side open until they read EOF. The engine never closed its local
  TCP, so clients sat in read_to_end forever. Drop the connection as
  soon as the remote side has finished and we've drained its buffer
  to the local socket — the local TcpStream drop closes the socket
  and the client sees EOF.
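
The two fixes above can be sketched without smoltcp. `SocketState`,
`RemoteClosed`, and `connection_done` are simplified, hypothetical
stand-ins for the engine's real types, and the `tcp_recv` signature here
is illustrative only.

```rust
/// Simplified stand-in for the socket states that matter here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SocketState {
    SynSent,
    Established,
    Closed,
}

/// Error returned only when the remote side is really gone.
#[derive(Debug, PartialEq, Eq)]
pub struct RemoteClosed;

/// Moves pending remote bytes into `out`.
/// Ok(0) while still connecting ("not ready yet"), Ok(n) when n bytes
/// were moved, Err(RemoteClosed) once closed with nothing left to read.
pub fn tcp_recv(
    state: SocketState,
    pending: &mut Vec<u8>,
    out: &mut Vec<u8>,
) -> Result<usize, RemoteClosed> {
    match state {
        // Transient: the handshake has not finished, so try again later.
        SocketState::SynSent => Ok(0),
        SocketState::Closed if pending.is_empty() => Err(RemoteClosed),
        // Established, or closed with data still buffered: drain it.
        SocketState::Established | SocketState::Closed => {
            let n = pending.len();
            out.extend(pending.drain(..));
            Ok(n)
        }
    }
}

/// The connection is finished once the remote side is done and its
/// buffer has been flushed to the local socket. Whether the local
/// client has closed its write side is deliberately not consulted.
pub fn connection_done(remote_done: bool, bytes_buffered_for_local: usize) -> bool {
    remote_done && bytes_buffered_for_local == 0
}
```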
2026-04-07 15:28:44 +01:00
dca8c3b643 fix(net): protocol fixes for Tailscale-compatible peer reachability
Fixes a pile of correctness bugs, all of which stopped real Tailscale
peers from sending WireGuard packets back to us. Found while building
out the e2e test against the docker-compose stack.

1. WireGuard static key was wrong (lifecycle.rs)
   We were initializing the WgTunnel with `keys.wg_private`, a separate
   x25519 key from the one Tailscale advertises in netmaps. Peers know
   us by `node_public` and compute mac1 against it; running handshakes
   with a different static key meant every init we sent was silently
   dropped. Use `keys.node_private` instead: node_key IS the WG static
   key in Tailscale.

2. DERP relay couldn't route packets to us (derp/client.rs)
   Our DerpClient was sealing the ClientInfo frame with a fresh
   ephemeral NaCl keypair and putting the ephemeral public in the frame
   prefix. Tailscale's protocol expects the *long-term* node public key
   in the prefix — that's how the relay knows where to forward packets
   addressed to our node_key. With the ephemeral key, the relay
   accepted the connection but never delivered our peers' responses.
   Now seal with the long-term node key.

3. Headscale never persisted our DiscoKey (proto/types.rs, control/*)
   The streaming /machine/map handler in Headscale ≥ capVer 68 doesn't
   update DiscoKey on the node record — only the "Lite endpoint update"
   path does, gated on Stream:false + OmitPeers:true + ReadOnly:false.
   Without DiscoKey our nodes appeared in `headscale nodes list` with
   `discokey:000…` and never propagated into peer netmaps. Add the
   DiscoKey field to RegisterRequest, add OmitPeers/ReadOnly fields to
   MapRequest, and call a new `lite_update` between register and the
   streaming map. Also add `post_json_no_response` for endpoints that
   reply with an empty body.
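
   A minimal sketch of the flag combination described above. The struct
   is a hypothetical subset of the real MapRequest, and the constructor
   and predicate names are illustrative.

```rust
/// Subset of the MapRequest fields discussed above; the real request
/// carries more (keys, version, endpoints, and so on).
#[derive(Debug, Clone, Default)]
pub struct MapRequest {
    pub stream: bool,
    pub omit_peers: bool,
    pub read_only: bool,
}

impl MapRequest {
    /// Request shape sent between register and the streaming map.
    pub fn lite_update() -> Self {
        MapRequest { stream: false, omit_peers: true, read_only: false }
    }

    /// Request shape for the long-lived streaming map poll.
    pub fn streaming() -> Self {
        MapRequest { stream: true, omit_peers: false, read_only: false }
    }

    /// The gate described above: Stream:false + OmitPeers:true +
    /// ReadOnly:false.
    pub fn is_lite_endpoint_update(&self) -> bool {
        !self.stream && self.omit_peers && !self.read_only
    }
}
```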

4. EncapAction is now a struct instead of an enum (wg/tunnel.rs)
   Routing was forced to either UDP or DERP. With a peer whose
   advertised UDP endpoint is on an unreachable RFC1918 network (e.g.
   docker bridge IPs), we'd send via UDP, get nothing, and never fall
   back. Send over every available transport — receivers dedupe via
   the WireGuard replay window — and let dispatch_encap forward each
   populated arm to its respective channel.
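
   A rough sketch of the struct shape, assuming one optional arm per
   transport. The field names, the std `mpsc` channels, and the return
   value are assumptions made for illustration; only `EncapAction` and
   `dispatch_encap` are named in this commit.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::Sender;

/// One optional arm per transport. An enum could only pick one arm;
/// a struct can populate both at once.
#[derive(Debug, Default)]
pub struct EncapAction {
    /// Encrypted datagram plus the peer's advertised UDP endpoint.
    pub udp: Option<(SocketAddr, Vec<u8>)>,
    /// The same encrypted datagram, to be relayed via DERP.
    pub derp: Option<Vec<u8>>,
}

/// Forwards each populated arm to its channel and returns how many
/// transports the packet was handed to.
pub fn dispatch_encap(
    action: EncapAction,
    udp_tx: &Sender<(SocketAddr, Vec<u8>)>,
    derp_tx: &Sender<Vec<u8>>,
) -> usize {
    let mut sent = 0;
    if let Some(pkt) = action.udp {
        if udp_tx.send(pkt).is_ok() {
            sent += 1;
        }
    }
    if let Some(pkt) = action.derp {
        if derp_tx.send(pkt).is_ok() {
            sent += 1;
        }
    }
    sent
}
```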

5. Drop the dead PacketRouter (wg/router.rs)
   Skeleton from an earlier design that never got wired up; it's been
   accumulating dead-code warnings.
2026-04-07 14:33:43 +01:00
76ab2c1a8e feat(net): add DERP relay client
DERP is Tailscale's TCP relay protocol for peers that can't establish a
direct UDP path. Add the standalone client:

- derp/framing: frame codec with a 5-byte header (1-byte type + 4-byte
  BE length)
- derp/client: HTTP /derp upgrade, Tailscale's NaCl SealedBox handshake
  (ServerKey → ClientInfo → ServerInfo → NotePreferred), and
  send_packet/recv_packet for forwarding WireGuard datagrams
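
A minimal sketch of the header layout described above (1-byte type, then
a 4-byte big-endian payload length). The function names are
hypothetical, and the real codec presumably plugs into a framed
transport rather than working on whole buffers.

```rust
/// 1-byte frame type + 4-byte big-endian payload length.
pub const FRAME_HEADER_LEN: usize = 5;

/// Serializes one frame: header followed by the payload bytes.
pub fn encode_frame(frame_type: u8, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u32::MAX as usize, "payload too large for a u32 length");
    let mut buf = Vec::with_capacity(FRAME_HEADER_LEN + payload.len());
    buf.push(frame_type);
    buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Parses one frame from the front of `buf`.
/// Returns (type, payload, remaining bytes), or None if `buf` does not
/// yet hold a complete frame.
pub fn decode_frame(buf: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    if buf.len() < FRAME_HEADER_LEN {
        return None;
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    let end = FRAME_HEADER_LEN.checked_add(len)?;
    if buf.len() < end {
        return None;
    }
    Some((buf[0], &buf[FRAME_HEADER_LEN..end], &buf[end..]))
}
```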

Includes the 8-byte DERP\xf0\x9f\x94\x91 magic prefix in the ServerKey
payload and reads the HTTP upgrade response one byte at a time so the
inline first frame isn't swallowed by a buffered reader.
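
A sketch of the byte-at-a-time read, using a blocking `std::io::Read`
for brevity where the real client is presumably async. `read_http_head`,
`MAX_HEAD_LEN`, and its 16 KiB value are hypothetical.

```rust
use std::io::{self, Read};

/// "DERP" followed by the UTF-8 bytes of U+1F511 (key emoji): 8 bytes.
pub const MAGIC: [u8; 8] = *b"DERP\xf0\x9f\x94\x91";

/// Arbitrary cap so a misbehaving server cannot grow the buffer forever.
const MAX_HEAD_LEN: usize = 16 * 1024;

/// Reads the HTTP response head up to and including the blank line,
/// one byte at a time, so no bytes past the head are consumed.
pub fn read_http_head<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut head = Vec::new();
    let mut byte = [0u8; 1];
    while !head.ends_with(b"\r\n\r\n") {
        if head.len() >= MAX_HEAD_LEN {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "HTTP response head too large",
            ));
        }
        reader.read_exact(&mut byte)?;
        head.push(byte[0]);
    }
    Ok(head)
}
```

Whatever follows the head, such as the inline first frame, stays in the
reader for the frame codec to pick up.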
2026-04-07 13:41:17 +01:00