From 0baab921414e048f8a3033072494f47f32a0c716 Mon Sep 17 00:00:00 2001 From: Sienna Meridian Satterwhite Date: Tue, 10 Mar 2026 23:38:20 +0000 Subject: [PATCH] docs: add project README, reference docs, license, CLA, and contributing guide Apache-2.0 license with CLA for dual-licensing. Lefthook enforces Signed-off-by on all commits. AGENTS.md updated with new modules. Signed-off-by: Sienna Meridian Satterwhite Signed-off-by: Sienna Meridian Satterwhite --- AGENTS.md | 41 +++-- CLA.md | 51 ++++++ CONTRIBUTING.md | 84 ++++++++++ LICENSE | 199 ++++++++++++++++++++++++ README.md | 135 ++++++++++++++++ docs/README.md | 406 ++++++++++++++++++++++++++++++++++++++++++++++++ lefthook.yml | 29 ++++ 7 files changed, 935 insertions(+), 10 deletions(-) create mode 100644 CLA.md create mode 100644 CONTRIBUTING.md create mode 100644 LICENSE create mode 100644 README.md create mode 100644 docs/README.md create mode 100644 lefthook.yml diff --git a/AGENTS.md b/AGENTS.md index 90d37d4..9e8fc26 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,25 +25,46 @@ sunbeam-proxy is a TLS-terminating reverse proxy built on [Pingora](https://gith - **Host-prefix routing**: routes `foo.example.com` by matching prefix `foo` against the config - **Path sub-routes**: longest-prefix match within a host, with optional prefix stripping +- **Static file serving**: try_files chain with SPA fallback, replacing nginx/caddy for frontends +- **URL rewrites**: regex-based path rewrites compiled at startup +- **Response body rewriting**: find/replace in HTML/JS responses (like nginx `sub_filter`) +- **Auth subrequests**: gate path routes with HTTP auth checks (like nginx `auth_request`) +- **HTTP response cache**: per-route in-memory cache via pingora-cache with Cache-Control support +- **Prometheus metrics**: request totals, latency histograms, detection decisions, cache hit/miss +- **Request IDs**: UUID v4 per request, forwarded to upstreams and clients via `X-Request-Id` +- **DDoS detection**: KNN-based per-IP 
behavioral classification +- **Scanner detection**: logistic regression per-request classification with bot allowlist +- **Rate limiting**: leaky bucket per-identity throttling - **ACME HTTP-01 challenges**: routes `/.well-known/acme-challenge/*` to cert-manager solver pods - **TLS cert hot-reload**: watches K8s Secrets, writes cert files, triggers zero-downtime upgrade - **Config hot-reload**: watches K8s ConfigMaps, triggers graceful upgrade on change - **SSH TCP passthrough**: raw TCP proxy for SSH traffic (port 22 to Gitea) - **HTTP-to-HTTPS redirect**: with per-route opt-out via `disable_secure_redirection` +See [docs/README.md](docs/README.md) for full feature documentation and configuration reference. + ## Source Files ``` -src/main.rs — binary entry point: server bootstrap, watcher spawn, SSH spawn -src/lib.rs — library crate root: re-exports acme, config, proxy, ssh -src/config.rs — TOML config deserialization (Config, RouteConfig, PathRoute) -src/proxy.rs — ProxyHttp impl: request_filter, upstream_peer, upstream_request_filter, logging -src/acme.rs — Ingress watcher: maintains AcmeRoutes (path → solver backend) -src/watcher.rs — Secret/ConfigMap watcher: cert write + graceful upgrade trigger -src/cert.rs — fetch_and_write / write_from_secret: K8s Secret → cert files on disk -src/telemetry.rs — JSON logging + optional OTEL tracing init -src/ssh.rs — TCP proxy: tokio TcpListener + copy_bidirectional -tests/e2e.rs — end-to-end test: real SunbeamProxy over plain HTTP with echo backend +src/main.rs — binary entry point: server bootstrap, watcher spawn, SSH spawn +src/lib.rs — library crate root: re-exports all modules +src/config.rs — TOML config deserialization (Config, RouteConfig, PathRoute, CacheConfig, etc.) 
+src/proxy.rs — ProxyHttp impl: request_filter, cache hooks, upstream_peer, body rewriting, logging +src/acme.rs — Ingress watcher: maintains AcmeRoutes (path → solver backend) +src/watcher.rs — Secret/ConfigMap watcher: cert write + graceful upgrade trigger +src/cert.rs — fetch_and_write / write_from_secret: K8s Secret → cert files on disk +src/telemetry.rs — JSON logging + optional OTEL tracing init +src/ssh.rs — TCP proxy: tokio TcpListener + copy_bidirectional +src/metrics.rs — Prometheus counters/histograms/gauge, metrics HTTP server, /health endpoint +src/static_files.rs — Static file serving with try_files chain and SPA fallback +src/cache.rs — pingora-cache MemCache backend and Cache-Control TTL parser +src/ddos/ — KNN-based DDoS detection (model, detector, training, replay) +src/scanner/ — Logistic regression scanner detection (model, detector, features, training, allowlist, watcher) +src/rate_limit/ — Leaky bucket rate limiter (limiter, key extraction) +src/dual_stack.rs — Dual-stack (IPv4+IPv6) TCP listener +tests/e2e.rs — end-to-end test: real SunbeamProxy over plain HTTP with echo backend +tests/proptest.rs — property-based tests for static files, rewrites, config, metrics, etc. +docs/README.md — comprehensive feature documentation ``` ## Architecture Invariants — Do Not Break These diff --git a/CLA.md b/CLA.md new file mode 100644 index 0000000..137a613 --- /dev/null +++ b/CLA.md @@ -0,0 +1,51 @@ +# Sunbeam Studios Contributor License Agreement + +Thank you for your interest in contributing to Sunbeam Proxy. This Contributor License Agreement ("CLA") ensures that contributions to this project can be properly licensed and maintained. + +## Why a CLA? + +Sunbeam Proxy is licensed under Apache-2.0. We use a CLA so that Sunbeam Studios retains the ability to offer commercial licenses for organizations that need them. This is the same model used by projects like Elasticsearch (pre-BSL), Qt, and MySQL. + +Your contributions remain yours. 
You're granting us a license, not transferring ownership. + +## Agreement + +By submitting a contribution (via pull request, patch, or any other mechanism) to this project, and by including a `Signed-off-by` line in your commits, you agree to the following: + +### 1. Definitions + +- **"You"** means the individual or legal entity submitting the contribution. +- **"Contribution"** means any original work of authorship, including modifications or additions to existing work, that you submit to this project. +- **"Project"** means Sunbeam Proxy and related repositories maintained by Sunbeam Studios. + +### 2. Grant of Copyright License + +You grant Sunbeam Studios a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute your Contributions and any derivative works thereof, under any license terms, including without limitation any open source license or any proprietary or commercial license. + +### 3. Grant of Patent License + +You grant Sunbeam Studios a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer your Contributions, where such license applies only to patent claims licensable by you that are necessarily infringed by your Contribution(s) alone or by combination with the Project. + +### 4. Representations + +You represent that: + +- Each Contribution is your original creation, or you have sufficient rights to grant the licenses above. +- Your Contribution does not violate any third party's intellectual property rights. +- If your employer has rights to intellectual property you create, you have received permission to submit Contributions on behalf of your employer, or your employer has waived such rights. + +### 5. 
No Obligation + +You understand that this Project and your Contributions are provided on an "AS IS" basis, without warranties or conditions of any kind. Sunbeam Studios is under no obligation to accept, use, or include any Contribution. + +## How to Sign + +Include a `Signed-off-by` line in every commit you submit: + +``` +Signed-off-by: Your Name +``` + +You can do this automatically with `git commit -s`. + +The `Signed-off-by` line indicates that you have read and agree to this CLA. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..ff75808 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,84 @@ +# Contributing to Sunbeam Proxy + +We're a small team and we welcome contributions. Here's what you need to know. + +## Contributor License Agreement + +All contributions require a signed CLA. Read [CLA.md](CLA.md) for the full text. + +**The short version:** you keep ownership of your code, but you grant Sunbeam Studios the right to license it under any terms (including commercial). This lets us offer dual licensing while keeping the project Apache-2.0 for everyone. + +### How to sign + +Add `Signed-off-by` to every commit: + +```bash +git commit -s -m "feat(proxy): add cool thing" +``` + +This is enforced by a lefthook commit-msg hook. Set up lefthook: + +```bash +lefthook install +``` + +If you forget, amend your commit: + +```bash +git commit --amend -s +``` + +## Development Setup + +```bash +# Install Rust (if you haven't) +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh + +# Clone and build +git clone https://src.sunbeam.pt/studio/proxy.git +cd proxy +cargo build + +# Run tests +cargo nextest run +# or +cargo test + +# Local dev server +SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run +``` + +## Making Changes + +1. Fork the repo and create a branch +2. Make your changes +3. Run `cargo check`, `cargo test`, and `cargo clippy -- -D warnings` +4. Commit with conventional commit messages and `Signed-off-by` +5. 
Open a pull request + +### Commit Messages + +We use [conventional commits](https://www.conventionalcommits.org/): + +``` +feat(proxy): add WebSocket compression support +fix(cache): respect Vary header in cache key +docs: update configuration reference +test: add proptests for rate limiter +``` + +### Code Style + +- `cargo fmt` for formatting +- `cargo clippy -- -D warnings` for lints +- No `unwrap()` in production paths +- `#[serde(default)]` on new config fields for backwards compatibility +- Read the file before you edit it + +## Reporting Issues + +Open an issue at [src.sunbeam.pt/studio/proxy/issues](https://src.sunbeam.pt/studio/proxy/issues). + +## License + +By contributing, you agree that your contributions will be licensed under the Apache License 2.0, and that you grant additional rights as described in [CLA.md](CLA.md). diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..e2218ea --- /dev/null +++ b/LICENSE @@ -0,0 +1,199 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to the Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. 
For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by the Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding any notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. 
You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. 
Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright 2025-2026 Sunbeam Studios + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md new file mode 100644 index 0000000..1315a87 --- /dev/null +++ b/README.md @@ -0,0 +1,135 @@ +# Sunbeam Proxy + +A cloud-native reverse proxy with adaptive ML threat detection. Built on [Pingora](https://github.com/cloudflare/pingora) by [Sunbeam Studios](https://sunbeam.pt). + +Sunbeam Proxy learns what normal traffic looks like *for your infrastructure* and automatically adapts its defenses. Instead of relying on generic rulesets written for someone else's problems, it trains on your own audit logs to build behavioral models that protect against the threats you actually face. + +## Why It Exists + +We are a small, women-led queer game studio and we need to be able to handle extraordinary threats in today's internet. We have a small team and a small budget, so we need to be able to do more with less. We also need to be able to scale up quickly when we need to without having to worry about the security of our infrastructure. However, the problems faced in different regions, and with different bot nets, DDoS attacks, and other threats, make it difficult to find a scalable solution. + +## What it does + +**Adaptive threat detection** — Two ML models run in the request pipeline. A KNN-based DDoS detector classifies per-IP behavior over sliding windows. 
A logistic regression scanner detector catches vulnerability probes, directory enumeration, and bot traffic per-request. Both models are trained on your logs, hot-reloaded without downtime, and continuously improvable as your traffic evolves. + +**Rate limiting** — Leaky bucket throttling with identity-aware keys (session cookies, bearer tokens, or IP fallback). Separate limits for authenticated and unauthenticated traffic. 256-shard concurrent map, zero contention. + +**HTTP response caching** — Per-route in-memory cache backed by pingora-cache. Respects `Cache-Control`, supports `stale-while-revalidate`, and sits after the security pipeline so blocked requests never touch the cache. + +**Static file serving** — Serve frontends directly from the proxy with try_files chains, SPA fallback, content-type detection, and cache headers. Replace nginx/caddy sidecar containers with a single config block. + +**Everything else you need from a reverse proxy** — TLS termination with cert hot-reload, host-prefix routing, path sub-routes with prefix stripping, regex URL rewrites, response body rewriting (like nginx `sub_filter`), auth subrequests, WebSocket forwarding, SSH TCP passthrough, HTTP-to-HTTPS redirect, ACME HTTP-01 challenge routing, and Prometheus metrics with request tracing. + +## Quick start + +```sh +cargo build +SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run +``` + +See [docs/](docs/README.md) for full configuration reference. + +## The self-learning loop + +``` + your traffic + │ + ▼ + ┌─────────────────────────┐ + │ Sunbeam Proxy │ + │ │ + │ DDoS ──► Scanner ──► │──── audit logs (JSON) + │ Rate Limit ──► Cache │ │ + └─────────────────────────┘ │ + ▼ + ┌───────────────┐ + │ Train models │ + │ on your logs │ + └───────┬───────┘ + │ + hot-reload + │ + ▼ + updated models + (no restart needed) +``` + +Every request produces a structured audit log with 15+ behavioral features. 
Feed those logs back into the training pipeline and the models get better at distinguishing your real users from threats — automatically, without manual rule-writing. + +```sh +# Train DDoS model from your audit logs +cargo run -- train --input logs.jsonl --output ddos_model.bin --heuristics heuristics.toml + +# Train scanner model +cargo run -- train-scanner --input logs.jsonl --output scanner_model.bin + +# Replay logs to evaluate model accuracy +cargo run -- replay --input logs.jsonl --model ddos_model.bin +``` + +## Detection pipeline + +Every HTTPS request passes through three detection layers before reaching your backend: + +| Layer | Model | Granularity | Response | +|-------|-------|-------------|----------| +| DDoS | KNN (14-feature behavioral vectors) | Per-IP over sliding window | 429 + Retry-After | +| Scanner | Logistic regression (path, UA, headers) | Per-request | 403 | +| Rate limit | Leaky bucket | Per-identity (session/token/IP) | 429 + Retry-After | + +Verified bots (Googlebot, Bingbot, etc.) bypass scanner detection via reverse-DNS verification and configurable allowlists. + +## Configuration + +Everything is TOML. 
Here's a route that serves a frontend statically with an API backend, response body rewriting, caching, and custom headers: + +```toml +[[routes]] +host_prefix = "docs" +backend = "http://docs-backend:8080" +static_root = "/srv/docs" +fallback = "index.html" + +[routes.cache] +enabled = true +default_ttl_secs = 300 + +[[routes.rewrites]] +pattern = "^/docs/[0-9a-f-]+/?$" +target = "/docs/[id]/index.html" + +[[routes.body_rewrites]] +find = "old-domain.example.com" +replace = "docs.sunbeam.pt" + +[[routes.response_headers]] +name = "X-Frame-Options" +value = "DENY" + +[[routes.paths]] +prefix = "/api" +backend = "http://docs-api:8000" +strip_prefix = true +``` + +## Observability + +- **Request IDs**: UUID v4 per request, forwarded via `X-Request-Id` to upstreams and clients +- **Prometheus metrics**: `GET /metrics` on configurable port — request totals, latency histograms, detection decisions, cache hit rates, active connections +- **Health checks**: `GET /health` returns 200 for k8s probes +- **Structured audit logs**: JSON with request ID, client IP, timing, headers, backend, detection decisions + +## Building + +```sh +cargo build # debug +cargo build --release --target x86_64-unknown-linux-musl # release (container) +cargo test # all tests +cargo clippy -- -D warnings # lint +``` + +## License + +Apache License 2.0. See [LICENSE](LICENSE). + +Contributions require a signed CLA — see [CONTRIBUTING.md](CONTRIBUTING.md) and [CLA.md](CLA.md) for details. diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..ea13b0e --- /dev/null +++ b/docs/README.md @@ -0,0 +1,406 @@ +--- +layout: default +title: Sunbeam Proxy Documentation +description: Configuration reference and feature documentation for Sunbeam Proxy +toc: true +--- + +# Sunbeam Proxy Documentation + +Complete reference for configuring and operating Sunbeam Proxy — a TLS-terminating reverse proxy built on [Pingora](https://github.com/cloudflare/pingora) 0.8. 
+ +## Quick Start + +```sh +# Local development +SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run + +# Run tests +cargo nextest run + +# Build release (linux-musl for containers) +cargo build --release --target x86_64-unknown-linux-musl +``` + +--- + +## Configuration Reference + +Configuration is TOML, loaded from `$SUNBEAM_CONFIG` or `/etc/pingora/config.toml`. + +### Listeners & TLS + +```toml +[listen] +http = "0.0.0.0:80" +https = "0.0.0.0:443" + +[tls] +cert_path = "/etc/ssl/tls.crt" +key_path = "/etc/ssl/tls.key" +``` + +### Telemetry + +```toml +[telemetry] +otlp_endpoint = "" # OpenTelemetry OTLP endpoint (empty = disabled) +metrics_port = 9090 # Prometheus scrape port (0 = disabled) +``` + +### Routes + +Each route maps a host prefix to a backend. `host_prefix = "docs"` matches requests whose host starts with `docs.`. + +```toml +[[routes]] +host_prefix = "docs" +backend = "http://docs-backend.default.svc.cluster.local:8080" +websocket = false # forward WebSocket upgrade headers +disable_secure_redirection = false # true = allow plain HTTP +``` + +#### Path Sub-Routes + +Longest-prefix match within a host. Mix static serving with API proxying. + +```toml +[[routes.paths]] +prefix = "/api" +backend = "http://api-backend:8000" +strip_prefix = true # /api/users → /users +websocket = false +``` + +#### Static File Serving + +Serve frontends directly from the proxy. The try_files chain checks candidates in order: + +1. `$static_root/$uri` — exact file +2. `$static_root/$uri.html` — with `.html` extension +3. `$static_root/$uri/index.html` — directory index +4. `$static_root/$fallback` — SPA fallback + +If nothing matches, the request falls through to the upstream backend. 
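The candidate order above can be sketched in a few lines of Rust. This is an illustration only, not the code in `src/static_files.rs`: the function names `try_files_candidates` and `resolve_static` are hypothetical, the URI is assumed to be an already-decoded path without a query string, and rejecting `..` by returning `None` is one possible way to express the "fall through to upstream" behaviour.

```rust
use std::path::{Component, Path, PathBuf};

/// Build the ordered try_files candidates for one request path.
/// Returns `None` when the path contains a `..` component, so the caller
/// can skip static serving and fall through to the upstream.
/// (Hypothetical helper; not the project's actual API.)
pub fn try_files_candidates(
    static_root: &Path,
    uri: &str,
    fallback: Option<&str>,
) -> Option<Vec<PathBuf>> {
    // Work with a relative path so `join` never replaces the root.
    let rel = uri.trim_matches('/');
    let traversal = Path::new(rel)
        .components()
        .any(|c| matches!(c, Component::ParentDir));
    if traversal {
        return None;
    }

    let mut candidates = Vec::new();
    if !rel.is_empty() {
        candidates.push(static_root.join(rel)); // 1. exact file
        candidates.push(static_root.join(format!("{}.html", rel))); // 2. with .html
    }
    candidates.push(static_root.join(rel).join("index.html")); // 3. directory index
    if let Some(name) = fallback {
        candidates.push(static_root.join(name)); // 4. SPA fallback
    }
    Some(candidates)
}

/// Return the first candidate that exists as a regular file, if any.
pub fn resolve_static(static_root: &Path, uri: &str, fallback: Option<&str>) -> Option<PathBuf> {
    try_files_candidates(static_root, uri, fallback)?
        .into_iter()
        .find(|p| p.is_file())
}
```

With `static_root = "/srv/meet"` and `fallback = "index.html"`, a request for `/about` would probe `about`, `about.html`, `about/index.html`, and finally `index.html`. A `None` from `resolve_static` corresponds to the request being handed to the upstream backend.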
+ +```toml +[[routes]] +host_prefix = "meet" +backend = "http://meet-backend:8080" +static_root = "/srv/meet" +fallback = "index.html" +``` + +**Content-type detection** is based on file extension: + +| Extensions | Content-Type | +|-----------|-------------| +| `html`, `htm` | `text/html; charset=utf-8` | +| `css` | `text/css; charset=utf-8` | +| `js`, `mjs` | `application/javascript; charset=utf-8` | +| `json` | `application/json; charset=utf-8` | +| `svg` | `image/svg+xml` | +| `png`, `jpg`, `gif`, `webp`, `avif` | `image/*` | +| `woff`, `woff2`, `ttf`, `otf` | `font/*` | +| `wasm` | `application/wasm` | + +**Cache-control headers** are set per extension type: + +| Extensions | Cache-Control | +|-----------|-------------| +| `js`, `css`, `woff2`, `wasm` | `public, max-age=31536000, immutable` | +| `png`, `jpg`, `svg`, `ico` | `public, max-age=86400` | +| Everything else | `no-cache` | + +Path sub-routes take priority over static serving — if `/api` matches a path route, it goes to that backend even if a static file exists. + +Path traversal (`..`) is rejected and falls through to the upstream. + +#### URL Rewrites + +Regex patterns compiled at startup, applied before static file lookup. First match wins. + +```toml +[[routes.rewrites]] +pattern = "^/docs/[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}/?$" +target = "/docs/[id]/index.html" +``` + +#### Response Body Rewriting + +Find/replace in response bodies, like nginx `sub_filter`. Only applies to `text/html`, `application/javascript`, and `text/javascript` responses. Binary responses pass through untouched. + +The entire response is buffered in memory before substitution (fine for HTML/JS — typically <1MB). `Content-Length` is removed since the body size may change. 
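As a rough sketch of the buffered find/replace described above, assuming the whole body is already in memory: `BodyRewrite`, `is_rewritable`, and `rewrite_body` are hypothetical names chosen for illustration and are not taken from `src/proxy.rs`. Bodies that are not valid UTF-8 are passed through unchanged here, which is an assumption rather than documented behaviour.

```rust
/// One find/replace rule, mirroring a `[[routes.body_rewrites]]` entry.
/// (Hypothetical type for illustration.)
pub struct BodyRewrite {
    pub find: String,
    pub replace: String,
}

/// True for the content types the docs list as eligible for rewriting.
pub fn is_rewritable(content_type: &str) -> bool {
    // Ignore parameters such as `; charset=utf-8`.
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    matches!(
        mime.as_str(),
        "text/html" | "application/javascript" | "text/javascript"
    )
}

/// Apply every rule in order to a fully buffered body.
/// Non-rewritable or non-UTF-8 bodies are returned untouched.
pub fn rewrite_body(content_type: &str, body: &[u8], rules: &[BodyRewrite]) -> Vec<u8> {
    if !is_rewritable(content_type) {
        return body.to_vec();
    }
    let text = match std::str::from_utf8(body) {
        Ok(t) => t,
        Err(_) => return body.to_vec(),
    };
    let mut out = text.to_owned();
    for rule in rules {
        // An empty pattern would match between every character; skip it.
        if rule.find.is_empty() {
            continue;
        }
        out = out.replace(rule.find.as_str(), rule.replace.as_str());
    }
    out.into_bytes()
}
```

Because the returned body can differ in length from the input, a caller following this approach would drop `Content-Length` from the response, as the paragraph above notes.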
+ +```toml +[[routes.body_rewrites]] +find = "old-domain.example.com" +replace = "new-domain.sunbeam.pt" +``` + +#### Custom Response Headers + +```toml +[[routes.response_headers]] +name = "X-Frame-Options" +value = "DENY" +``` + +#### Auth Subrequests + +Gate path routes with an HTTP auth check before forwarding upstream. Similar to nginx `auth_request`. + +```toml +[[routes.paths]] +prefix = "/media" +backend = "http://seaweedfs-filer:8333" +strip_prefix = true +auth_request = "http://drive-backend/api/v1.0/items/media-auth/" +auth_capture_headers = ["Authorization", "X-Amz-Date", "X-Amz-Content-Sha256"] +upstream_path_prefix = "/sunbeam-drive/" +``` + +The auth subrequest sends a GET to `auth_request` with the original `Cookie`, `Authorization`, and `X-Original-URI` headers. + +| Auth response | Proxy behavior | +|--------------|----------------| +| 2xx | Capture specified headers, forward to backend | +| Non-2xx | Return 403 to client | +| Network error | Return 502 to client | + +#### HTTP Response Cache + +Per-route in-memory cache backed by pingora-cache. + +```toml +[routes.cache] +enabled = true +default_ttl_secs = 60 # TTL when upstream has no Cache-Control +stale_while_revalidate_secs = 0 # serve stale while revalidating +max_file_size = 0 # max cacheable body size (0 = unlimited) +``` + +**Pipeline position**: Cache runs after the security pipeline and before upstream modifications. + +``` +Request → DDoS → Scanner → Rate Limit → Cache → Upstream +``` + +Cache behavior: +- Only caches GET and HEAD requests +- Respects `Cache-Control: no-store` and `Cache-Control: private` +- TTL priority: `s-maxage` > `max-age` > `default_ttl_secs` +- Skips routes with body rewrites (content varies) +- Skips requests with auth subrequest headers (per-user content) +- Cache key: `{host}{path}?{query}` + +### SSH Passthrough + +Raw TCP proxy for SSH traffic. 
+
+```toml
+[ssh]
+listen = "0.0.0.0:22"
+backend = "gitea-ssh.devtools.svc.cluster.local:2222"
+```
+
+### DDoS Detection
+
+KNN-based per-IP behavioral classification over sliding windows.
+
+```toml
+[ddos]
+enabled = true
+model_path = "ddos_model.bin"
+k = 5
+threshold = 0.6
+window_secs = 60
+window_capacity = 1000
+min_events = 10
+```
+
+### Scanner Detection
+
+Logistic regression per-request classification with verified bot allowlist.
+
+```toml
+[scanner]
+enabled = true
+model_path = "scanner_model.bin"
+threshold = 0.5
+poll_interval_secs = 30 # hot-reload check interval (0 = disabled)
+bot_cache_ttl_secs = 86400 # verified bot IP cache TTL
+
+[[scanner.allowlist]]
+ua_prefix = "Googlebot"
+reason = "Google crawler"
+dns_suffixes = ["googlebot.com", "google.com"]
+cidrs = ["66.249.64.0/19"]
+```
+
+### Rate Limiting
+
+Leaky bucket per-identity throttling. Identity resolution: `ory_kratos_session` cookie > Bearer token > client IP.
+
+```toml
+[rate_limit]
+enabled = true
+eviction_interval_secs = 300
+stale_after_secs = 600
+bypass_cidrs = ["10.42.0.0/16"]
+
+[rate_limit.authenticated]
+burst = 200
+rate = 50.0
+
+[rate_limit.unauthenticated]
+burst = 50
+rate = 10.0
+```
+
+---
+
+## Observability
+
+### Request IDs
+
+Every request gets a UUID v4 request ID. It's:
+- Attached to a `tracing::info_span!` so all log lines within the request inherit it
+- Forwarded upstream via `X-Request-Id`
+- Returned to clients via `X-Request-Id`
+- Included in audit log lines
+
+### Prometheus Metrics
+
+Served at `GET /metrics` on `metrics_port` (default 9090).
+
+| Metric | Type | Labels |
+|--------|------|--------|
+| `sunbeam_requests_total` | Counter | `method`, `host`, `status`, `backend` |
+| `sunbeam_request_duration_seconds` | Histogram | — |
+| `sunbeam_ddos_decisions_total` | Counter | `decision` |
+| `sunbeam_scanner_decisions_total` | Counter | `decision`, `reason` |
+| `sunbeam_rate_limit_decisions_total` | Counter | `decision` |
+| `sunbeam_cache_status_total` | Counter | `status` |
+| `sunbeam_active_connections` | Gauge | — |
+
+`GET /health` returns 200 for k8s probes.
+
+```yaml
+# Prometheus scrape config
+- job_name: sunbeam-proxy
+  static_configs:
+    - targets: ['sunbeam-proxy.ingress.svc.cluster.local:9090']
+```
+
+### Audit Logs
+
+Every request produces a structured JSON log line (`target = "audit"`):
+
+```json
+{
+  "request_id": "550e8400-e29b-41d4-a716-446655440000",
+  "method": "GET",
+  "host": "docs.sunbeam.pt",
+  "path": "/api/v1/pages",
+  "query": "limit=10",
+  "client_ip": "203.0.113.42",
+  "status": 200,
+  "duration_ms": 23,
+  "content_length": 0,
+  "user_agent": "Mozilla/5.0 ...",
+  "referer": "https://docs.sunbeam.pt/",
+  "accept_language": "en-US",
+  "accept": "text/html",
+  "has_cookies": true,
+  "cf_country": "FR",
+  "backend": "http://docs-backend:8080",
+  "error": null
+}
+```
+
+### Detection Pipeline Logs
+
+Each security layer emits a `target = "pipeline"` log line before acting:
+
+```
+layer=ddos       → all HTTPS traffic (scanner training data)
+layer=scanner    → traffic that passed DDoS (rate-limit training data)
+layer=rate_limit → traffic that passed scanner
+```
+
+This guarantees training pipelines always see the full traffic picture.
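The log-before-acting ordering described above can be illustrated with a small Rust sketch in which every layer records its line before its verdict is applied. This is only an illustration under stated assumptions: `Verdict`, `Layer`, `run_pipeline`, and the log-line format are hypothetical and do not come from the sunbeam-proxy source.

```rust
/// Outcome of a single security layer (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Allow,
    /// Block with the HTTP status that would be returned to the client.
    Block(u16),
}

/// A named layer: a label such as "ddos" plus a decision function that
/// takes the client IP. Both are assumptions made for illustration.
type Layer<'a> = (&'a str, &'a dyn Fn(&str) -> Verdict);

/// Runs the layers in order. Each layer's log line is pushed *before* its
/// verdict is applied, so a request blocked by a layer still shows up in
/// that layer's log, while later layers never see it.
fn run_pipeline(client_ip: &str, layers: &[Layer<'_>], log: &mut Vec<String>) -> Verdict {
    for (name, decide) in layers {
        log.push(format!("layer={name} client_ip={client_ip}"));
        if let Verdict::Block(status) = decide(client_ip) {
            return Verdict::Block(status);
        }
    }
    Verdict::Allow
}
```

With layers ordered `ddos`, `scanner`, `rate_limit`, a request blocked by the scanner layer leaves `layer=ddos` and `layer=scanner` lines but no `layer=rate_limit` line, which matches the per-layer traffic views listed above.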
+
+---
+
+## CLI Commands
+
+```sh
+# Start the proxy server
+sunbeam-proxy serve [--upgrade]
+
+# Train DDoS model from audit logs
+sunbeam-proxy train --input logs.jsonl --output ddos_model.bin \
+    [--attack-ips ips.txt] [--normal-ips ips.txt] \
+    [--heuristics heuristics.toml] [--k 5] [--threshold 0.6]
+
+# Replay logs through detection pipeline
+sunbeam-proxy replay --input logs.jsonl --model ddos_model.bin \
+    [--config config.toml] [--rate-limit]
+
+# Train scanner model
+sunbeam-proxy train-scanner --input logs.jsonl --output scanner_model.bin \
+    [--wordlists path/to/wordlists] [--threshold 0.5]
+```
+
+---
+
+## Architecture
+
+### Source Files
+
+```
+src/main.rs          — server bootstrap, watcher spawn, SSH spawn
+src/lib.rs           — library crate root
+src/config.rs        — TOML config deserialization
+src/proxy.rs         — ProxyHttp impl: routing, filtering, caching, logging
+src/acme.rs          — Ingress watcher for ACME HTTP-01 challenges
+src/watcher.rs       — Secret/ConfigMap watcher for cert + config hot-reload
+src/cert.rs          — K8s Secret → cert files on disk
+src/telemetry.rs     — JSON logging + OTEL tracing init
+src/ssh.rs           — TCP proxy for SSH passthrough
+src/metrics.rs       — Prometheus metrics and scrape endpoint
+src/static_files.rs  — Static file serving with try_files chain
+src/cache.rs         — pingora-cache MemCache backend
+src/ddos/            — KNN-based DDoS detection
+src/scanner/         — Logistic regression scanner detection
+src/rate_limit/      — Leaky bucket rate limiter
+src/dual_stack.rs    — Dual-stack (IPv4+IPv6) TCP listener
+```
+
+### Runtime Model
+
+Pingora manages its own async runtime. K8s watchers (cert/config, Ingress) each run on separate OS threads with their own tokio runtimes. This isolation is deliberate — Pingora's internal runtime has specific constraints that don't mix with general-purpose async work.
+
+### Security Pipeline
+
+```
+Request
+  │
+  ├── DDoS detection (KNN per-IP)
+  │     └── blocked → 429
+  │
+  ├── Scanner detection (logistic regression per-request)
+  │     └── blocked → 403
+  │
+  ├── Rate limiting (leaky bucket per-identity)
+  │     └── blocked → 429
+  │
+  ├── Cache lookup
+  │     └── hit → serve cached response
+  │
+  └── Upstream request
+        ├── Auth subrequest (if configured)
+        ├── Response body rewriting (if configured)
+        └── Response to client
+```
diff --git a/lefthook.yml b/lefthook.yml
new file mode 100644
index 0000000..7aadff3
--- /dev/null
+++ b/lefthook.yml
@@ -0,0 +1,29 @@
+# Lefthook configuration for Sunbeam Proxy
+# Install: lefthook install
+
+commit-msg:
+  commands:
+    dco-signoff:
+      run: |
+        if ! grep -q "^Signed-off-by: " "{1}"; then
+          echo ""
+          echo "ERROR: Missing Signed-off-by line."
+          echo ""
+          echo "All commits to Sunbeam Proxy must include a Signed-off-by line"
+          echo "indicating agreement with the Contributor License Agreement (CLA.md)."
+          echo ""
+          echo "Use: git commit -s"
+          echo " or: git commit --signoff"
+          echo ""
+          exit 1
+        fi
+
+pre-commit:
+  parallel: true
+  commands:
+    cargo-check:
+      glob: "*.rs"
+      run: cargo check --quiet
+    cargo-fmt:
+      glob: "*.rs"
+      run: cargo fmt -- --check