docs: add project README, reference docs, license, CLA, and contributing guide

Apache-2.0 license with CLA for dual-licensing. Lefthook enforces
Signed-off-by on all commits. AGENTS.md updated with new modules.

Signed-off-by: Sienna Meridian Satterwhite <sienna@r3t.io>
Signed-off-by: Sienna Meridian Satterwhite <sienna@sunbeam.pt>
2026-03-10 23:38:20 +00:00
parent 39fe5f9f5f
commit 0baab92141
7 changed files with 935 additions and 10 deletions


@@ -25,25 +25,46 @@ sunbeam-proxy is a TLS-terminating reverse proxy built on [Pingora](https://gith
- **Host-prefix routing**: routes `foo.example.com` by matching prefix `foo` against the config
- **Path sub-routes**: longest-prefix match within a host, with optional prefix stripping
- **Static file serving**: try_files chain with SPA fallback, replacing nginx/caddy for frontends
- **URL rewrites**: regex-based path rewrites compiled at startup
- **Response body rewriting**: find/replace in HTML/JS responses (like nginx `sub_filter`)
- **Auth subrequests**: gate path routes with HTTP auth checks (like nginx `auth_request`)
- **HTTP response cache**: per-route in-memory cache via pingora-cache with Cache-Control support
- **Prometheus metrics**: request totals, latency histograms, detection decisions, cache hit/miss
- **Request IDs**: UUID v4 per request, forwarded to upstreams and clients via `X-Request-Id`
- **DDoS detection**: KNN-based per-IP behavioral classification
- **Scanner detection**: logistic regression per-request classification with bot allowlist
- **Rate limiting**: leaky bucket per-identity throttling
- **ACME HTTP-01 challenges**: routes `/.well-known/acme-challenge/*` to cert-manager solver pods
- **TLS cert hot-reload**: watches K8s Secrets, writes cert files, triggers zero-downtime upgrade
- **Config hot-reload**: watches K8s ConfigMaps, triggers graceful upgrade on change
- **SSH TCP passthrough**: raw TCP proxy for SSH traffic (port 22 to Gitea)
- **HTTP-to-HTTPS redirect**: with per-route opt-out via `disable_secure_redirection`
See [docs/README.md](docs/README.md) for full feature documentation and configuration reference.
## Source Files
```
src/main.rs — binary entry point: server bootstrap, watcher spawn, SSH spawn
src/lib.rs — library crate root: re-exports all modules
src/config.rs — TOML config deserialization (Config, RouteConfig, PathRoute, CacheConfig, etc.)
src/proxy.rs — ProxyHttp impl: request_filter, cache hooks, upstream_peer, body rewriting, logging
src/acme.rs — Ingress watcher: maintains AcmeRoutes (path → solver backend)
src/watcher.rs — Secret/ConfigMap watcher: cert write + graceful upgrade trigger
src/cert.rs — fetch_and_write / write_from_secret: K8s Secret → cert files on disk
src/telemetry.rs — JSON logging + optional OTEL tracing init
src/ssh.rs — TCP proxy: tokio TcpListener + copy_bidirectional
src/metrics.rs — Prometheus counters/histograms/gauge, metrics HTTP server, /health endpoint
src/static_files.rs — Static file serving with try_files chain and SPA fallback
src/cache.rs — pingora-cache MemCache backend and Cache-Control TTL parser
src/ddos/ — KNN-based DDoS detection (model, detector, training, replay)
src/scanner/ — Logistic regression scanner detection (model, detector, features, training, allowlist, watcher)
src/rate_limit/ — Leaky bucket rate limiter (limiter, key extraction)
src/dual_stack.rs — Dual-stack (IPv4+IPv6) TCP listener
tests/e2e.rs — end-to-end test: real SunbeamProxy over plain HTTP with echo backend
tests/proptest.rs — property-based tests for static files, rewrites, config, metrics, etc.
docs/README.md — comprehensive feature documentation
```
## Architecture Invariants — Do Not Break These

51
CLA.md Normal file

@@ -0,0 +1,51 @@
# Sunbeam Studios Contributor License Agreement
Thank you for your interest in contributing to Sunbeam Proxy. This Contributor License Agreement ("CLA") ensures that contributions to this project can be properly licensed and maintained.
## Why a CLA?
Sunbeam Proxy is licensed under Apache-2.0. We use a CLA so that Sunbeam Studios retains the ability to offer commercial licenses for organizations that need them. This is the same model used by projects like Elasticsearch (pre-SSPL), Qt, and MySQL.
Your contributions remain yours. You're granting us a license, not transferring ownership.
## Agreement
By submitting a contribution (via pull request, patch, or any other mechanism) to this project, and by including a `Signed-off-by` line in your commits, you agree to the following:
### 1. Definitions
- **"You"** means the individual or legal entity submitting the contribution.
- **"Contribution"** means any original work of authorship, including modifications or additions to existing work, that you submit to this project.
- **"Project"** means Sunbeam Proxy and related repositories maintained by Sunbeam Studios.
### 2. Grant of Copyright License
You grant Sunbeam Studios a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute your Contributions and any derivative works thereof, under any license terms, including without limitation any open source license or any proprietary or commercial license.
### 3. Grant of Patent License
You grant Sunbeam Studios a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer your Contributions, where such license applies only to patent claims licensable by you that are necessarily infringed by your Contribution(s) alone or by combination with the Project.
### 4. Representations
You represent that:
- Each Contribution is your original creation, or you have sufficient rights to grant the licenses above.
- Your Contribution does not violate any third party's intellectual property rights.
- If your employer has rights to intellectual property you create, you have received permission to submit Contributions on behalf of your employer, or your employer has waived such rights.
### 5. No Obligation
You understand that this Project and your Contributions are provided on an "AS IS" basis, without warranties or conditions of any kind. Sunbeam Studios is under no obligation to accept, use, or include any Contribution.
## How to Sign
Include a `Signed-off-by` line in every commit you submit:
```
Signed-off-by: Your Name <your.email@example.com>
```
You can do this automatically with `git commit -s`.
The `Signed-off-by` line indicates that you have read and agree to this CLA.

84
CONTRIBUTING.md Normal file

@@ -0,0 +1,84 @@
# Contributing to Sunbeam Proxy
We're a small team and we welcome contributions. Here's what you need to know.
## Contributor License Agreement
All contributions require a signed CLA. Read [CLA.md](CLA.md) for the full text.
**The short version:** you keep ownership of your code, but you grant Sunbeam Studios the right to license it under any terms (including commercial). This lets us offer dual licensing while keeping the project Apache-2.0 for everyone.
### How to sign
Add `Signed-off-by` to every commit:
```bash
git commit -s -m "feat(proxy): add cool thing"
```
This is enforced by a lefthook commit-msg hook. Set up lefthook:
```bash
lefthook install
```
If you forget, amend your commit:
```bash
git commit --amend -s
```
## Development Setup
```bash
# Install Rust (if you haven't)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build
git clone https://src.sunbeam.pt/studio/proxy.git
cd proxy
cargo build
# Run tests
cargo nextest run
# or
cargo test
# Local dev server
SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run
```
## Making Changes
1. Fork the repo and create a branch
2. Make your changes
3. Run `cargo check`, `cargo test`, and `cargo clippy -- -D warnings`
4. Commit with conventional commit messages and `Signed-off-by`
5. Open a pull request
### Commit Messages
We use [conventional commits](https://www.conventionalcommits.org/):
```
feat(proxy): add WebSocket compression support
fix(cache): respect Vary header in cache key
docs: update configuration reference
test: add proptests for rate limiter
```
### Code Style
- `cargo fmt` for formatting
- `cargo clippy -- -D warnings` for lints
- No `unwrap()` in production paths
- `#[serde(default)]` on new config fields for backwards compatibility
- Read the file before you edit it
## Reporting Issues
Open an issue at [src.sunbeam.pt/studio/proxy/issues](https://src.sunbeam.pt/studio/proxy/issues).
## License
By contributing, you agree that your contributions will be licensed under the Apache License 2.0, and that you grant additional rights as described in [CLA.md](CLA.md).

199
LICENSE Normal file

@@ -0,0 +1,199 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to the Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by the Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding any notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
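comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.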
Copyright 2025-2026 Sunbeam Studios
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

135
README.md Normal file

@@ -0,0 +1,135 @@
# Sunbeam Proxy
A cloud-native reverse proxy with adaptive ML threat detection. Built on [Pingora](https://github.com/cloudflare/pingora) by [Sunbeam Studios](https://sunbeam.pt).
Sunbeam Proxy learns what normal traffic looks like *for your infrastructure* and automatically adapts its defenses. Instead of relying on generic rulesets written for someone else's problems, it trains on your own audit logs to build behavioral models that protect against the threats you actually face.
## Why it exists
We are a small, women-led queer game studio, and we need to handle extraordinary threats on today's internet. With a small team and a small budget, we have to do more with less, and we need to scale up quickly without worrying about the security of our infrastructure. The threats differ by region, by botnet, and by attack type (DDoS and others), which makes a one-size-fits-all solution hard to find.
## What it does
**Adaptive threat detection** — Two ML models run in the request pipeline. A KNN-based DDoS detector classifies per-IP behavior over sliding windows. A logistic regression scanner detector catches vulnerability probes, directory enumeration, and bot traffic per-request. Both models are trained on your logs, hot-reloaded without downtime, and continuously improvable as your traffic evolves.
**Rate limiting** — Leaky bucket throttling with identity-aware keys (session cookies, bearer tokens, or IP fallback). Separate limits for authenticated and unauthenticated traffic. 256-shard concurrent map, zero contention.
**HTTP response caching** — Per-route in-memory cache backed by pingora-cache. Respects `Cache-Control`, supports `stale-while-revalidate`, and sits after the security pipeline so blocked requests never touch the cache.
**Static file serving** — Serve frontends directly from the proxy with try_files chains, SPA fallback, content-type detection, and cache headers. Replace nginx/caddy sidecar containers with a single config block.
**Everything else you need from a reverse proxy** — TLS termination with cert hot-reload, host-prefix routing, path sub-routes with prefix stripping, regex URL rewrites, response body rewriting (like nginx `sub_filter`), auth subrequests, WebSocket forwarding, SSH TCP passthrough, HTTP-to-HTTPS redirect, ACME HTTP-01 challenge routing, and Prometheus metrics with request tracing.
## Quick start
```sh
cargo build
SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run
```
See [docs/](docs/README.md) for full configuration reference.
## The self-learning loop
```
        your traffic
┌─────────────────────────┐
│      Sunbeam Proxy      │
│                         │
│  DDoS ──► Scanner ──►   │──── audit logs (JSON)
│  Rate Limit ──► Cache   │          │
└─────────────────────────┘          │
                             ┌───────────────┐
                             │ Train models  │
                             │ on your logs  │
                             └───────┬───────┘
                                hot-reload
                              updated models
                            (no restart needed)
```
Every request produces a structured audit log with 15+ behavioral features. Feed those logs back into the training pipeline and the models get better at distinguishing your real users from threats — automatically, without manual rule-writing.
```sh
# Train DDoS model from your audit logs
cargo run -- train --input logs.jsonl --output ddos_model.bin --heuristics heuristics.toml
# Train scanner model
cargo run -- train-scanner --input logs.jsonl --output scanner_model.bin
# Replay logs to evaluate model accuracy
cargo run -- replay --input logs.jsonl --model ddos_model.bin
```
## Detection pipeline
Every HTTPS request passes through three detection layers before reaching your backend:
| Layer | Model | Granularity | Response |
|-------|-------|-------------|----------|
| DDoS | KNN (14-feature behavioral vectors) | Per-IP over sliding window | 429 + Retry-After |
| Scanner | Logistic regression (path, UA, headers) | Per-request | 403 |
| Rate limit | Leaky bucket | Per-identity (session/token/IP) | 429 + Retry-After |
Verified bots (Googlebot, Bingbot, etc.) bypass scanner detection via reverse-DNS verification and configurable allowlists.
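The table above can be read as an ordered chain where the first layer that blocks decides the response. The following is a minimal sketch of that reading, not the proxy's actual code: the `Layer` enum, `block_status`, and `evaluate` are hypothetical names, and the evaluation order (DDoS, then scanner, then rate limit) is assumed from the order of the table rows.

```rust
/// Hypothetical names for the three detection layers in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Ddos,
    Scanner,
    RateLimit,
}

/// Status code sent to the client when a layer blocks, per the table.
fn block_status(layer: Layer) -> u16 {
    match layer {
        Layer::Ddos | Layer::RateLimit => 429, // sent with Retry-After
        Layer::Scanner => 403,
    }
}

/// Walks the verdicts in the order given and returns the first block.
/// Each entry is (layer, blocked). `None` means the request passes on.
fn evaluate(verdicts: &[(Layer, bool)]) -> Option<(Layer, u16)> {
    verdicts
        .iter()
        .find(|(_, blocked)| *blocked)
        .map(|(layer, _)| (*layer, block_status(*layer)))
}
```

With this shape, a request flagged by both the scanner and the rate limiter would receive the scanner's 403, because the scanner is evaluated first.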
## Configuration
Everything is TOML. Here's a route that serves a frontend statically with an API backend, response body rewriting, caching, and custom headers:
```toml
[[routes]]
host_prefix = "docs"
backend = "http://docs-backend:8080"
static_root = "/srv/docs"
fallback = "index.html"
[routes.cache]
enabled = true
default_ttl_secs = 300
[[routes.rewrites]]
pattern = "^/docs/[0-9a-f-]+/?$"
target = "/docs/[id]/index.html"
[[routes.body_rewrites]]
find = "old-domain.example.com"
replace = "docs.sunbeam.pt"
[[routes.response_headers]]
name = "X-Frame-Options"
value = "DENY"
[[routes.paths]]
prefix = "/api"
backend = "http://docs-api:8000"
strip_prefix = true
```
## Observability
- **Request IDs**: UUID v4 per request, forwarded via `X-Request-Id` to upstreams and clients
- **Prometheus metrics**: `GET /metrics` on configurable port — request totals, latency histograms, detection decisions, cache hit rates, active connections
- **Health checks**: `GET /health` returns 200 for k8s probes
- **Structured audit logs**: JSON with request ID, client IP, timing, headers, backend, detection decisions
## Building
```sh
cargo build # debug
cargo build --release --target x86_64-unknown-linux-musl # release (container)
cargo test # all tests
cargo clippy -- -D warnings # lint
```
## License
Apache License 2.0. See [LICENSE](LICENSE).
Contributions require a signed CLA — see [CONTRIBUTING.md](CONTRIBUTING.md) and [CLA.md](CLA.md) for details.

406
docs/README.md Normal file

@@ -0,0 +1,406 @@
---
layout: default
title: Sunbeam Proxy Documentation
description: Configuration reference and feature documentation for Sunbeam Proxy
toc: true
---
# Sunbeam Proxy Documentation
Complete reference for configuring and operating Sunbeam Proxy — a TLS-terminating reverse proxy built on [Pingora](https://github.com/cloudflare/pingora) 0.8.
## Quick Start
```sh
# Local development
SUNBEAM_CONFIG=dev.toml RUST_LOG=info cargo run
# Run tests
cargo nextest run
# Build release (linux-musl for containers)
cargo build --release --target x86_64-unknown-linux-musl
```
---
## Configuration Reference
Configuration is TOML, loaded from `$SUNBEAM_CONFIG` or `/etc/pingora/config.toml`.
### Listeners & TLS
```toml
[listen]
http = "0.0.0.0:80"
https = "0.0.0.0:443"
[tls]
cert_path = "/etc/ssl/tls.crt"
key_path = "/etc/ssl/tls.key"
```
### Telemetry
```toml
[telemetry]
otlp_endpoint = "" # OpenTelemetry OTLP endpoint (empty = disabled)
metrics_port = 9090 # Prometheus scrape port (0 = disabled)
```
### Routes
Each route maps a host prefix to a backend. `host_prefix = "docs"` matches requests to `docs.<your-domain>`.
```toml
[[routes]]
host_prefix = "docs"
backend = "http://docs-backend.default.svc.cluster.local:8080"
websocket = false # forward WebSocket upgrade headers
disable_secure_redirection = false # true = allow plain HTTP
```
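To illustrate host-prefix matching, here is a small sketch that takes the first DNS label of the `Host` header and looks it up in a route list. The function names and the tuple-based route list are assumptions made for this example, and it does not handle IPv6 literals in the `Host` header.

```rust
/// Returns the first DNS label of a Host header value, ignoring any port.
/// Hypothetical helper: returns None for single-label hosts such as "localhost".
fn host_prefix(host_header: &str) -> Option<&str> {
    // Drop an optional ":port" suffix (IPv6 literals are not handled here).
    let host = host_header.split(':').next().unwrap_or(host_header);
    let (prefix, rest) = host.split_once('.')?;
    if prefix.is_empty() || rest.is_empty() {
        None
    } else {
        Some(prefix)
    }
}

/// Looks up the backend for a Host header in a list of (host_prefix, backend) pairs.
fn backend_for<'a>(routes: &[(&'a str, &'a str)], host_header: &str) -> Option<&'a str> {
    let prefix = host_prefix(host_header)?;
    routes
        .iter()
        .find(|(p, _)| p.eq_ignore_ascii_case(prefix))
        .map(|(_, backend)| *backend)
}
```

A request for `docs.example.com:443` would resolve to the backend configured under `host_prefix = "docs"`, regardless of the domain that follows the prefix.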
#### Path Sub-Routes
Longest-prefix match within a host. Mix static serving with API proxying.
```toml
[[routes.paths]]
prefix = "/api"
backend = "http://api-backend:8000"
strip_prefix = true # /api/users → /users
websocket = false
```
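A rough sketch of longest-prefix matching with optional prefix stripping follows. The `PathRoute` struct here is a simplified stand-in for the real config type, `match_path` is a hypothetical name, and the plain `starts_with` comparison is an assumption (the real matcher may require a path-segment boundary).

```rust
/// Simplified stand-in for a path sub-route (not the real config struct).
struct PathRoute {
    prefix: &'static str,
    backend: &'static str,
    strip_prefix: bool,
}

/// Picks the route with the longest matching prefix and computes the
/// path that would be sent upstream.
fn match_path<'a>(routes: &'a [PathRoute], path: &str) -> Option<(&'a PathRoute, String)> {
    let best = routes
        .iter()
        .filter(|r| path.starts_with(r.prefix))
        .max_by_key(|r| r.prefix.len())?;

    let upstream_path = if best.strip_prefix {
        let rest = &path[best.prefix.len()..];
        // Keep the upstream path absolute: "/api" alone becomes "/".
        if rest.starts_with('/') {
            rest.to_string()
        } else {
            format!("/{rest}")
        }
    } else {
        path.to_string()
    };
    Some((best, upstream_path))
}
```

With routes for `/api` (stripping) and `/api/v2` (not stripping), a request to `/api/users` goes to the first backend as `/users`, while `/api/v2/items` goes to the second backend unchanged.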
#### Static File Serving
Serve frontends directly from the proxy. The try_files chain checks candidates in order:
1. `$static_root/$uri` — exact file
2. `$static_root/$uri.html` — with `.html` extension
3. `$static_root/$uri/index.html` — directory index
4. `$static_root/$fallback` — SPA fallback
If nothing matches, the request falls through to the upstream backend.
```toml
[[routes]]
host_prefix = "meet"
backend = "http://meet-backend:8080"
static_root = "/srv/meet"
fallback = "index.html"
```
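The lookup order above can be sketched as a function that builds the candidate list for a request path. This is an illustration only: `try_files_candidates` is a hypothetical name, the sketch assumes Unix-style paths, and returning `None` for paths containing `..` is one way to model the traversal rule this section describes.

```rust
use std::path::{Component, Path, PathBuf};

/// Builds the try_files candidates in lookup order.
/// Returns None when the request path contains a ".." component.
fn try_files_candidates(
    static_root: &Path,
    uri: &str,
    fallback: Option<&str>,
) -> Option<Vec<PathBuf>> {
    let rel = uri.trim_matches('/');
    let has_parent_dir = Path::new(rel)
        .components()
        .any(|c| matches!(c, Component::ParentDir));
    if has_parent_dir {
        return None;
    }

    let mut candidates = vec![
        static_root.join(rel),                     // 1. exact file
        static_root.join(format!("{rel}.html")),   // 2. with .html extension
        static_root.join(rel).join("index.html"),  // 3. directory index
    ];
    if let Some(fb) = fallback {
        candidates.push(static_root.join(fb));     // 4. SPA fallback
    }
    Some(candidates)
}
```

A caller would serve the first candidate that exists as a regular file and fall through to the upstream backend when none does.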
**Content-type detection** is based on file extension:
| Extensions | Content-Type |
|-----------|-------------|
| `html`, `htm` | `text/html; charset=utf-8` |
| `css` | `text/css; charset=utf-8` |
| `js`, `mjs` | `application/javascript; charset=utf-8` |
| `json` | `application/json; charset=utf-8` |
| `svg` | `image/svg+xml` |
| `png`, `jpg`, `gif`, `webp`, `avif` | `image/*` |
| `woff`, `woff2`, `ttf`, `otf` | `font/*` |
| `wasm` | `application/wasm` |
**Cache-control headers** are set per extension type:
| Extensions | Cache-Control |
|-----------|-------------|
| `js`, `css`, `woff2`, `wasm` | `public, max-age=31536000, immutable` |
| `png`, `jpg`, `svg`, `ico` | `public, max-age=86400` |
| Everything else | `no-cache` |
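As a sketch of the cache-control table, the mapping can be written as a single match on the file extension. The function name `cache_control_for` is hypothetical, and matching on the lowercase extension exactly as listed is an assumption.

```rust
use std::path::Path;

/// Returns the Cache-Control value for a file, following the table above.
fn cache_control_for(file_path: &str) -> &'static str {
    let ext = Path::new(file_path)
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("");
    match ext {
        // Long-lived assets that are safe to cache for a year.
        "js" | "css" | "woff2" | "wasm" => "public, max-age=31536000, immutable",
        // Images: cache for one day.
        "png" | "jpg" | "svg" | "ico" => "public, max-age=86400",
        // HTML and anything else: always revalidate.
        _ => "no-cache",
    }
}
```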
Path sub-routes take priority over static serving — if `/api` matches a path route, it goes to that backend even if a static file exists.
Path traversal (`..`) is rejected and falls through to the upstream.
#### URL Rewrites
Regex patterns compiled at startup, applied before static file lookup. First match wins.
```toml
[[routes.rewrites]]
pattern = "^/docs/[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}/?$"
target = "/docs/[id]/index.html"
```
#### Response Body Rewriting
Find/replace in response bodies, like nginx `sub_filter`. Only applies to `text/html`, `application/javascript`, and `text/javascript` responses. Binary responses pass through untouched.
The entire response is buffered in memory before substitution (fine for HTML/JS — typically <1MB). `Content-Length` is removed since the body size may change.
```toml
[[routes.body_rewrites]]
find = "old-domain.example.com"
replace = "new-domain.sunbeam.pt"
```
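The behavior described here can be sketched as a content-type gate followed by plain string replacement on the buffered body. `BodyRewrite`, `is_rewritable`, and `rewrite_body` are hypothetical names for this illustration, and the sketch assumes the body is valid UTF-8.

```rust
/// One find/replace rule, mirroring a `[[routes.body_rewrites]]` entry.
struct BodyRewrite<'a> {
    find: &'a str,
    replace: &'a str,
}

/// True for the content types that are eligible for rewriting.
fn is_rewritable(content_type: &str) -> bool {
    // Ignore parameters such as "; charset=utf-8".
    let mime = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    matches!(
        mime.as_str(),
        "text/html" | "application/javascript" | "text/javascript"
    )
}

/// Applies all rules to a fully buffered body.
/// Returns None when the response should pass through untouched.
fn rewrite_body(content_type: &str, body: &str, rules: &[BodyRewrite<'_>]) -> Option<String> {
    if !is_rewritable(content_type) {
        return None;
    }
    let mut out = body.to_string();
    for rule in rules {
        out = out.replace(rule.find, rule.replace);
    }
    Some(out)
}
```

Because the rewritten body can differ in length from the original, a caller that receives `Some(..)` would also drop the upstream `Content-Length` header.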
#### Custom Response Headers
```toml
[[routes.response_headers]]
name = "X-Frame-Options"
value = "DENY"
```
#### Auth Subrequests
Gate path routes with an HTTP auth check before forwarding upstream. Similar to nginx `auth_request`.
```toml
[[routes.paths]]
prefix = "/media"
backend = "http://seaweedfs-filer:8333"
strip_prefix = true
auth_request = "http://drive-backend/api/v1.0/items/media-auth/"
auth_capture_headers = ["Authorization", "X-Amz-Date", "X-Amz-Content-Sha256"]
upstream_path_prefix = "/sunbeam-drive/"
```
The auth subrequest sends a GET to `auth_request` with the original `Cookie`, `Authorization`, and `X-Original-URI` headers.
| Auth response | Proxy behavior |
|--------------|----------------|
| 2xx | Capture specified headers, forward to backend |
| Non-2xx | Return 403 to client |
| Network error | Return 502 to client |
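The table maps onto a small decision function. The sketch below uses hypothetical `AuthResult` and `Action` enums and is not taken from the proxy source; it only encodes the three rows above.

```rust
/// Outcome of the auth subrequest (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum AuthResult {
    Status(u16),
    NetworkError,
}

/// What the proxy does next.
#[derive(Debug, PartialEq)]
enum Action {
    /// Capture the configured headers and forward to the backend.
    ForwardToBackend,
    /// Answer the client directly with this status code.
    Respond(u16),
}

fn action_for(result: &AuthResult) -> Action {
    match result {
        AuthResult::Status(code) if (200..300).contains(code) => Action::ForwardToBackend,
        AuthResult::Status(_) => Action::Respond(403),
        AuthResult::NetworkError => Action::Respond(502),
    }
}
```

Note that under these rules any non-2xx auth response, including a 401 or a 500 from the auth service, reaches the client as a 403.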
#### HTTP Response Cache
Per-route in-memory cache backed by pingora-cache.
```toml
[routes.cache]
enabled = true
default_ttl_secs = 60 # TTL when upstream has no Cache-Control
stale_while_revalidate_secs = 0 # serve stale while revalidating
max_file_size = 0 # max cacheable body size (0 = unlimited)
```
**Pipeline position**: Cache runs after the security pipeline and before upstream modifications.
```
Request → DDoS → Scanner → Rate Limit → Cache → Upstream
```
Cache behavior:
- Only caches GET and HEAD requests
- Respects `Cache-Control: no-store` and `Cache-Control: private`
- TTL priority: `s-maxage` > `max-age` > `default_ttl_secs`
- Skips routes with body rewrites (content varies)
- Skips requests with auth subrequest headers (per-user content)
- Cache key: `{host}{path}?{query}`
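The TTL priority and cache-key rules above can be sketched as follows. This is a simplified illustration, not the parser in `src/cache.rs`: the function names are hypothetical, directive parsing is deliberately minimal (no quoted values), and unparseable numbers are treated as absent.

```rust
/// Returns the TTL in seconds for a response, or None if it must not be cached.
fn cache_ttl_secs(cache_control: Option<&str>, default_ttl_secs: u64) -> Option<u64> {
    let value = match cache_control {
        Some(v) => v,
        // No Cache-Control header from upstream: use the route default.
        None => return Some(default_ttl_secs),
    };

    let mut s_maxage: Option<u64> = None;
    let mut max_age: Option<u64> = None;
    for directive in value.split(',') {
        let d = directive.trim().to_ascii_lowercase();
        if d == "no-store" || d == "private" {
            return None;
        }
        if let Some(v) = d.strip_prefix("s-maxage=") {
            s_maxage = v.parse().ok();
        } else if let Some(v) = d.strip_prefix("max-age=") {
            max_age = v.parse().ok();
        }
    }
    // Priority: s-maxage > max-age > default_ttl_secs.
    Some(s_maxage.or(max_age).unwrap_or(default_ttl_secs))
}

/// Builds a cache key in the `{host}{path}?{query}` shape.
fn cache_key(host: &str, path: &str, query: &str) -> String {
    format!("{host}{path}?{query}")
}
```

Since the key includes the host, two routes serving the same path on different hostnames never share a cache entry.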
### SSH Passthrough
Raw TCP proxy for SSH traffic.
```toml
[ssh]
listen = "0.0.0.0:22"
backend = "gitea-ssh.devtools.svc.cluster.local:2222"
```
### DDoS Detection
KNN-based per-IP behavioral classification over sliding windows.
```toml
[ddos]
enabled = true
model_path = "ddos_model.bin"
k = 5
threshold = 0.6
window_secs = 60
window_capacity = 1000
min_events = 10
```
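To give an intuition for how `k` and `threshold` might interact, here is a generic k-nearest-neighbors sketch. It is an assumption-laden illustration rather than the detector's real logic: it uses squared Euclidean distance, treats the score as the fraction of the `k` nearest labeled samples that are attacks, and flags an IP when that fraction reaches `threshold`. The function names are hypothetical.

```rust
/// Fraction of the k nearest training samples that are labeled as attacks.
/// Each training sample is (feature vector, is_attack).
fn knn_attack_score(training: &[(Vec<f64>, bool)], query: &[f64], k: usize) -> f64 {
    let mut dists: Vec<(f64, bool)> = training
        .iter()
        .map(|(features, is_attack)| {
            let d2: f64 = features
                .iter()
                .zip(query)
                .map(|(a, b)| (a - b).powi(2))
                .sum();
            (d2, *is_attack)
        })
        .collect();
    dists.sort_by(|a, b| a.0.total_cmp(&b.0));

    let k = k.min(dists.len());
    if k == 0 {
        return 0.0;
    }
    let attacks = dists[..k].iter().filter(|(_, is_attack)| *is_attack).count();
    attacks as f64 / k as f64
}

/// Assumed decision rule: flag when the score reaches the threshold.
fn is_attack(score: f64, threshold: f64) -> bool {
    score >= threshold
}
```

Under this reading, `k = 5` with `threshold = 0.6` would flag an IP when at least three of its five nearest neighbors are attack samples.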
### Scanner Detection
Logistic regression per-request classification with verified bot allowlist.
```toml
[scanner]
enabled = true
model_path = "scanner_model.bin"
threshold = 0.5
poll_interval_secs = 30 # hot-reload check interval (0 = disabled)
bot_cache_ttl_secs = 86400 # verified bot IP cache TTL
[[scanner.allowlist]]
ua_prefix = "Googlebot"
reason = "Google crawler"
dns_suffixes = ["googlebot.com", "google.com"]
cidrs = ["66.249.64.0/19"]
```
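
The sketch below illustrates the two ideas in this section: scoring a request with logistic regression against `threshold`, and checking an allowlist entry's `ua_prefix` and `cidrs`. It is a simplified illustration with hypothetical names; the real feature set is not described here, the DNS suffix verification is omitted, and only IPv4 CIDRs are handled.

```rust
use std::net::Ipv4Addr;

/// Logistic function mapping a linear score to a probability in (0, 1).
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Probability that a request is a scanner, given model weights and the
/// request's feature vector (features here are hypothetical).
fn scanner_probability(weights: &[f64], bias: f64, features: &[f64]) -> f64 {
    let linear: f64 = weights.iter().zip(features).map(|(w, x)| w * x).sum();
    sigmoid(bias + linear)
}

/// Returns true if `ip` falls inside the IPv4 CIDR block, e.g. "66.249.64.0/19".
fn in_cidr_v4(ip: Ipv4Addr, cidr: &str) -> bool {
    let Some((base, len)) = cidr.split_once('/') else {
        return false;
    };
    let (Ok(base), Ok(len)) = (base.parse::<Ipv4Addr>(), len.parse::<u32>()) else {
        return false;
    };
    if len > 32 {
        return false;
    }
    // A /0 mask is all zeros; shifting by 32 would overflow, so handle it apart.
    let mask = if len == 0 { 0 } else { u32::MAX << (32 - len) };
    (u32::from(ip) & mask) == (u32::from(base) & mask)
}

/// Treats a request as an allowlisted bot when its User-Agent starts with
/// `ua_prefix` and its IP is inside one of the entry's CIDR blocks.
fn is_allowlisted(user_agent: &str, ip: Ipv4Addr, ua_prefix: &str, cidrs: &[&str]) -> bool {
    user_agent.starts_with(ua_prefix) && cidrs.iter().any(|cidr| in_cidr_v4(ip, cidr))
}
```

Requiring the network check in addition to the User-Agent prefix matters because the User-Agent header alone is trivially spoofed.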
### Rate Limiting
Leaky bucket per-identity throttling. Identity resolution: `ory_kratos_session` cookie > Bearer token > client IP.
```toml
[rate_limit]
enabled = true
eviction_interval_secs = 300
stale_after_secs = 600
bypass_cidrs = ["10.42.0.0/16"]
[rate_limit.authenticated]
burst = 200
rate = 50.0
[rate_limit.unauthenticated]
burst = 50
rate = 10.0
```
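
As a rough illustration of how `burst`, `rate`, and the identity priority interact, here is a small leaky bucket sketch. It is one common formulation (a meter that fills by one per request and drains at `rate` per second), not necessarily the proxy's exact algorithm; the types and the simplified cookie parsing are hypothetical, and time is passed in explicitly to keep the example deterministic.

```rust
/// Identity used as the rate-limit key, in priority order.
#[derive(Debug, PartialEq)]
enum Identity {
    Session(String),
    Bearer(String),
    Ip(String),
}

/// Resolves identity: `ory_kratos_session` cookie > Bearer token > client IP.
fn resolve_identity(cookie: Option<&str>, authorization: Option<&str>, client_ip: &str) -> Identity {
    let session = cookie.and_then(|c| {
        c.split(';')
            .map(str::trim)
            .find_map(|kv| kv.strip_prefix("ory_kratos_session="))
    });
    if let Some(session) = session {
        return Identity::Session(session.to_string());
    }
    if let Some(token) = authorization.and_then(|a| a.strip_prefix("Bearer ")) {
        return Identity::Bearer(token.to_string());
    }
    Identity::Ip(client_ip.to_string())
}

/// Leaky bucket: each request adds one unit, the level drains at `rate`
/// units per second, and requests are refused once `burst` would be exceeded.
struct LeakyBucket {
    burst: f64,
    rate: f64,
    level: f64,
    last_secs: f64,
}

impl LeakyBucket {
    fn new(burst: u32, rate: f64) -> Self {
        Self { burst: f64::from(burst), rate, level: 0.0, last_secs: 0.0 }
    }

    /// Returns true if a request arriving at `now_secs` is allowed.
    fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_secs).max(0.0);
        self.level = (self.level - elapsed * self.rate).max(0.0);
        self.last_secs = now_secs;
        if self.level + 1.0 <= self.burst {
            self.level += 1.0;
            true
        } else {
            false
        }
    }
}
```

With the values above, an unauthenticated client could send 50 requests at once and then roughly 10 per second, while an authenticated one gets 200 and 50 respectively.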
---
## Observability
### Request IDs
Every request gets a UUID v4 request ID. It's:
- Attached to a `tracing::info_span!` so all log lines within the request inherit it
- Forwarded upstream via `X-Request-Id`
- Returned to clients via `X-Request-Id`
- Included in audit log lines
### Prometheus Metrics
Served at `GET /metrics` on `metrics_port` (default 9090).
| Metric | Type | Labels |
|--------|------|--------|
| `sunbeam_requests_total` | Counter | `method`, `host`, `status`, `backend` |
| `sunbeam_request_duration_seconds` | Histogram | — |
| `sunbeam_ddos_decisions_total` | Counter | `decision` |
| `sunbeam_scanner_decisions_total` | Counter | `decision`, `reason` |
| `sunbeam_rate_limit_decisions_total` | Counter | `decision` |
| `sunbeam_cache_status_total` | Counter | `status` |
| `sunbeam_active_connections` | Gauge | — |

`GET /health` returns 200 for k8s probes.
```yaml
# Prometheus scrape config
- job_name: sunbeam-proxy
  static_configs:
    - targets: ['sunbeam-proxy.ingress.svc.cluster.local:9090']
```
### Audit Logs
Every request produces a structured JSON log line (`target = "audit"`):
```json
{
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "method": "GET",
  "host": "docs.sunbeam.pt",
  "path": "/api/v1/pages",
  "query": "limit=10",
  "client_ip": "203.0.113.42",
  "status": 200,
  "duration_ms": 23,
  "content_length": 0,
  "user_agent": "Mozilla/5.0 ...",
  "referer": "https://docs.sunbeam.pt/",
  "accept_language": "en-US",
  "accept": "text/html",
  "has_cookies": true,
  "cf_country": "FR",
  "backend": "http://docs-backend:8080",
  "error": null
}
```
### Detection Pipeline Logs
Each security layer emits a `target = "pipeline"` log line before acting:
```
layer=ddos → all HTTPS traffic (scanner training data)
layer=scanner → traffic that passed DDoS (rate-limit training data)
layer=rate_limit → traffic that passed scanner
```
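Because each layer logs before it acts, requests that a layer goes on to block still appear in that layer's log, so training pipelines always see all the traffic that reached it.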
---
## CLI Commands
```sh
# Start the proxy server
sunbeam-proxy serve [--upgrade]
# Train DDoS model from audit logs
sunbeam-proxy train --input logs.jsonl --output ddos_model.bin \
[--attack-ips ips.txt] [--normal-ips ips.txt] \
[--heuristics heuristics.toml] [--k 5] [--threshold 0.6]
# Replay logs through detection pipeline
sunbeam-proxy replay --input logs.jsonl --model ddos_model.bin \
[--config config.toml] [--rate-limit]
# Train scanner model
sunbeam-proxy train-scanner --input logs.jsonl --output scanner_model.bin \
[--wordlists path/to/wordlists] [--threshold 0.5]
```
---
## Architecture
### Source Files
```
src/main.rs — server bootstrap, watcher spawn, SSH spawn
src/lib.rs — library crate root
src/config.rs — TOML config deserialization
src/proxy.rs — ProxyHttp impl: routing, filtering, caching, logging
src/acme.rs — Ingress watcher for ACME HTTP-01 challenges
src/watcher.rs — Secret/ConfigMap watcher for cert + config hot-reload
src/cert.rs — K8s Secret → cert files on disk
src/telemetry.rs — JSON logging + OTEL tracing init
src/ssh.rs — TCP proxy for SSH passthrough
src/metrics.rs — Prometheus metrics and scrape endpoint
src/static_files.rs — Static file serving with try_files chain
src/cache.rs — pingora-cache MemCache backend
src/ddos/ — KNN-based DDoS detection
src/scanner/ — Logistic regression scanner detection
src/rate_limit/ — Leaky bucket rate limiter
src/dual_stack.rs — Dual-stack (IPv4+IPv6) TCP listener
```
### Runtime Model
Pingora manages its own async runtime. K8s watchers (cert/config, Ingress) each run on separate OS threads with their own tokio runtimes. This isolation is deliberate — Pingora's internal runtime has specific constraints that don't mix with general-purpose async work.
### Security Pipeline
```
Request
├── DDoS detection (KNN per-IP)
│   └── blocked → 429
├── Scanner detection (logistic regression per-request)
│   └── blocked → 403
├── Rate limiting (leaky bucket per-identity)
│   └── blocked → 429
├── Cache lookup
│   └── hit → serve cached response
└── Upstream request
    ├── Auth subrequest (if configured)
    ├── Response body rewriting (if configured)
    └── Response to client
```
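
The ordering in the diagram, together with the log-before-act rule from the detection pipeline logs, can be sketched as a short-circuiting loop. This is an illustrative model only: `Verdicts`, `Outcome`, and `run_pipeline` are hypothetical names, and each layer's decision is passed in as a precomputed boolean rather than computed from a real request.

```rust
/// Final result of running one request through the pipeline.
#[derive(Debug, PartialEq)]
enum Outcome {
    Blocked { layer: &'static str, status: u16 },
    CacheHit,
    Upstream,
}

/// Precomputed per-layer decisions for one request (hypothetical input).
struct Verdicts {
    ddos_block: bool,
    scanner_block: bool,
    rate_limit_block: bool,
    cache_hit: bool,
}

/// Runs the layers in order and returns the `layer=` log entries emitted
/// along with the outcome. Each layer logs before it acts, so a blocked
/// request is still recorded by the layer that blocked it.
fn run_pipeline(v: &Verdicts) -> (Vec<&'static str>, Outcome) {
    let layers = [
        ("ddos", v.ddos_block, 429u16),
        ("scanner", v.scanner_block, 403),
        ("rate_limit", v.rate_limit_block, 429),
    ];
    let mut logged = Vec::new();
    for (layer, block, status) in layers {
        logged.push(layer); // pipeline log line is emitted first
        if block {
            return (logged, Outcome::Blocked { layer, status });
        }
    }
    let outcome = if v.cache_hit { Outcome::CacheHit } else { Outcome::Upstream };
    (logged, outcome)
}
```

A request blocked by the scanner layer is logged by `ddos` and `scanner` but never reaches `rate_limit` or the cache, which matches the training-data scopes listed for each layer.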

lefthook.yml Normal file

@@ -0,0 +1,29 @@
# Lefthook configuration for Sunbeam Proxy
# Install: lefthook install

commit-msg:
  commands:
    dco-signoff:
      run: |
        if ! grep -q "^Signed-off-by: " "{1}"; then
          echo ""
          echo "ERROR: Missing Signed-off-by line."
          echo ""
          echo "All commits to Sunbeam Proxy must include a Signed-off-by line"
          echo "indicating agreement with the Contributor License Agreement (CLA.md)."
          echo ""
          echo "Use: git commit -s"
          echo "  or: git commit --signoff"
          echo ""
          exit 1
        fi

pre-commit:
  parallel: true
  commands:
    cargo-check:
      glob: "*.rs"
      run: cargo check --quiet
    cargo-fmt:
      glob: "*.rs"
      run: cargo fmt -- --check