docs/proxy.md

# The Bouncer 💎

Every request to The Super Boujee Business Box ✨ comes through one door — a custom reverse proxy built on Cloudflare's [Pingora](https://github.com/cloudflare/pingora) framework, written in Rust. It's the bouncer at the club: checks IDs, spots trouble, and knows exactly which room to send you to.

## Why Not Nginx/Traefik?

We tried the off-the-rack options. They didn't fit.

- We wanted ML-powered threat detection compiled into the binary
- Static file serving that replaces nginx sidecars entirely
- Hot-reload TLS from K8s Secrets (not file watchers)
- ACME challenge routing built-in
- Auth subrequests for admin endpoints
- Rust end to end, with rustls for TLS (no BoringSSL dependency)
- It's a few thousand lines of Rust. That's it.

## How Requests Flow

Three security layers, then the cache. Every request walks the velvet rope in order:

```
Request → DDoS Detection → Scanner Detection → Rate Limiting → Cache → Backend
```

### 1. DDoS Detection (per-IP)

The bouncer watches your vibe over time. Fourteen behavioral features extracted over a 60-second sliding window:

- **Features:** request rate, path diversity, error rate, burst patterns, cookie/referer presence, suspicious paths, content-length patterns
- **Two-stage ensemble:**
  - Decision tree (fast path, ~2ns inference)
  - MLP fallback if uncertain (~85ns inference)
- **Threshold:** 0.6, minimum 10 events before classifying
- Both models fit in L1 cache (~4KB total)
- Currently in **observe-only mode** (logs decisions, doesn't block)
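
The two-stage cascade above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: the tree splits, the feature indices, the `TreeVerdict`/`Decision` names, and the `>=` comparison against the threshold are all assumptions. Only the 0.6 threshold, the 10-event minimum, and the fourteen-feature width come from the text.

```rust
/// Verdict of the fast-path decision tree (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TreeVerdict {
    Benign,
    Attack,
    Uncertain,
}

/// Final per-IP decision. In observe-only mode, `Flag` would only be logged.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    NotEnoughData,
    Allow,
    Flag,
}

pub const THRESHOLD: f32 = 0.6; // from the doc
pub const MIN_EVENTS: u32 = 10; // from the doc
pub const N_FEATURES: usize = 14; // from the doc

/// Stand-in tree with two hand-picked splits. Assumes index 0 holds the
/// request rate and index 2 the error rate; a trained tree would differ.
pub fn tree(features: &[f32; N_FEATURES]) -> TreeVerdict {
    let request_rate = features[0];
    let error_rate = features[2];
    if request_rate < 1.0 {
        TreeVerdict::Benign
    } else if request_rate > 50.0 && error_rate > 0.5 {
        TreeVerdict::Attack
    } else {
        TreeVerdict::Uncertain
    }
}

/// Runs the tree first and only calls `fallback` (the MLP stand-in, returning
/// a score in 0..=1) when the tree is uncertain.
pub fn classify<F>(events: u32, features: &[f32; N_FEATURES], fallback: F) -> Decision
where
    F: Fn(&[f32; N_FEATURES]) -> f32,
{
    if events < MIN_EVENTS {
        return Decision::NotEnoughData;
    }
    match tree(features) {
        TreeVerdict::Benign => Decision::Allow,
        TreeVerdict::Attack => Decision::Flag,
        TreeVerdict::Uncertain => {
            if fallback(features) >= THRESHOLD {
                Decision::Flag
            } else {
                Decision::Allow
            }
        }
    }
}
```

Passing the fallback as a closure keeps the cascade logic separate from whichever scorer sits behind it, so the expensive path only runs for requests the tree cannot settle.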

### 2. Scanner Detection (per-request)

Catching the people who show up with a crowbar. Twelve features analyzed: path structure, headers, user-agent, traversal patterns.

**Hard allowlist for legitimate traffic:**
- Known host + cookies → allow (legitimate browser)
- Known host + browser UA + accept-language → allow

**Bot verification:**
- Googlebot: DNS + CIDR validation
- Bingbot: DNS + CIDR validation
- containerd: UA only

Same ensemble architecture as DDoS — decision tree fast path, MLP fallback. If you're a scanner, you get a **403 Forbidden**. Goodbye.
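
As a rough sketch of the hard allowlist and one of the traversal signals: the `RequestView` struct, the `Mozilla/` prefix test for "browser UA", and the specific encoded patterns are illustrative assumptions, not the proxy's real feature set.

```rust
/// The few request facts the allowlist rules need (hypothetical shape).
pub struct RequestView<'a> {
    pub host_known: bool,
    pub has_cookies: bool,
    pub user_agent: &'a str,
    pub accept_language: Option<&'a str>,
}

/// Assumption: a browser UA is approximated by the conventional "Mozilla/" prefix.
fn looks_like_browser(user_agent: &str) -> bool {
    user_agent.starts_with("Mozilla/")
}

/// Returns true when a request short-circuits the scanner model:
/// known host + cookies, or known host + browser UA + accept-language.
pub fn hard_allow(req: &RequestView<'_>) -> bool {
    if !req.host_known {
        return false;
    }
    req.has_cookies || (looks_like_browser(req.user_agent) && req.accept_language.is_some())
}

/// One possible traversal feature: plain or percent-encoded "../" in the path.
pub fn has_traversal(path: &str) -> bool {
    let lower = path.to_ascii_lowercase();
    lower.contains("../") || lower.contains("..%2f") || lower.contains("%2e%2e")
}
```

Requests that fail `hard_allow` would then go on to the tree and MLP; passing it skips the model entirely.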

### 3. Rate Limiting (per-identity)

Even welcome guests have limits, darling.

**Identity resolution order:** session cookie > Bearer token > client IP

| Tier | Burst | Sustained |
|------|-------|-----------|
| Authenticated | 200 requests | 50 tokens/sec |
| Unauthenticated | 50 requests | 10 tokens/sec |

- **CIDR bypass:** `10.0.0.0/8`, `127.0.0.0/8`, `::1/128`
- Leaky bucket algorithm, 256 shards for low contention
- Response: **429 + Retry-After header**
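
A minimal sketch of how these pieces could fit together, assuming a leaky bucket used as a meter (each request adds one unit, the bucket drains at the sustained rate, and the burst is the capacity). The type and function names, the `session:`/`bearer:`/`ip:` key prefixes, and the use of the standard library's default hasher for sharding are assumptions for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::IpAddr;

/// Identity resolution order from the doc: session cookie > Bearer token > client IP.
pub fn resolve_identity(session: Option<&str>, bearer: Option<&str>, client_ip: &str) -> String {
    match (session, bearer) {
        (Some(s), _) => format!("session:{}", s),
        (None, Some(t)) => format!("bearer:{}", t),
        (None, None) => format!("ip:{}", client_ip),
    }
}

/// CIDR bypass for 10.0.0.0/8, 127.0.0.0/8 and ::1/128.
pub fn bypasses_rate_limit(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.octets()[0] == 10 || v4.is_loopback(),
        IpAddr::V6(v6) => v6.is_loopback(),
    }
}

/// Picks one of 256 shards for an identity.
pub fn shard_index(identity: &str) -> usize {
    let mut hasher = DefaultHasher::new();
    identity.hash(&mut hasher);
    (hasher.finish() % 256) as usize
}

/// Leaky bucket as a meter. Time is passed in as seconds to keep it testable.
pub struct LeakyBucket {
    capacity: f64,     // burst size
    leak_per_sec: f64, // sustained rate
    level: f64,
    last: f64,
}

impl LeakyBucket {
    pub fn new(capacity: f64, leak_per_sec: f64) -> Self {
        Self { capacity, leak_per_sec, level: 0.0, last: 0.0 }
    }

    /// Ok(()) admits the request; Err(n) means respond 429 with Retry-After: n.
    pub fn try_acquire(&mut self, now: f64) -> Result<(), u64> {
        if now > self.last {
            let drained = (now - self.last) * self.leak_per_sec;
            self.level = (self.level - drained).max(0.0);
            self.last = now;
        }
        if self.level + 1.0 <= self.capacity {
            self.level += 1.0;
            Ok(())
        } else {
            let wait = (self.level + 1.0 - self.capacity) / self.leak_per_sec;
            Err(wait.ceil().max(1.0) as u64)
        }
    }
}
```

With the unauthenticated tier, `LeakyBucket::new(50.0, 10.0)` admits 50 requests at once, rejects the next, and admits 10 more after one second has drained the bucket.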

### 4. Response Caching

In-memory via pingora-cache. Only requests that pass the security pipeline get cached — blocked requests never touch the cache.

- **Key:** `{host}{path}?{query}`
- Respects `Cache-Control`: no-store, private, s-maxage, max-age, stale-while-revalidate
- Per-route: configurable TTL, enable/disable
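
For illustration, the key format and a shared-cache reading of `Cache-Control` might look like the sketch below. It does not use the pingora-cache API. The function names are hypothetical, a missing query is assumed to render as an empty string after the `?`, and `stale-while-revalidate` handling is left out.

```rust
/// Builds the `{host}{path}?{query}` key described above.
/// Assumption: a missing query is rendered as an empty string.
pub fn cache_key(host: &str, path: &str, query: Option<&str>) -> String {
    format!("{}{}?{}", host, path, query.unwrap_or(""))
}

/// Returns the freshness lifetime in seconds for a shared cache, or None when
/// the response must not be stored (no-store, private) or names no lifetime.
/// For a shared cache, s-maxage takes precedence over max-age.
pub fn shared_cache_ttl(cache_control: &str) -> Option<u64> {
    let mut max_age: Option<u64> = None;
    let mut s_maxage: Option<u64> = None;
    for directive in cache_control.split(',') {
        let d = directive.trim().to_ascii_lowercase();
        if d == "no-store" || d.starts_with("private") {
            return None;
        }
        if let Some(value) = d.strip_prefix("s-maxage=") {
            s_maxage = value.trim_matches('"').parse().ok();
        } else if let Some(value) = d.strip_prefix("max-age=") {
            max_age = value.trim_matches('"').parse().ok();
        }
    }
    s_maxage.or(max_age)
}
```

When this returns `None` only because the header names no lifetime, a per-route TTL from the config would presumably be the value that applies.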

## Route Table

Every subdomain gets routed by prefix (the part before the first dot). The bouncer knows every room in the building:

| Prefix | Backend | Notes |
|--------|---------|-------|
| `docs` | Collabora (WOPI) | WebSocket |
| `meet` | Meet frontend + `/api/` → backend | WebSocket |
| `drive` | Drive frontend + `/external_api/` → backend | versioning |
| `mail` | Messages frontend (Caddy) | |
| `messages` | Tuwunel (Matrix) | WebSocket |
| `people` | People frontend + `/api/`, `/admin/` → backend | |
| `find` | Find backend | |
| `src` | Gitea HTTP | WebSocket, SSH on port 22 |
| `auth` | Kratos UI + Hydra OAuth2 | `/.well-known` → Hydra |
| `integration` | La Suite navbar | |
| `cal` | Calendars frontend + `/api/`, `/caldav/` → backend | |
| `projects` | Projects | WebSocket |
| `s3` | SeaweedFS filer | |
| `livekit` | LiveKit dashboard | WebSocket |
| `metrics` | Grafana | |
| `systemmetrics` | Prometheus | |
| `systemlogs` | Loki | |
| `systemtracing` | Tempo | |
| `id` | Kratos admin (auth-gated) | subrequest to `/userinfo` |
| `hydra` | Hydra admin (auth-gated) | subrequest to `/userinfo` |
| `search` | OpenSearch (auth-gated) | subrequest to `/userinfo` |
| `vault` | OpenBao (auth-gated) | subrequest to `/userinfo` |

Path sub-routes use longest-prefix-first matching within each host.
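
The two-step lookup (host prefix, then longest matching path prefix) can be sketched like this. The backend names and the `(prefix, backend)` tuple layout are placeholders, not the real `config.toml` schema.

```rust
/// Returns the part of the host before the first dot, e.g. "people" for
/// "people.example.com".
pub fn host_prefix(host: &str) -> &str {
    host.split('.').next().unwrap_or(host)
}

/// Picks the backend whose path prefix is the longest one matching `path`.
/// `routes` holds (path_prefix, backend) pairs for a single host.
pub fn match_path<'a>(routes: &[(&'a str, &'a str)], path: &str) -> Option<&'a str> {
    routes
        .iter()
        .filter(|(prefix, _)| path.starts_with(*prefix))
        .max_by_key(|(prefix, _)| prefix.len())
        .map(|(_, backend)| *backend)
}
```

For the `people` row, routes such as `("/", "people-frontend")`, `("/api/", "people-backend")` and `("/admin/", "people-backend")` would send `/api/users` to the backend and everything else to the frontend.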

## Auth Request Pattern

For admin endpoints (`id`, `hydra`, `search`, `vault`) — the VIP check:

1. Proxy sends `GET` to Hydra `/userinfo` with original `Cookie`/`Authorization` headers
2. If 2xx: forward to backend (you're in, gorgeous)
3. If non-2xx: **403 Forbidden** (not on the list)
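
The decision logic of the VIP check is small enough to sketch without any HTTP client. The names `subrequest_headers`, `AuthOutcome` and `auth_outcome` are hypothetical, and headers are modelled as plain name/value pairs.

```rust
/// Keeps only the headers the subrequest forwards: Cookie and Authorization.
pub fn subrequest_headers<'a>(original: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    original
        .iter()
        .copied()
        .filter(|(name, _)| {
            name.eq_ignore_ascii_case("cookie") || name.eq_ignore_ascii_case("authorization")
        })
        .collect()
}

/// What the proxy does with the original request after the subrequest returns.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AuthOutcome {
    /// Subrequest returned 2xx: pass the request to the admin backend.
    Forward,
    /// Anything else: answer 403 Forbidden.
    Forbidden,
}

/// Maps the `/userinfo` response status to an outcome.
pub fn auth_outcome(userinfo_status: u16) -> AuthOutcome {
    if (200..300).contains(&userinfo_status) {
        AuthOutcome::Forward
    } else {
        AuthOutcome::Forbidden
    }
}
```

Note that redirects and server errors from `/userinfo` also land in `Forbidden` here, which matches the "non-2xx" rule in step 3.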

## Static File Serving

Replaces nginx sidecars entirely. No more sidecar containers cluttering up your pods.

- **Try-files chain:** exact → `.html` → `/index.html` → SPA fallback
- Content-Type auto-detection
- Cache headers: `immutable` for hashed assets, `no-cache` for others
- Path traversal protection (`../` rejected)
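
One way to picture the try-files chain is as an ordered list of candidate paths, checked against the static root until one exists. The sketch below only builds that list. It assumes the request path is already percent-decoded, and the function name is hypothetical.

```rust
/// Returns the candidate files to try, in order, or None when the path
/// contains a `..` segment and must be rejected.
pub fn try_files_candidates(request_path: &str) -> Option<Vec<String>> {
    // Path traversal protection: refuse any ".." segment outright.
    if request_path.split('/').any(|segment| segment == "..") {
        return None;
    }
    let base = request_path.trim_end_matches('/');
    let mut candidates = Vec::new();
    if !base.is_empty() {
        candidates.push(base.to_string()); // exact
        candidates.push(format!("{}.html", base)); // .html
    }
    candidates.push(format!("{}/index.html", base)); // /index.html
    // SPA fallback: the root index, unless it is already in the list.
    let spa_fallback = "/index.html".to_string();
    if !candidates.contains(&spa_fallback) {
        candidates.push(spa_fallback);
    }
    Some(candidates)
}
```

For `/about` this yields `/about`, `/about.html`, `/about/index.html`, then `/index.html`.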

## TLS

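The TLS protocol layer is pure Rust (rustls), with no BoringSSL dependency. The cryptographic primitives come from aws-lc-rs, which wraps the AWS-LC library (C and assembly).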

- **rustls** + **aws-lc-rs** crypto backend
- **K8s Secret watcher:** cert renewal → writes to emptyDir → graceful upgrade (zero downtime)
- **Local:** mkcert wildcard cert
- **Production:** Let's Encrypt via cert-manager (ACME HTTP-01 challenges routed by the proxy itself)

## ML Training Pipeline

The models aren't downloaded — they're compiled into the binary. Weights baked in = zero model file overhead, L1-cache-resident inference.

1. Collect audit logs from production traffic
2. Auto-label with heuristics (request rate, path repetition, error rate thresholds)
3. Merge with external datasets (CSIC 2010, CIC-IDS2017)
4. Train ensemble offline (burn-rs + WGPU GPU acceleration)
5. Export as Rust `const` arrays
6. Recompile binary
7. Deploy + replay logs to validate accuracy
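
To make step 5 concrete, here is what an exported model could look like once it is plain Rust `const` data. The layer sizes, weight values, and the ReLU and sigmoid activations are invented for the example; the real export would carry the trained fourteen- and twelve-feature models.

```rust
// Hypothetical output of the export step: weights as compile-time constants.
pub const INPUTS: usize = 3;
pub const HIDDEN: usize = 2;
pub const W1: [[f32; INPUTS]; HIDDEN] = [[0.5, -0.25, 0.0], [0.1, 0.2, 0.3]];
pub const B1: [f32; HIDDEN] = [0.0, -0.1];
pub const W2: [f32; HIDDEN] = [1.0, -1.0];
pub const B2: f32 = 0.0;

/// Forward pass of a one-hidden-layer MLP over the baked-in weights.
/// No allocation and no file I/O: the model is part of the binary.
pub fn mlp(x: &[f32; INPUTS]) -> f32 {
    let mut z = B2;
    for j in 0..HIDDEN {
        let mut h = B1[j];
        for i in 0..INPUTS {
            h += W1[j][i] * x[i];
        }
        z += W2[j] * h.max(0.0); // ReLU on the hidden unit
    }
    1.0 / (1.0 + (-z).exp()) // sigmoid squashes the output to 0..1
}
```

Because the arrays are `const`, retraining means regenerating this file and recompiling, which is exactly steps 5 and 6.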

## Metrics

Prometheus endpoint on `:9090/metrics`:

| Metric | Labels / notes |
|--------|--------|
| `sunbeam_requests_total` | method, host, status, backend |
| `sunbeam_request_duration_seconds` | histogram, 1ms–10s buckets |
| `sunbeam_ddos_decisions_total` | |
| `sunbeam_scanner_decisions_total` | |
| `sunbeam_rate_limit_decisions_total` | |
| `sunbeam_cache_status_total` | hit/miss |
| `sunbeam_active_connections` | |

## Audit Logs

Every request produces structured JSON:

- `request_id`, `method`, `host`, `path`, `client_ip`, `status`, `duration_ms`, `user_agent`, `backend`
- These logs are the training data — feed them back into the pipeline to retrain models. The bouncer learns from every shift.
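
A dependency-free sketch of one audit record and its JSON form follows. The field names come from the list above; the field types, the field order, and the hand-rolled serializer are assumptions (a real implementation would more likely use a JSON library).

```rust
/// One audit log entry. Field types are assumed for illustration.
pub struct AuditRecord<'a> {
    pub request_id: &'a str,
    pub method: &'a str,
    pub host: &'a str,
    pub path: &'a str,
    pub client_ip: &'a str,
    pub status: u16,
    pub duration_ms: u64,
    pub user_agent: &'a str,
    pub backend: &'a str,
}

/// Escapes quotes, backslashes and control characters for a JSON string.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl AuditRecord<'_> {
    /// Renders the record as a single-line JSON object.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"request_id":"{}","method":"{}","host":"{}","path":"{}","client_ip":"{}","status":{},"duration_ms":{},"user_agent":"{}","backend":"{}"}}"#,
            escape(self.request_id),
            escape(self.method),
            escape(self.host),
            escape(self.path),
            escape(self.client_ip),
            self.status,
            self.duration_ms,
            escape(self.user_agent),
            escape(self.backend),
        )
    }
}
```

Escaping matters here because `path` and `user_agent` are attacker-controlled, and a broken log line would poison the training data.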

## Deployment

- **Namespace:** `ingress`
- Single replica, `Recreate` strategy
- 256Mi memory limit, 100m CPU
- **ConfigMap:** `pingora-config` (config.toml with all routes + detection config)
- **RBAC:** ServiceAccount with read access to Secrets, ConfigMaps, Ingresses
- **ServiceMonitor** for Prometheus scraping

---

**Source:** [proxy repository](../proxy)
Commit: docs: add Pingora proxy documentation — The Bouncer 💎 (2026-03-24 11:46:11 +00:00)

Security pipeline (DDoS, scanner, rate limiting), route table, ML
models, training pipeline, static serving, TLS, auth requests, metrics.