feat: configurable k8s resources, CSIC training pipeline, unified Dockerfile

- Make K8s namespace, TLS secret, and config ConfigMap names configurable
  via [kubernetes] config section (previously hardcoded to "ingress")
- Add CSIC 2010 dataset converter and auto-download for scanner training
- Unify Dockerfile for local and production builds (remove cross-compile path)
- Bake ML models directory into container image
- Update CSIC dataset URL to self-hosted mirror (src.sunbeam.pt)
- Fix missing fields in the rate_limit pipeline log
- Consolidate docs/README.md into root README.md

Signed-off-by: Sienna Meridian Satterwhite <sienna@sunbeam.pt>
2026-03-10 23:38:20 +00:00
parent 0baab92141
commit a5810dd8a7
23 changed files with 946 additions and 514 deletions
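
The usage notes in scripts/convert_csic.py (see the diff below) merge production logs with the converted CSIC data using `cat` before training. As a rough illustration of that merge step only, here is a minimal Python sketch. It assumes each input line is a standalone JSON object; the function name `merge_jsonl` is hypothetical and is not part of this commit, and nothing here is meant to describe how the `--csic` flag is actually implemented.

```python
import json


def merge_jsonl(inputs, output):
    """Concatenate JSONL files into `output` (hypothetical helper).

    Blank lines are skipped and every record is parsed once, so a
    malformed line fails here rather than later during training.
    Returns the number of records written.
    """
    written = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, encoding="utf-8") as src:
                for lineno, line in enumerate(src, start=1):
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        json.loads(line)
                    except json.JSONDecodeError as exc:
                        raise ValueError(f"{path}:{lineno}: invalid JSON") from exc
                    # Always terminate with a newline so the last record of
                    # one file cannot fuse with the first record of the next.
                    out.write(line + "\n")
                    written += 1
    return written
```

Calling `merge_jsonl(["logs.jsonl", "csic_converted.jsonl"], "combined.jsonl")` would produce the same kind of file as the `cat` command, with the added check that every line parses as JSON.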

@@ -7,7 +7,7 @@ Label is determined by which file it came from (normal vs anomalous).
Usage:
# Download the dataset first:
-git clone https://github.com/msudol/Web-Application-Attack-Datasets.git /tmp/csic
+git clone https://src.sunbeam.pt/studio/csic-dataset.git /tmp/csic
# Convert all three files:
python3 scripts/convert_csic.py \
@@ -20,8 +20,9 @@ Usage:
# Merge with production logs:
cat logs.jsonl csic_converted.jsonl > combined.jsonl
-# Train:
+# Train (or just use --csic flag which does this automatically):
cargo run -- train-scanner --input combined.jsonl --output scanner_model.bin
+# Simpler: cargo run -- train-scanner --input logs.jsonl --output scanner_model.bin --csic
"""
import argparse