Unified prepare-dataset pipeline that automatically downloads and caches
upstream datasets (CSIC 2010, CIC-IDS2017), applies heuristic auto-labeling
to unlabeled production logs, generates synthetic samples for both models,
and serializes the result as a bincode DatasetManifest. Includes an OWASP
ModSec parser, a CIC-IDS2017 timing profile extractor, and synthetic data
generators with configurable distributions.

Signed-off-by: Sienna Meridian Satterwhite <sienna@sunbeam.pt>