Initial import of CSIC 2010 HTTP dataset

Mirror of the CSIC 2010 web application attack dataset by Carmen Torrano
Giménez, Alejandro Pérez Villegas, and Gonzalo Álvarez Marañón (CSIC).
The original hosting at isi.csic.es is no longer available.

Includes the original description paper and all three dataset files
(normal training, normal test, anomalous test).
2026-03-08 19:34:46 +00:00
commit 5cde062493
5 changed files with 1339817 additions and 0 deletions

41
README.md Normal file

@@ -0,0 +1,41 @@
# CSIC 2010 HTTP Dataset
A mirror of the CSIC 2010 HTTP dataset for web application anomaly detection research. The original hosting at `isi.csic.es/dataset/` is no longer available.
## About the dataset
The CSIC 2010 dataset contains roughly 97,000 raw HTTP/1.1 requests (see the per-file counts under Files) generated against an e-commerce web application. It was created for evaluating web attack detection systems and includes both normal traffic and anomalous (attack) traffic covering SQL injection, XSS, path traversal, CRLF injection, and other common web attacks.
## Authors
This dataset was created by researchers at the **Information Security Institute** of **CSIC** (Consejo Superior de Investigaciones Científicas — Spanish National Research Council):
- **Carmen Torrano Giménez**
- **Alejandro Pérez Villegas**
- **Gonzalo Álvarez Marañón**

The original description paper is included as [`csic-2010-description.pdf`](csic-2010-description.pdf).
## Files
| File | Requests | Description |
|------|----------|-------------|
| `normalTrafficTraining.txt` | ~36,000 | Normal HTTP requests (training set) |
| `normalTrafficTest.txt` | ~36,000 | Normal HTTP requests (test set) |
| `anomalousTrafficTest.txt` | ~25,000 | Anomalous/attack HTTP requests (test set) |

Each file contains raw HTTP/1.1 requests separated by blank lines.
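
The blank-line layout described above can be parsed with a short script. The following is a minimal sketch, not an official loader: the names `REQUEST_LINE`, `split_requests`, and `load_requests` are hypothetical, and it assumes that every request begins with a request line such as `GET <url> HTTP/1.1`. Because an HTTP body is also preceded by a blank line, the sketch treats any block that does not start with a request line as the body of the previous request. The file encoding is not stated in this README, so the loader decodes as UTF-8 and replaces undecodable bytes.

```python
import re
from typing import List

# Assumption: each request starts with "<METHOD> <target> HTTP/1.x".
REQUEST_LINE = re.compile(
    r"^(GET|POST|PUT|DELETE|HEAD|OPTIONS|TRACE|PATCH) .+ HTTP/1\.[01]$"
)


def split_requests(text: str) -> List[str]:
    """Split raw file content into one string per HTTP request."""
    requests: List[str] = []
    # Normalise line endings, then split on runs of blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n"))
    for block in blocks:
        block = block.strip("\n")
        if not block.strip():
            continue  # skip empty fragments at the start or end of the file
        first_line = block.split("\n", 1)[0].strip()
        if REQUEST_LINE.match(first_line) or not requests:
            # A request line opens a new request.
            requests.append(block)
        else:
            # No request line: assume this block is the body of the
            # previous request (for example, POST form data).
            requests[-1] += "\n\n" + block
    return requests


def load_requests(path: str) -> List[str]:
    """Read one dataset file and return its requests as strings."""
    # Assumption: UTF-8 with replacement is an acceptable decoding.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return split_requests(handle.read())
```

Calling `len(load_requests("normalTrafficTraining.txt"))` should then give a count that can be compared with the approximate figures in the table; a large mismatch would suggest that the request-line assumption does not hold for some entries.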
## Citation
If you use this dataset in your research, please cite the original authors:
> C. Torrano-Gimenez, A. Perez-Villegas, and G. Alvarez, "Application of a Web Attack Detection System to the CSIC 2010 HTTP Dataset," Information Security Institute, CSIC, 2010.
## Fair use notice
This dataset is redistributed under fair use for security research and educational purposes. It was originally published as a freely available academic research dataset with no stated license terms. All intellectual property rights remain with the original authors at CSIC. This mirror exists because the original hosting is no longer available.
## License
The contents of this repository (excluding the dataset files and PDF) are provided under the [Apache License 2.0](LICENSE).

355776
anomalousTrafficTest.txt Normal file

File diff suppressed because it is too large

BIN
csic-2010-description.pdf Normal file

Binary file not shown.

492000
normalTrafficTest.txt Normal file

File diff suppressed because it is too large

492000
normalTrafficTraining.txt Normal file

File diff suppressed because it is too large