Mirror of the CSIC 2010 web application attack dataset by Carmen Torrano Giménez, Alejandro Pérez Villegas, and Gonzalo Álvarez Marañón (CSIC). The original hosting at isi.csic.es is no longer available. Includes the original description paper and all three dataset files (normal training, normal test, anomalous test).
CSIC 2010 HTTP Dataset
A mirror of the CSIC 2010 HTTP dataset for web application anomaly detection research. The original hosting at isi.csic.es/dataset/ is no longer available.
About the dataset
The CSIC 2010 dataset contains roughly 97,000 raw HTTP/1.1 requests (about 72,000 normal and 25,000 anomalous) generated against an e-commerce web application. It was created for evaluating web attack detection systems and includes both normal traffic and anomalous (attack) traffic covering SQL injection, XSS, path traversal, CRLF injection, and other common web attacks.
Authors
This dataset was created by researchers at the Information Security Institute of CSIC (Consejo Superior de Investigaciones Científicas — Spanish National Research Council):
- Carmen Torrano Giménez
- Alejandro Pérez Villegas
- Gonzalo Álvarez Marañón
The original description paper is included as csic-2010-description.pdf.
Files
| File | Requests | Description |
|---|---|---|
| normalTrafficTraining.txt | ~36,000 | Normal HTTP requests (training set) |
| normalTrafficTest.txt | ~36,000 | Normal HTTP requests (test set) |
| anomalousTrafficTest.txt | ~25,000 | Anomalous/attack HTTP requests (test set) |
Each file contains raw HTTP/1.1 requests separated by blank lines.
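A request with a body (for example a POST) may also contain a blank line between its headers and its body, so a naive split on blank lines could cut such a request in two. The following is a minimal sketch of one way to split a file into individual requests, assuming every request begins with a request line of the form `METHOD URL HTTP/1.x`. The names `split_requests` and `REQUEST_LINE` and the list of methods are illustrative assumptions, not part of the dataset.

```python
import re
from typing import List

# Assumption: each request starts with a line such as
# "GET http://host/path HTTP/1.1". The method list is illustrative.
REQUEST_LINE = re.compile(
    r"^(GET|POST|PUT|DELETE|HEAD|OPTIONS|TRACE|PATCH) \S+ HTTP/1\.[01]$"
)


def split_requests(text: str) -> List[str]:
    """Split raw dataset text into requests, starting a new one at each request line."""
    requests: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A new request line closes the request collected so far.
        if REQUEST_LINE.match(line) and current:
            requests.append("\n".join(current).strip())
            current = []
        # Skip leading blank lines; keep blank lines inside a request
        # (they separate headers from the body).
        if line.strip() or current:
            current.append(line)
    if current:
        requests.append("\n".join(current).strip())
    return [r for r in requests if r]


if __name__ == "__main__":
    # Hypothetical sample in the assumed format, not copied from the dataset.
    sample = (
        "GET http://localhost:8080/index.jsp HTTP/1.1\n"
        "Host: localhost:8080\n"
        "\n"
        "\n"
        "POST http://localhost:8080/login.jsp HTTP/1.1\n"
        "Host: localhost:8080\n"
        "Content-Length: 4\n"
        "\n"
        "id=2\n"
        "\n"
    )
    for request in split_requests(sample):
        print(request.splitlines()[0])
```

To apply this to a dataset file, read it into a string first, for example with `open("normalTrafficTraining.txt", encoding="utf-8", errors="replace").read()`. The encoding is an assumption here, and `errors="replace"` is a cautious choice in case it is wrong.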
Citation
If you use this dataset in your research, please cite the original authors:
C. Torrano-Gimenez, A. Perez-Villegas, and G. Alvarez, "Application of a Web Attack Detection System to the CSIC 2010 HTTP Dataset," Information Security Institute, CSIC, 2010.
Fair use notice
This dataset is redistributed under fair use for security research and education purposes. It was originally published as a freely available academic research dataset with no stated license terms. All intellectual property rights remain with the original authors at CSIC. This mirror exists because the original hosting is no longer available.
License
The contents of this repository (excluding the dataset files and PDF) are provided under the Apache License 2.0.