PatchDB: A Large-Scale Security Patch Dataset


To foster large-scale research on vulnerability mitigation and to enable a comparison of different detection approaches, we make our dataset PatchDB from our DSN’21 paper publicly available.

PatchDB is a large-scale security patch dataset that contains 12,073 security patches and 23,742 non-security patches from the real world. You can find more details on the dataset in the paper PatchDB: A Large-Scale Security Patch Dataset. You can also visit the official PatchDB website for more information.

Please use your work email to request the dataset. Approval typically takes no longer than 24 hours.

Data Structure

PatchDB is stored in JSON format. Each sample contains the following 9 keys:

  "category": the type of patch, value:"security" or "non-security",
  "source": the source of patch, value: "cve" or "wild",
  "CVE_ID": the CVE ID if it exists, value: "CVE-XXXX-XXXXX" or "NA",
  "CWE_ID": the CWE ID if it exists, value: "cwe_id" or "NA",
  "commit_id": the hash value of the commit, type: str,
  "owner": the owner id of the repository, type: str,
  "repo": the repository id, type: str,
  "commit_message": the commit message part of the patch, type: str,
  "diff_code": the diff code part of the patch, type: str
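
The key layout above can be checked and queried with a few lines of Python. The following is a minimal sketch, not an official loader. It assumes the samples are available as a JSON array of objects, which this README does not specify. The function names (`parse_samples`, `security_patches_with_cve`, `count_by_category`) and the records in `DEMO` are hypothetical and made up for illustration; they are not taken from the dataset.

```python
import json
from collections import Counter

# The nine keys listed in the Data Structure section.
EXPECTED_KEYS = {
    "category", "source", "CVE_ID", "CWE_ID", "commit_id",
    "owner", "repo", "commit_message", "diff_code",
}


def parse_samples(text):
    """Parse samples from a JSON array and check that each has all nine keys.

    Assumption: the data is a JSON array of objects. Adjust this function
    if the files you receive use a different layout.
    """
    samples = json.loads(text)
    for i, sample in enumerate(samples):
        missing = EXPECTED_KEYS.difference(sample)
        if missing:
            raise ValueError(f"sample {i} is missing keys: {sorted(missing)}")
    return samples


def security_patches_with_cve(samples):
    """Return security patches that carry a CVE ID (CVE_ID is not "NA")."""
    return [
        s for s in samples
        if s["category"] == "security" and s["CVE_ID"] != "NA"
    ]


def count_by_category(samples):
    """Count samples per value of the "category" key."""
    return Counter(s["category"] for s in samples)


# Made-up records for illustration only; not real PatchDB entries.
DEMO = json.dumps([
    {
        "category": "security", "source": "cve",
        "CVE_ID": "CVE-0000-00000", "CWE_ID": "NA",
        "commit_id": "abc123", "owner": "example-owner",
        "repo": "example-repo", "commit_message": "fix bounds check",
        "diff_code": "--- a/f.c\n+++ b/f.c\n",
    },
    {
        "category": "non-security", "source": "wild",
        "CVE_ID": "NA", "CWE_ID": "NA",
        "commit_id": "def456", "owner": "example-owner",
        "repo": "example-repo", "commit_message": "rename variable",
        "diff_code": "--- a/g.c\n+++ b/g.c\n",
    },
])
```

Calling `parse_samples(DEMO)` returns the two records as dictionaries. To use the sketch on real data, pass it the text of a file read with `open(path, encoding="utf-8").read()`, where `path` points to wherever you stored the dataset.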

Disclaimer & Download Agreement

To download the PatchDB dataset, you must agree to the terms of the following Disclaimer & Download Agreement. Please read these terms carefully before submitting the PatchDB request form.

  • PatchDB was constructed and cross-checked by three experts who work in security patch research. Because subjective factors may cause misclassification, the Sun Security Laboratory (SunLab) cannot guarantee 100% accuracy for the samples in the dataset.
  • The copyright of the PatchDB dataset is owned by SunLab.
  • PatchDB may be used only for non-commercial research and/or personal use. The dataset must not be used for commercial or for-profit purposes.
  • The PatchDB dataset must not be resold or redistributed. Anyone who has obtained PatchDB must not share the dataset with others without permission from SunLab.


The PatchDB dataset was built by the Sun Security Laboratory (SunLab) at George Mason University, Fairfax, VA.


  author={Wang, Xinda and Wang, Shu and Feng, Pengbin and Sun, Kun and Jajodia, Sushil},
  booktitle={2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)}, 
  title={PatchDB: A Large-Scale Security Patch Dataset},