Dataset Card for LiveQA Medical from TREC 2017

The LiveQA’17 medical task focuses on consumer health question answering. Consumer health questions were received by the U.S. National Library of Medicine (NLM). The dataset consists of constructed medical question-answer pairs for training and testing, with additional annotations that can be used to develop question analysis and question answering systems.

Please refer to our overview paper for more information about the constructed datasets and the LiveQA Track:

Asma Ben Abacha, Eugene Agichtein, Yuval Pinter & Dina Demner-Fushman. Overview of the Medical Question Answering Task at TREC 2017 LiveQA. TREC, Gaithersburg, MD, 2017 (https://trec.nist.gov/pubs/trec26/papers/Overview-QA.pdf).

Homepage: https://github.com/abachaa/LiveQA_MedicalTask_TREC2017 (medical question-answering datasets prepared for the TREC 2017 LiveQA challenge, Medical Task)

Medical Training Data

The dataset provides 634 question-answer pairs for training:

1) TREC-2017-LiveQA-Medical-Train-1.xml => 388 question-answer pairs corresponding to 200 NLM questions. 
Each question is divided into one or more subquestion(s). Each subquestion has one or more answer(s). 
These question-answer pairs were constructed automatically and validated manually. 

2) TREC-2017-LiveQA-Medical-Train-2.xml => 246 question-answer pairs corresponding to 246 NLM questions.
Answers were retrieved manually by librarians.
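
The nesting described above (question, then subquestions, then answers) can be flattened into individual question-answer pairs. The following is a minimal sketch only: the tag names (`QUESTION`, `MESSAGE`, `SUBQUESTION`, `FOCUS`, `TYPE`, `ANSWER`) and the inline sample are assumptions made for illustration and may not match the actual XML files, so check the real schema before adapting it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. Tag and attribute names are assumptions for
# illustration; the real training files may use a different schema.
SAMPLE_XML = """
<QUESTIONS>
  <QUESTION qid="Q1">
    <MESSAGE>What causes condition X and how is it treated?</MESSAGE>
    <SUBQUESTION>
      <FOCUS>condition X</FOCUS>
      <TYPE>Cause</TYPE>
      <ANSWER>Placeholder answer A.</ANSWER>
    </SUBQUESTION>
    <SUBQUESTION>
      <FOCUS>condition X</FOCUS>
      <TYPE>Treatment</TYPE>
      <ANSWER>Placeholder answer B.</ANSWER>
      <ANSWER>Placeholder answer C.</ANSWER>
    </SUBQUESTION>
  </QUESTION>
</QUESTIONS>
"""


def flatten_pairs(xml_text):
    """Return one dict per (subquestion, answer) combination."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for question in root.iter("QUESTION"):
        message = (question.findtext("MESSAGE") or "").strip()
        for sub in question.iter("SUBQUESTION"):
            focus = sub.findtext("FOCUS") or ""
            qtype = sub.findtext("TYPE") or ""
            # Each answer under a subquestion yields its own pair.
            for answer in sub.iter("ANSWER"):
                pairs.append({
                    "qid": question.get("qid"),
                    "question": message,
                    "focus": focus,
                    "type": qtype,
                    "answer": (answer.text or "").strip(),
                })
    return pairs


pairs = flatten_pairs(SAMPLE_XML)
print(len(pairs), "question-answer pairs from 1 question")
```

This one-row-per-answer counting is how a file with 200 questions can yield 388 question-answer pairs.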

Both training files can also be accessed in JSON Lines (.jsonl) format.
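
As a rough sketch of reading JSON Lines data, each non-empty line is parsed as one JSON object. The field names (`question`, `answer`) and the sample records below are hypothetical; the actual files may use different keys.

```python
import io
import json

# Hypothetical records; the real field names are an assumption.
SAMPLE_JSONL = (
    '{"question": "Placeholder question 1?", "answer": "Placeholder answer 1."}\n'
    '\n'
    '{"question": "Placeholder question 2?", "answer": "Placeholder answer 2."}\n'
)


def read_jsonl(stream):
    """Parse a text stream with one JSON object per line, skipping blanks."""
    records = []
    for line in stream:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


records = read_jsonl(io.StringIO(SAMPLE_JSONL))
print(len(records), "records loaded")
```

To read from disk instead, pass an open file handle, for example `open(path, encoding="utf-8")`, in place of the `io.StringIO` object.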

The datasets are not exhaustive with regard to subquestions, i.e., some subquestions might not be annotated. Additional annotations are provided for both (i) the Focus and (ii) the Question Type used to define each subquestion. 23 question types were considered (e.g., Treatment, Cause, Diagnosis, Indication, Susceptibility, Dosage) related to four focus categories: Disease, Drug, Treatment and Exam.
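
One way to use the Focus and Question Type annotations for question analysis is to tally question types within each focus category. The sketch below assumes the annotations have already been extracted into (focus category, question type) tuples; the sample list is illustrative and is not taken from the dataset.

```python
from collections import Counter, defaultdict

# Illustrative (focus category, question type) annotations, not real data.
ANNOTATIONS = [
    ("Disease", "Treatment"),
    ("Disease", "Cause"),
    ("Drug", "Dosage"),
    ("Disease", "Treatment"),
    ("Drug", "Indication"),
]


def types_by_focus(annotations):
    """Count question types separately for each focus category."""
    table = defaultdict(Counter)
    for category, qtype in annotations:
        table[category][qtype] += 1
    return table


table = types_by_focus(ANNOTATIONS)
for category in sorted(table):
    print(category, dict(table[category]))
```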

Medical Test Data

The test split can be downloaded via Hugging Face.

Test questions cover 26 question types associated with five focus categories. Each question includes one or more subquestion(s) and at least one focus and one question type. Reference answers were selected from trusted resources and validated by medical experts. At least one reference answer is provided for each test question, along with its URL and relevant comments. Question paraphrases were created by assessors and used with the reference answers to judge the participants’ answers.

