QA-SRL: Question-Answer Driven Semantic Role Labeling
Use Natural Language to Annotate Natural Language

We use question-answer pairs to model verbal predicate-argument structure. Each question starts with a wh-word (Who, What, Where, When, etc.) and contains a verb predicate from the sentence; each answer is a phrase from the sentence. For example:

UCD finished the 2006 championship as Dublin champions , by beating St Vincents in the final .

finished
    Who finished something?                   UCD
    What did someone finish?                  the 2006 championship
    What did someone finish something as?     Dublin champions
    How did someone finish something?         by beating St Vincents in the final

beating
    Who beat someone?                         UCD
    When did someone beat someone?            in the final
    Who did someone beat?                     St Vincents
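
The example above can be held in a small data structure. The following is only an illustrative sketch, not the released file format or the authors' code: the class names `QAPair` and `PredicateAnnotation`, the helper `is_well_formed`, and the `WH_WORDS` list are all assumptions made for this example. It checks the two properties stated above, namely that each question starts with a wh-word and each answer is a phrase from the sentence.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed list of wh-words for illustration; the page only gives "Who, What, Where, When, etc."
WH_WORDS = ("Who", "What", "When", "Where", "Why", "How")


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class PredicateAnnotation:
    verb: str  # the verb predicate as it appears in the sentence
    qa_pairs: List[QAPair] = field(default_factory=list)


def is_well_formed(sentence: str, annotation: PredicateAnnotation) -> bool:
    """Check that every question starts with a wh-word and every answer
    occurs verbatim in the sentence (hypothetical helper)."""
    for qa in annotation.qa_pairs:
        if qa.question.split()[0] not in WH_WORDS:
            return False
        if qa.answer not in sentence:
            return False
    return True


sentence = ("UCD finished the 2006 championship as Dublin champions , "
            "by beating St Vincents in the final .")

finished = PredicateAnnotation("finished", [
    QAPair("Who finished something?", "UCD"),
    QAPair("What did someone finish?", "the 2006 championship"),
    QAPair("What did someone finish something as?", "Dublin champions"),
    QAPair("How did someone finish something?", "by beating St Vincents in the final"),
])

beating = PredicateAnnotation("beating", [
    QAPair("Who beat someone?", "UCD"),
    QAPair("When did someone beat someone?", "in the final"),
    QAPair("Who did someone beat?", "St Vincents"),
])

for annotation in (finished, beating):
    print(annotation.verb, is_well_formed(sentence, annotation))
```

Storing answers as plain strings keeps the sketch short; a real loader would more likely store token spans so that repeated phrases in a sentence stay unambiguous.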

Human-in-the-Loop Parsing

Coming soon!


The QA-SRL framework is described in the following paper:

Question-Answer Driven Semantic Role Labeling: Using Natural Language to Annotate Natural Language
Luheng He, Mike Lewis and Luke Zettlemoyer
In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP-2015)

The QA-SRL Dataset

File Format

Dataset            No. Sentences    No. Verbs    No. QAs
newswire-train               744         2020       4904
newswire-dev                 249          664       1606
newswire-test                248          652       1599
Wikipedia-train             1174         2647       6414
Wikipedia-dev                392          895       2183
Wikipedia-test               393          898       2201

*The newswire data does not contain the original sentences. To get the complete data, you will need to download the following Python script and run it with the CoNLL-2009 English training data.
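
As a quick sanity check after downloading, the split sizes in the table above can be totaled per domain. The sketch below only copies the counts from the table; the names `SPLITS` and `totals` are ours for illustration and are not part of the released data or code.

```python
# Split sizes copied from the table: (sentences, verbs, QA pairs).
SPLITS = {
    "newswire-train": (744, 2020, 4904),
    "newswire-dev": (249, 664, 1606),
    "newswire-test": (248, 652, 1599),
    "Wikipedia-train": (1174, 2647, 6414),
    "Wikipedia-dev": (392, 895, 2183),
    "Wikipedia-test": (393, 898, 2201),
}


def totals(prefix=""):
    """Sum (sentences, verbs, QAs) over the splits whose name starts with prefix.
    An empty prefix sums over every split."""
    rows = [counts for name, counts in SPLITS.items() if name.startswith(prefix)]
    return tuple(sum(column) for column in zip(*rows))


for domain in ("newswire", "Wikipedia", ""):
    sentences, verbs, qas = totals(domain)
    label = domain or "all"
    print(f"{label}: {sentences} sentences, {verbs} verbs, "
          f"{qas} QAs, {qas / verbs:.2f} QAs per verb")
```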

Our Annotation Tool

Code for generating the annotation spreadsheets can be found here:


If you have any questions about the data or the code, please contact: {first name of first author} at cs dot washington dot edu