Annotations for each sentence are separated by empty lines.
For each sentence, the data starts with a line containing the sentence identifier and the number of verbal predicates n, separated by whitespace. For example:

WIKI1_0     1

The next line contains the tokenized sentence.
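A minimal Python sketch of reading this sentence-level layout follows. It splits the file into sentence blocks on empty lines, then reads the identifier, n, and the tokens from the first two lines of each block. The function names are hypothetical, and the sketch assumes the tokens on the sentence line are separated by whitespace.

```python
from typing import Iterator, List, Tuple


def iter_sentence_blocks(text: str) -> Iterator[List[str]]:
    """Yield each sentence's annotation as a list of its non-empty lines.

    Sentences are separated by one or more empty (or whitespace-only) lines.
    """
    block: List[str] = []
    for line in text.splitlines():
        if line.strip():
            block.append(line)
        elif block:
            yield block
            block = []
    if block:  # the last block may not be followed by an empty line
        yield block


def parse_sentence_start(block: List[str]) -> Tuple[str, int, List[str]]:
    """Read the header line (identifier, n) and the tokenized sentence."""
    sentence_id, n = block[0].split()  # fails loudly if there are not exactly 2 fields
    tokens = block[1].split()  # assumption: tokens are whitespace-separated
    return sentence_id, int(n), tokens
```

For the header line shown above, `"WIKI1_0     1".split()` yields the identifier `WIKI1_0` and n = 1.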

Following that is the information about the n predicates. For each predicate, the data starts with a line containing the index of the predicate (its position in the sentence), the predicate token, and the number of QA pairs m. For example:

6   row 2
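Because n and m are exact counts, the predicate sections can be read without any extra delimiters. A sketch of that step, with hypothetical names; it keeps the predicate index exactly as written, since this description does not say whether positions count from 0 or 1.

```python
from typing import List, NamedTuple, Tuple


class PredicateHeader(NamedTuple):
    index: int   # position of the predicate in the sentence, as written in the file
    token: str   # the predicate token itself
    num_qa: int  # number m of QA lines that follow this header


def parse_predicate_header(line: str) -> PredicateHeader:
    index, token, num_qa = line.split()
    return PredicateHeader(int(index), token, int(num_qa))


def split_predicates(lines: List[str], n: int) -> List[Tuple[PredicateHeader, List[str]]]:
    """Group the lines after the sentence into n (header, QA lines) pairs."""
    result = []
    pos = 0
    for _ in range(n):
        header = parse_predicate_header(lines[pos])
        qa_lines = lines[pos + 1 : pos + 1 + header.num_qa]
        if len(qa_lines) != header.num_qa:
            raise ValueError(f"expected {header.num_qa} QA lines for {header.token!r}")
        result.append((header, qa_lines))
        pos += 1 + header.num_qa  # skip the header and its m QA lines
    return result
```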

Following that are m lines of data, each containing one question-answer pair. The question always consists of seven slots, as defined in the paper; empty slots are represented by the marker “_”. The question ends with a question mark, and the answers are listed after it. If a question has multiple answers, they are separated by the marker “###”. Here are two examples:

what    _   _   rows    _   _   _   ?   four boat clubs ### Aberdeen Boat Club ### Aberdeen Schools Rowing Association ### Aberdeen University Boat Club ### Robert Gordon University Boat Club
where   does    something   row _   _   _   ?   on the River Dee
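A QA line can be parsed roughly as below. This sketch assumes the slots are tab-separated (the examples above look tab-aligned, but the separator is not stated here). Everything after the first question mark is treated as the answer field, so multi-word answers such as “four boat clubs” stay intact. Empty slots become `None`, and the hypothetical helper `question_text` rebuilds the readable question by dropping them.

```python
from typing import List, Optional, Tuple

NUM_SLOTS = 7
EMPTY_SLOT = "_"
ANSWER_SEP = "###"


def parse_qa_line(line: str) -> Tuple[List[Optional[str]], List[str]]:
    """Split one QA line into its seven question slots and its answers.

    Assumes tab-separated slots; empty slots ("_") become None.
    """
    question_part, qmark, answer_part = line.rstrip("\n").partition("?")
    if not qmark:
        raise ValueError(f"no question mark in line: {line!r}")
    raw_slots = [s.strip() for s in question_part.split("\t") if s.strip()]
    if len(raw_slots) != NUM_SLOTS:
        raise ValueError(f"expected {NUM_SLOTS} slots, got {len(raw_slots)}")
    slots = [None if s == EMPTY_SLOT else s for s in raw_slots]
    answers = [a.strip() for a in answer_part.split(ANSWER_SEP) if a.strip()]
    return slots, answers


def question_text(slots: List[Optional[str]]) -> str:
    """Join the non-empty slots into a readable question."""
    return " ".join(s for s in slots if s is not None) + "?"
```

For the second example line, this yields the question “where does something row?” with the single answer “on the River Dee”.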

Note that while each answer can be aligned with a sub-span of the sentence, the annotators did not specify the exact position of the answer span. Therefore, some very short answers, such as “he”, could be aligned with multiple positions in the sentence. In the paper, we associate each answer with all of its possible alignments. An alternative is to align the answer to the span that is closest to the predicate.
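Both alignment strategies can be sketched as follows. The sketch assumes answers are matched token by token (exact, case-sensitive) against the whitespace-tokenized sentence, and it measures distance to the predicate in tokens from the nearest edge of the span. The helper names, the matching rule, and the distance definition are illustrative assumptions; the source does not specify them. Ties go to the leftmost span.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # [start, end) token offsets into the sentence


def all_alignments(sentence_tokens: List[str], answer: str) -> List[Span]:
    """Return every token span of the sentence that exactly matches the answer."""
    answer_tokens = answer.split()
    k = len(answer_tokens)
    if k == 0:
        return []
    return [
        (i, i + k)
        for i in range(len(sentence_tokens) - k + 1)
        if sentence_tokens[i : i + k] == answer_tokens
    ]


def distance_to_predicate(span: Span, predicate_index: int) -> int:
    """Token distance from the predicate to the nearest edge of the span (0 if inside)."""
    start, end = span
    if start <= predicate_index < end:
        return 0
    if predicate_index < start:
        return start - predicate_index
    return predicate_index - (end - 1)


def closest_alignment(
    sentence_tokens: List[str], answer: str, predicate_index: int
) -> Optional[Span]:
    """Pick the matching span closest to the predicate; min() keeps the leftmost on ties."""
    spans = all_alignments(sentence_tokens, answer)
    if not spans:
        return None
    return min(spans, key=lambda s: distance_to_predicate(s, predicate_index))
```

On a toy sentence “he said that he saw him” with the predicate “saw” at token 4, “he” aligns to both (0, 1) and (3, 4); the closest-span strategy keeps (3, 4).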