Pattern Learning
From Knowitall
Revision as of 00:05, 25 October 2011
== Building the bootstrapping data ==
== Determining target relations ==
# Restrict the high-quality set of ClueWeb extractions to those with proper-noun arguments.
# Choose the most frequent relations from this set.
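The two steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the tuple layout and the `top_k` cutoff are assumptions, and "proper-noun arguments" is taken to mean every argument token is tagged NNP or NNPS.

```python
from collections import Counter

def proper_noun_only(pos_tags):
    """True if every token of the argument is tagged as a proper noun."""
    return all(tag in {"NNP", "NNPS"} for tag in pos_tags)

def target_relations(extractions, top_k=10):
    """extractions: iterable of (arg1_tags, relation, arg2_tags) tuples.

    Restrict to extractions whose arguments are entirely proper nouns,
    then return the most frequent relation strings among them.
    top_k is a hypothetical cutoff; the wiki does not state one."""
    counts = Counter(
        rel
        for arg1_tags, rel, arg2_tags in extractions
        if proper_noun_only(arg1_tags) and proper_noun_only(arg2_tags)
    )
    return [rel for rel, _ in counts.most_common(top_k)]
```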
== Determining target extractions ==
# Start with the clean, chunked dataset of ReVerb extractions from ClueWeb.
# Apply Jonathan Berant's relation string normalization.
# Filter relations so each relation's normalized relation string matches a target relation and the arguments only contain DT, NNP, and NNPS.
# Measure the occurrence of the arguments.
# Keep extractions from the target relations whose arguments occur commonly (at least 100 times).
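The filtering pipeline above can be sketched as below. This is a hedged sketch, not the actual implementation: the tuple layout is assumed, and `normalize` is a stand-in for Berant's normalizer, which is not reproduced here.

```python
from collections import Counter

# POS tags the wiki allows in arguments at this step.
ALLOWED_TAGS = {"DT", "NNP", "NNPS"}

def target_extractions(extractions, targets, normalize, min_count=100):
    """extractions: (arg1, arg1_tags, rel, arg2, arg2_tags) tuples.
    targets: set of normalized target relation strings.
    normalize: relation-string normalizer (stand-in for Berant's).

    Keep extractions of target relations whose argument tokens are all
    DT/NNP/NNPS, then keep only those whose arguments each occur at
    least min_count times (100 in the wiki) across the filtered set."""
    filtered = [
        e for e in extractions
        if normalize(e[2]) in targets
        and all(tag in ALLOWED_TAGS for tag in e[1] + e[4])
    ]
    # Measure the occurrence of the arguments.
    arg_counts = Counter()
    for arg1, _, _, arg2, _ in filtered:
        arg_counts[arg1] += 1
        arg_counts[arg2] += 1
    return [e for e in filtered
            if arg_counts[e[0]] >= min_count and arg_counts[e[3]] >= min_count]
```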
== Reducing the lemma grep results ==
# Remove patterns that occur fewer than 5 times.
# Remove duplicate sentences.
# Remove extractions with an (extraction, pattern) pair that occurs anomalously frequently.
#* There was a single such pair: (hotel reservation, be make, online) occurred 32k times; the next most frequent occurred 8k times.
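The reduction steps above can be sketched as follows. This is an assumed reconstruction: the row layout is hypothetical, and the `anomaly_ratio` heuristic is my own stand-in — the wiki found its single outlier (32k vs. 8k for the runner-up) by inspection, not by a stated rule.

```python
from collections import Counter

def reduce_results(rows, min_pattern_count=5, anomaly_ratio=3.0):
    """rows: (extraction, pattern, sentence) tuples.

    1. Drop patterns seen fewer than min_pattern_count times.
    2. Deduplicate sentences, keeping the first occurrence.
    3. Drop an (extraction, pattern) pair whose count exceeds the
       runner-up's by anomaly_ratio (a hypothetical heuristic)."""
    pattern_counts = Counter(p for _, p, _ in rows)
    rows = [r for r in rows if pattern_counts[r[1]] >= min_pattern_count]

    seen, deduped = set(), []
    for r in rows:
        if r[2] not in seen:
            seen.add(r[2])
            deduped.append(r)

    pair_counts = Counter((e, p) for e, p, _ in deduped)
    ranked = pair_counts.most_common()
    anomalous = set()
    if len(ranked) >= 2 and ranked[0][1] > anomaly_ratio * ranked[1][1]:
        anomalous.add(ranked[0][0])
    return [r for r in deduped if (r[0], r[1]) not in anomalous]
```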