Pattern Learning
From Knowitall
Revision as of 01:20, 20 October 2011 by Schmmd
Building the bootstrapping data
Determining target relations
- Restrict a high-quality set of ClueWeb extractions to those whose arguments are proper nouns
- Choose the most frequent relations from this set
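A minimal sketch of the relation-selection step, assuming each extraction is an (arg1, relation, arg2) string triple. The proper-noun test is a hypothetical capitalization heuristic standing in for whatever check (for example, POS tags) the real pipeline used, and `k` is a free parameter because the page does not say how many relations were kept.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation of an extraction: (arg1, relation, arg2).
Extraction = Tuple[str, str, str]


def is_proper_noun(arg: str) -> bool:
    """Hypothetical heuristic: every token of the argument starts uppercase."""
    tokens = arg.split()
    return bool(tokens) and all(tok[0].isupper() for tok in tokens)


def target_relations(extractions: Iterable[Extraction], k: int) -> List[str]:
    """Return the k most frequent relations among proper-noun extractions."""
    counts = Counter(
        rel
        for arg1, rel, arg2 in extractions
        if is_proper_noun(arg1) and is_proper_noun(arg2)
    )
    return [rel for rel, _ in counts.most_common(k)]


data = [
    ("Barack Obama", "was born in", "Hawaii"),
    ("Seattle", "is located in", "Washington"),
    ("Paris", "is located in", "France"),
    ("the dog", "chased", "Bob"),  # dropped: "the dog" is not a proper noun
]
print(target_relations(data, 1))  # ['is located in']
```

`Counter.most_common` keeps the ranking step to one call; swapping in a real POS-based proper-noun check only changes `is_proper_noun`.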
Determining target extractions
- Count how often each argument occurs
- Keep only extractions of the target relations whose arguments occur commonly (threshold: 100)
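The argument filter can be sketched as follows. This assumes argument frequency is counted over the extraction set itself, that both arguments must clear the threshold, and that "(100)" means a minimum count of 100; none of these details is spelled out above, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of an extraction: (arg1, relation, arg2).
Extraction = Tuple[str, str, str]


def argument_counts(extractions: Iterable[Extraction]) -> Counter:
    """Count how many times each string appears as arg1 or arg2."""
    counts: Counter = Counter()
    for arg1, _, arg2 in extractions:
        counts[arg1] += 1
        counts[arg2] += 1
    return counts


def target_extractions(
    extractions: List[Extraction],
    relations: Set[str],
    min_count: int = 100,  # assumed reading of "(100)"
) -> List[Extraction]:
    """Keep extractions of the target relations whose two arguments are both common."""
    counts = argument_counts(extractions)
    return [
        (a1, rel, a2)
        for a1, rel, a2 in extractions
        if rel in relations and counts[a1] >= min_count and counts[a2] >= min_count
    ]


sample = [
    ("Paris", "is located in", "France"),
    ("Lyon", "is located in", "France"),
    ("Paris", "is capital of", "France"),
]
# With a toy threshold of 2, "Lyon" (seen once) is filtered out.
print(target_extractions(sample, {"is located in"}, min_count=2))
```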
Reducing the lemma grep results
- Remove duplicate sentences.
- Remove (extraction, pattern) pairs that occur anomalously frequently.
* There was only one: (hotel reservation, be make, online) occurred 32k times, while the next most frequent pair occurred 8k times.
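A sketch of the two clean-up steps, assuming each lemma grep result is a (sentence, extraction, pattern) record. The page does not say how "anomalously frequent" was decided, so the fixed `max_count` cap below is an assumption; given the 32k vs. 8k gap, any cap from 8k up to (but not including) 32k would remove only the hotel-reservation pair. The pattern strings in the example are hypothetical placeholders.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

# Hypothetical record: (sentence, extraction, pattern), where the extraction
# is any hashable value such as an (arg1, relation, arg2) tuple.
Record = Tuple[str, Hashable, str]


def dedupe_sentences(records: Iterable[Record]) -> List[Record]:
    """Keep the first occurrence of each identical (sentence, extraction, pattern) record."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def drop_frequent_pairs(records: List[Record], max_count: int) -> List[Record]:
    """Drop every record whose (extraction, pattern) pair occurs more than max_count times."""
    pair_counts = Counter((ext, pat) for _, ext, pat in records)
    return [r for r in records if pair_counts[(r[1], r[2])] <= max_count]


hotel = ("hotel reservation", "be make", "online")
paris = ("Paris", "be locate in", "France")
records = [
    ("Hotel reservations can be made online.", hotel, "pattern-1"),
    ("Hotel reservations can be made online.", hotel, "pattern-1"),  # duplicate
    ("A hotel reservation can be made online.", hotel, "pattern-1"),
    ("Your hotel reservation is made online.", hotel, "pattern-1"),
    ("Paris is located in France.", paris, "pattern-2"),
]
cleaned = drop_frequent_pairs(dedupe_sentences(records), max_count=2)
print(cleaned)  # only the Paris record survives
```

Deduplicating first matters: otherwise repeated copies of the same sentence inflate the pair counts that the outlier cap is applied to.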