Difference between revisions of "Pattern Learning"
From Knowitall
Revision as of 19:56, 20 October 2011
Building the bootstrapping data
Determining target relations
- Restrict high quality set of ClueWeb extractions to have proper noun arguments
- Choose the most frequent relations from this set
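The two steps above can be sketched roughly as follows. This is only an illustration: the `(arg1, relation, arg2)` tuple layout, the function names, and the capitalization-based `is_proper_noun` check are assumptions made for the example (the real pipeline presumably relies on part-of-speech tags), and the sample extractions are invented.

```python
from collections import Counter

def is_proper_noun(arg):
    """Hypothetical stand-in for a POS-based check: every token is capitalized."""
    tokens = arg.split()
    return bool(tokens) and all(t[0].isupper() for t in tokens)

def top_relations(extractions, k):
    """Return the k most frequent relations among extractions whose
    two arguments both look like proper nouns."""
    counts = Counter(
        rel
        for arg1, rel, arg2 in extractions
        if is_proper_noun(arg1) and is_proper_noun(arg2)
    )
    return [rel for rel, _ in counts.most_common(k)]

# Invented sample data, for illustration only.
extractions = [
    ("Barack Obama", "be bear in", "Hawaii"),
    ("Albert Einstein", "be bear in", "Ulm"),
    ("Microsoft", "acquire", "Skype"),
    ("the company", "acquire", "a startup"),
]
print(top_relations(extractions, 2))
```

How many relations to keep (`k`) is not stated on this page, so it is left as a parameter.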
Determining target extractions
- Measure the occurrence of the arguments
- Keep extractions from the target relations whose arguments occur commonly (100)
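A minimal sketch of this filtering step, under several assumptions: argument occurrences are counted within the extraction set itself, both arguments must meet the threshold, and the "(100)" above is read as a minimum occurrence count. None of these details are confirmed by the notes, and the function name is hypothetical.

```python
from collections import Counter

def target_extractions(extractions, target_relations, min_count=100):
    """Keep extractions from the target relations whose arguments
    each occur at least min_count times (assumed reading of "(100)")."""
    # Count how often each argument string appears in either position.
    arg_counts = Counter()
    for arg1, _, arg2 in extractions:
        arg_counts[arg1] += 1
        arg_counts[arg2] += 1

    return [
        (arg1, rel, arg2)
        for arg1, rel, arg2 in extractions
        if rel in target_relations
        and arg_counts[arg1] >= min_count
        and arg_counts[arg2] >= min_count
    ]
```

If occurrences are instead measured against the full corpus, `arg_counts` would be replaced by counts loaded from that source.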
Reducing the lemma grep results
- Remove patterns that occur fewer than 5 times.
- Remove duplicate sentences.
- Remove extractions whose (extraction, pattern) pair occurs anomalously frequently.
  - There was a single one: (hotel reservation, be make, online) occurred 32k times; the next most frequent occurred 8k times.
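The three reduction steps could look something like the sketch below. The row layout `(extraction, pattern, sentence)`, the function name, and the `max_pair_count` cap are assumptions for illustration. The notes do not say how "anomalously frequently" was decided, so a fixed cap is used here as a stand-in, and deduplication is assumed to be by exact sentence text.

```python
from collections import Counter

def reduce_results(rows, min_pattern_count=5, max_pair_count=10_000):
    """Apply the three reductions to rows of (extraction, pattern, sentence)."""
    # 1. Remove patterns that occur fewer than min_pattern_count times.
    pattern_counts = Counter(pattern for _, pattern, _ in rows)
    rows = [r for r in rows if pattern_counts[r[1]] >= min_pattern_count]

    # 2. Remove duplicate sentences, keeping the first row for each sentence.
    seen = set()
    unique = []
    for row in rows:
        if row[2] not in seen:
            seen.add(row[2])
            unique.append(row)

    # 3. Remove rows whose (extraction, pattern) pair exceeds the cap
    #    (hypothetical criterion for "anomalously frequent").
    pair_counts = Counter((ext, pat) for ext, pat, _ in unique)
    return [r for r in unique if pair_counts[(r[0], r[1])] <= max_pair_count]
```

With the counts reported above (32k for the top pair, 8k for the next), any cap between those two values would drop only the (hotel reservation, be make, online) pair.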