Difference between revisions of "Pattern Learning"

From Knowitall

Revision as of 01:20, 20 October 2011

Building the bootstrapping data

Determining target relations

  1. Restrict high quality set of ClueWeb extractions to have proper noun arguments
  2. Choose the most frequent relations from this set
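
The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: extractions are taken to be (arg1, relation, arg2) string tuples, both arguments must be proper nouns, and "proper noun" is approximated by a capitalization check where a real pipeline would more likely rely on part-of-speech tags. The names `is_proper_noun` and `target_relations` are hypothetical and do not come from the original pipeline.

```python
from collections import Counter

def is_proper_noun(arg):
    # Hypothetical heuristic: every token starts with an uppercase letter.
    # A POS tagger would be a more faithful test than capitalization.
    tokens = arg.split()
    return bool(tokens) and all(t[0].isupper() for t in tokens)

def target_relations(extractions, top_k):
    # Step 1: keep extractions whose two arguments both look like proper nouns.
    proper = [(a1, rel, a2) for a1, rel, a2 in extractions
              if is_proper_noun(a1) and is_proper_noun(a2)]
    # Step 2: rank relations by frequency within that restricted set.
    counts = Counter(rel for _, rel, _ in proper)
    return [rel for rel, _ in counts.most_common(top_k)]
```

How many relations to keep (`top_k`) is not stated on this page, so it is left as a parameter.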

Determining target extractions

  1. Count how often each argument occurs
  2. Keep extractions from the target relations whose arguments occur commonly (100)
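
A minimal sketch of this filtering step, assuming the "(100)" is a minimum occurrence count per argument and that occurrences are counted over the extraction set itself. Both readings are assumptions, as is the hypothetical function name `target_extractions`.

```python
from collections import Counter

def target_extractions(extractions, relations, min_count=100):
    # Step 1: count how often each argument string appears in either slot.
    arg_counts = Counter()
    for a1, _, a2 in extractions:
        arg_counts[a1] += 1
        arg_counts[a2] += 1
    # Step 2: keep target-relation extractions whose arguments are both common.
    # min_count=100 is an assumed reading of the "(100)" in the step above.
    wanted = set(relations)
    return [(a1, rel, a2) for a1, rel, a2 in extractions
            if rel in wanted
            and arg_counts[a1] >= min_count
            and arg_counts[a2] >= min_count]
```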

Reducing the lemma grep results

  1. Remove duplicate sentences.
  2. Remove (extraction, pattern) pairs that occur anomalously frequently.
     * There was a single such pair: (hotel reservation, be make, online) occurred 32k times, while the next most frequent occurred 8k times.
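
The reduction above might look something like the sketch below. The page does not say how "anomalously frequently" was decided, so the rule used here (drop the most frequent pair when it is more than `max_ratio` times as frequent as the runner-up) is purely an assumption, chosen because it would catch a 32k versus 8k gap. The function name `reduce_matches` and the (sentence, extraction, pattern) record layout are also hypothetical.

```python
from collections import Counter

def reduce_matches(matches, max_ratio=3.0):
    # matches: iterable of (sentence, extraction, pattern) tuples, where
    # extraction and pattern are hashable (e.g. strings or tuples).
    # Step 1: drop duplicate sentences, keeping the first occurrence.
    seen, unique = set(), []
    for sentence, extraction, pattern in matches:
        if sentence not in seen:
            seen.add(sentence)
            unique.append((sentence, extraction, pattern))
    # Step 2: count (extraction, pattern) pairs and flag the top pair as
    # anomalous when it is more than max_ratio times as frequent as the
    # runner-up. This threshold rule is an assumption, not the original one.
    counts = Counter((e, p) for _, e, p in unique)
    ranked = counts.most_common(2)
    anomalous = set()
    if len(ranked) == 2 and ranked[0][1] > max_ratio * ranked[1][1]:
        anomalous.add(ranked[0][0])
    return [m for m in unique if (m[1], m[2]) not in anomalous]
```

A fixed count cutoff would work just as well; the ratio test simply avoids hard-coding a number that depends on corpus size.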