Rule Learner/Overview

Creating annotated sentence files

The input is an XML file containing the sentences and an annotations file. For each Annotation in the annotations file:

  1. Find the Sentence that matches the annotation's sentence.
  2. Add the annotation to the corresponding AnnotatedSentence.

Output the AnnotatedSentences to an XML file using toXmlElement.
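
The matching steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the XML layout (a `sentence` element per sentence), the `Annotation` fields, and the `annotate` function are all hypothetical, and `to_xml_element` only stands in for the toXmlElement method mentioned above. Matching is assumed to be exact string equality on the sentence text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Annotation:
    sentence: str  # text of the sentence the annotation refers to (assumed field)
    argument: str  # annotated argument text (assumed field)


@dataclass
class AnnotatedSentence:
    text: str
    annotations: list = field(default_factory=list)

    def to_xml_element(self):
        # Stand-in for toXmlElement; the element layout is an assumption.
        elem = ET.Element("sentence", {"text": self.text})
        for ann in self.annotations:
            ET.SubElement(elem, "annotation").text = ann.argument
        return elem


def annotate(sentences_xml, annotations):
    # Index sentences by their text so each annotation finds its sentence directly.
    root = ET.fromstring(sentences_xml)
    by_text = {s.text: AnnotatedSentence(s.text) for s in root.iter("sentence")}
    for ann in annotations:
        match = by_text.get(ann.sentence)
        if match is not None:  # annotations without a matching sentence are skipped
            match.annotations.append(ann)
    # Serialize every AnnotatedSentence into one output document.
    out = ET.Element("sentences")
    for annotated in by_text.values():
        out.append(annotated.to_xml_element())
    return ET.tostring(out, encoding="unicode")
```

Skipping annotations that match no sentence is a choice made for this sketch; a real implementation might report them instead.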

Creating base rules

Note: this describes one particular method for creating base rules. How base rules are actually created is determined by the specified BaseRuleFactory.

For each AnnotatedSentence:

  1. For each extraction (tuple) in the AnnotatedSentence:
    1. Make constraints that capture the annotation's argument from the extraction. A base rule must have at least one argument constraint found in arg1, pred, or arg2 of the extraction. Several types may capture the annotation's argument, so the same extraction may yield several possible base rules. If all the text in an extraction part (arg1, pred, or arg2) matches the annotation's argument, use a PartConstraint.
    2. Add all possible additional constraints.
      1. Add all type (class and NER) constraints from arg1, pred, and arg2 of the tuple. Add a term constraint if the text under the type matches the annotation.
      2. Add term constraints for words with POS tag { IN, TO, POS } in arg1, pred, and arg2.

The result will be a list of Rules.
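
As a rough sketch of the procedure above, the following builds base rules for a single extraction. Everything here is an assumption made for illustration: the `Token`, `TypeSpan`, and `Part` classes, the tuple encoding of constraints, and the `base_rules` function are hypothetical and do not reflect the actual BaseRuleFactory interface. Each argument constraint yields one base rule, and the additional constraints are shared by all of them.

```python
from dataclasses import dataclass

CLOSED_CLASS_TAGS = {"IN", "TO", "POS"}


@dataclass(frozen=True)
class Token:
    text: str
    pos: str


@dataclass(frozen=True)
class TypeSpan:
    name: str  # class or NER label, e.g. "City"
    text: str  # text covered by the type


@dataclass(frozen=True)
class Part:
    name: str  # "arg1", "pred", or "arg2"
    tokens: tuple
    types: tuple = ()

    @property
    def text(self):
        return " ".join(t.text for t in self.tokens)


def base_rules(parts, argument):
    """Return one base rule (a list of constraint tuples) per argument constraint."""
    # Step 1: constraints that capture the annotation's argument.
    arg_constraints = []
    for part in parts:
        if part.text == argument:  # the whole part matches: part constraint
            arg_constraints.append(("part", part.name))
        for span in part.types:
            if span.text == argument:  # a type covers exactly the argument
                arg_constraints.append(("arg-type", part.name, span.name))
    # Step 2: additional constraints, shared by every base rule.
    extra = []
    for part in parts:
        for span in part.types:
            extra.append(("type", part.name, span.name))
            if span.text == argument:
                extra.append(("term", part.name, span.text))
        for tok in part.tokens:
            if tok.pos in CLOSED_CLASS_TAGS:
                extra.append(("term", part.name, tok.text))
    # No argument constraint means no base rule for this extraction.
    return [[arg] + extra for arg in arg_constraints]
```

For an extraction such as (Paris; is in; France) with the annotation argument "Paris", this sketch would produce one rule led by a part constraint on arg1 and, if arg1 carries a type covering "Paris", a second rule led by that type.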

Learning rules

Beam Search

For each base rule <math>r_0</math>:

  1. Initialize a beam set <math>B</math> as an empty priority queue with beam size <math>k=10</math>.
  2. Add <math>r_0</math> to the beam set.
  3. Repeat while the beam changes:
    1. For each rule <math>r\in B</math>, consider all generalizations <math>r'</math> that are valid. Validity is defined by the BaseRuleFactory.
    2. Compute the confidence for <math>r'</math> according to the supplied ConfidenceFunction.
    3. If <math>B</math> is not full, add <math>r'</math> to <math>B</math>.
    4. If <math>B</math> is full but <math>confidence(r')</math> is greater than the minimum confidence in <math>B</math>, add <math>r'</math> to <math>B</math> in place of the minimum-confidence rule.
  4. Select the highest-confidence rules in the beam set as the best rules.
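
The loop above can be sketched with a min-heap as the priority queue, so the lowest-confidence rule is always at the front and is the one evicted when the beam is full. This is a sketch under stated assumptions: `beam_search`, `generalize`, and `confidence` are hypothetical names standing in for the generalization step validated by the BaseRuleFactory and for the ConfidenceFunction, rules are assumed to be hashable, and the `seen` set is an addition (not described above) that prevents a rule from being scored twice and guarantees termination.

```python
import heapq
from itertools import count


def beam_search(base_rule, generalize, confidence, k=10):
    tie = count()  # tie-breaker so the heap never compares rules directly
    # Min-heap keyed on confidence: beam[0] is the lowest-confidence entry.
    beam = [(confidence(base_rule), next(tie), base_rule)]
    seen = {base_rule}  # assumption: each rule is considered at most once
    changed = True
    while changed:
        changed = False
        for _, _, rule in list(beam):  # snapshot: the beam is modified below
            for candidate in generalize(rule):
                if candidate in seen:
                    continue
                seen.add(candidate)
                entry = (confidence(candidate), next(tie), candidate)
                if len(beam) < k:
                    heapq.heappush(beam, entry)
                    changed = True
                elif entry[0] > beam[0][0]:
                    # Beam is full: evict the minimum-confidence rule.
                    heapq.heapreplace(beam, entry)
                    changed = True
    # Return every rule tied for the highest confidence in the beam.
    best = max(conf for conf, _, _ in beam)
    return [rule for conf, _, rule in beam if conf == best]
```

Returning all rules tied for the top confidence is one reading of "the highest-confidence rules"; a variant could return the top few entries of the beam instead.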