Difference between revisions of "Rule Learner/Overview"

From Knowitall

Latest revision as of 22:55, 20 April 2011

Creating annotated sentence files

The input is an XML file containing the sentences and an annotations file. For each Annotation in the annotations file:

  1. Find the Sentence that matches the annotation's sentence.
  2. Add the annotation to the corresponding AnnotatedSentence.

Output the AnnotatedSentences to an XML file using toXmlElement.
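The matching step can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the library's code. It assumes that an annotation identifies its sentence by exact text and that only sentences with at least one annotation are written out. `Annotation`, `AnnotatedSentenceSketch`, `AnnotatedSentenceBuilder`, and `toXml` are hypothetical stand-ins for the real Annotation and AnnotatedSentence classes and for `toXmlElement`, whose actual signatures are not described on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one entry of the annotations file:
// the sentence it refers to and the argument it marks.
record Annotation(String sentence, String argument) {}

// Hypothetical stand-in for AnnotatedSentence.
final class AnnotatedSentenceSketch {
    final String sentence;
    final List<Annotation> annotations = new ArrayList<>();

    AnnotatedSentenceSketch(String sentence) {
        this.sentence = sentence;
    }

    // Hypothetical analogue of toXmlElement: serializes the sentence and its annotations as an XML string.
    String toXml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<sentence text=\"").append(escape(sentence)).append("\">");
        for (Annotation a : annotations) {
            sb.append("<annotation argument=\"").append(escape(a.argument())).append("\"/>");
        }
        return sb.append("</sentence>").toString();
    }

    // Escapes XML special characters for use in an attribute value ('&' first, so entities are not double-escaped).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

final class AnnotatedSentenceBuilder {
    // Attaches each annotation to the sentence whose text matches exactly; sentences left without annotations are dropped.
    static List<AnnotatedSentenceSketch> build(List<String> sentences, List<Annotation> annotations) {
        Map<String, AnnotatedSentenceSketch> byText = new LinkedHashMap<>();
        for (String s : sentences) {
            byText.put(s, new AnnotatedSentenceSketch(s));
        }
        for (Annotation a : annotations) {
            AnnotatedSentenceSketch target = byText.get(a.sentence());
            if (target != null) {
                target.annotations.add(a); // annotations with no matching sentence are skipped
            }
        }
        List<AnnotatedSentenceSketch> result = new ArrayList<>();
        for (AnnotatedSentenceSketch s : byText.values()) {
            if (!s.annotations.isEmpty()) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Indexing the sentences by text keeps each lookup constant-time, so the whole pass is linear in the number of sentences plus annotations.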

Creating base rules

Note, this is a description of one particular method for creating base rules. Base rule creation is determined by the specified BaseRuleFactory.

For each AnnotatedSentence:

  1. For each extraction (tuple) in the AnnotatedSentence:
    1. Make constraints that capture the annotation's argument within the extraction. A base rule must have at least one argument constraint located in arg1, pred, or arg2 of the extraction. Because several types may capture the annotation's argument, the same extraction can yield multiple base rules. If all the text in an extraction part (arg1, pred, or arg2) matches the annotation's argument, use a PartConstraint.
    2. Add all possible additional constraints:
      1. Add all type (class and NER) constraints from arg1, pred, and arg2 of the tuple. If the text under a type matches the annotation, also add a term constraint.
      2. Add term constraints for words with the POS tags IN, TO, or POS in arg1, pred, and arg2.

The result will be a list of Rules.
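The base-rule step for a single extraction might look roughly like the following Java sketch. Several points are assumptions rather than documented behavior: a PartConstraint is treated as one more argument candidate alongside the type-based ones, "matches" means exact string equality, and every base rule of an extraction shares the same additional constraints. `Token`, `TypeSpan`, `Part`, `Constraint`, `BaseRule`, and `baseRules` are hypothetical names, not the library's Rule or BaseRuleFactory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical data model: a token with its POS tag, and a typed span (class or NER label) over some text.
record Token(String word, String pos) {}
record TypeSpan(String label, String text) {}

// Hypothetical extraction part (arg1, pred or arg2) with its tokens and typed spans.
record Part(String name, List<Token> tokens, List<TypeSpan> types) {
    // Text of the whole part, tokens joined by single spaces.
    String text() {
        List<String> words = new ArrayList<>();
        for (Token t : tokens) words.add(t.word());
        return String.join(" ", words);
    }
}

record Constraint(String kind, String part, String value) {}
record BaseRule(Constraint argument, List<Constraint> others) {}

final class BaseRuleSketch {
    static final Set<String> TERM_TAGS = Set.of("IN", "TO", "POS");

    static List<BaseRule> baseRules(List<Part> extraction, String argument) {
        // Step 1: every way of capturing the argument becomes a candidate argument constraint.
        List<Constraint> argCandidates = new ArrayList<>();
        for (Part p : extraction) {
            if (p.text().equals(argument)) {
                argCandidates.add(new Constraint("part", p.name(), argument)); // PartConstraint analogue
            }
            for (TypeSpan t : p.types()) {
                if (t.text().equals(argument)) {
                    argCandidates.add(new Constraint("argType", p.name(), t.label()));
                }
            }
        }
        // Step 2: additional type and term constraints from all three parts.
        List<Constraint> others = new ArrayList<>();
        for (Part p : extraction) {
            for (TypeSpan t : p.types()) {
                others.add(new Constraint("type", p.name(), t.label()));
                if (t.text().equals(argument)) {
                    others.add(new Constraint("term", p.name(), t.text()));
                }
            }
            for (Token tok : p.tokens()) {
                if (TERM_TAGS.contains(tok.pos())) {
                    others.add(new Constraint("term", p.name(), tok.word()));
                }
            }
        }
        // One base rule per argument candidate; no candidate means no base rule for this extraction.
        List<Constraint> shared = List.copyOf(others);
        List<BaseRule> rules = new ArrayList<>();
        for (Constraint c : argCandidates) {
            rules.add(new BaseRule(c, shared));
        }
        return rules;
    }
}
```

For an extraction whose arg2 is exactly the annotated argument and is also covered by a LOCATION type, this yields two base rules (one part-based, one type-based) that differ only in their argument constraint.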

Learning rules

Beam Search

For each base rule <math>r_0</math>:

  1. Initialize a beam set <math>B</math> as an empty priority queue with beam size <math>k=10</math>.
  2. Add <math>r_0</math> to the beam set.
  3. Repeat while the beam changes:
    1. For each rule <math>r\in B</math>, consider all valid generalizations <math>r'</math> of <math>r</math>. Validity is defined by the BaseRuleFactory.
    2. Compute the confidence of each <math>r'</math> according to the supplied ConfidenceFunction.
    3. If <math>B</math> is not full, add <math>r'</math> to <math>B</math>.
    4. If <math>B</math> is full but <math>confidence(r')</math> is greater than the minimum confidence in <math>B</math>, add <math>r'</math> to <math>B</math>, evicting the lowest-confidence rule.
  4. Select the highest-confidence rules in the beam set as the best rules.
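The beam search can be sketched generically in Java. This is an illustration under assumptions, not the library's implementation: `generalizations`, `isValid` (standing in for the BaseRuleFactory validation), and `confidence` (standing in for the ConfidenceFunction) are hypothetical parameters. It also assumes that a full beam evicts its lowest-confidence rule when a better one arrives, that rules already in the beam are skipped, and that rules implement `equals`/`hashCode`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

final class BeamSearchSketch {
    // Returns the final beam sorted from highest to lowest confidence.
    static <R> List<R> search(R r0, Function<R, List<R>> generalizations, Predicate<R> isValid,
                              ToDoubleFunction<R> confidence, int k) {
        Comparator<R> byConfidence = Comparator.comparingDouble(confidence);
        PriorityQueue<R> beam = new PriorityQueue<>(byConfidence); // head = lowest-confidence rule
        Set<R> members = new HashSet<>();                          // fast duplicate check
        beam.add(r0);
        members.add(r0);

        boolean changed = true;
        while (changed) { // repeat while the beam changes
            changed = false;
            for (R r : new ArrayList<>(beam)) { // snapshot, because the beam is mutated below
                for (R g : generalizations.apply(r)) {
                    if (!isValid.test(g) || members.contains(g)) {
                        continue;
                    }
                    if (beam.size() < k) {
                        beam.add(g);
                        members.add(g);
                        changed = true;
                    } else if (confidence.applyAsDouble(g) > confidence.applyAsDouble(beam.peek())) {
                        members.remove(beam.poll()); // evict the lowest-confidence rule
                        beam.add(g);
                        members.add(g);
                        changed = true;
                    }
                }
            }
        }
        List<R> best = new ArrayList<>(beam);
        best.sort(byConfidence.reversed());
        return best;
    }
}
```

Once the beam is full, it only ever replaces its minimum with a strictly better rule, so the minimum confidence never decreases and an evicted rule cannot re-enter. The loop therefore terminates whenever the space of generalizations is finite.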