Difference between revisions of "Rule Learner/Overview"
From Knowitall
Revision as of 22:04, 20 April 2011
Creating annotated sentence files
The input is an XML file with the sentences and an annotations file. For each Annotation in the annotations file:
- Find the Sentence that matches the annotation's sentence.
- Add the annotation to the corresponding AnnotatedSentence.
Output the AnnotatedSentences to an XML file using toXmlElement.
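The matching step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the XML element names (sentences, sentence, annotatedSentence, annotation), the (sentence text, argument) pair format for annotations, and the to_xml_element method (standing in for toXmlElement) are all hypothetical, since the page does not specify the file formats.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class AnnotatedSentence:
    """A sentence plus the annotation arguments attached to it (hypothetical shape)."""
    text: str
    annotations: list = field(default_factory=list)

    def to_xml_element(self):
        # Hypothetical counterpart of toXmlElement; the element names are assumptions.
        elem = ET.Element("annotatedSentence")
        ET.SubElement(elem, "text").text = self.text
        for argument in self.annotations:
            ET.SubElement(elem, "annotation").text = argument
        return elem


def annotate(sentences_xml, annotations):
    """Attach each (sentence_text, argument) annotation to its matching sentence."""
    root = ET.fromstring(sentences_xml)
    # Index sentences by exact text; assumes sentence texts are unique.
    by_text = {}
    for node in root.iter("sentence"):
        text = (node.text or "").strip()
        by_text[text] = AnnotatedSentence(text)
    for sentence_text, argument in annotations:
        match = by_text.get(sentence_text.strip())
        if match is not None:  # annotations with no matching sentence are skipped
            match.annotations.append(argument)
    out = ET.Element("annotatedSentences")
    for sentence in by_text.values():
        out.append(sentence.to_xml_element())
    return ET.tostring(out, encoding="unicode")
```

Matching here is by exact sentence text, which is the simplest reading of "matches"; a real implementation might match on a sentence identifier instead.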
Creating base rules
For each AnnotatedSentence:
- For each extraction (tuple) in the AnnotatedSentence:
  - Make constraints that capture the annotation's argument from the extraction. A base rule must have at least one argument constraint found in arg1, pred, or arg2 of the extraction. There may be multiple types that capture the annotation's argument, so there may be multiple possible base rules for the same extraction. If the entire text of an extraction part (arg1, pred, or arg2) matches the Annotation's argument, use a PartConstraint.
  - Add all possible additional constraints:
    - Add all type (class and NER) constraints from arg1, pred, and arg2 of the tuple. Add a term constraint if the text under the type matches the annotation.
    - Add term constraints for words with a part-of-speech tag in { IN, TO, POS } in arg1, pred, and arg2.
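As a rough illustration of two of the constraint kinds above, the following Python sketch collects PartConstraints and part-of-speech-based term constraints for one extraction. The data shapes are assumptions made for the example: an extraction is a dict mapping arg1, pred, and arg2 to lists of (word, tag) pairs, and constraints are plain tuples. Type (class and NER) constraints are left out because the page does not describe how types are represented.

```python
# Part-of-speech tags whose words receive term constraints (from the list above).
FUNCTION_TAGS = {"IN", "TO", "POS"}


def part_text(tokens):
    """Join the words of one extraction part into a single string."""
    return " ".join(word for word, _ in tokens)


def candidate_constraints(extraction, argument):
    """Collect candidate constraints for one extraction.

    extraction: dict with keys 'arg1', 'pred', 'arg2', each a list of
    (word, pos_tag) pairs (a hypothetical representation).
    argument: the annotation's argument text.
    """
    constraints = []
    for part in ("arg1", "pred", "arg2"):
        tokens = extraction[part]
        # The whole part matches the annotation's argument: use a PartConstraint.
        if part_text(tokens) == argument:
            constraints.append(("PartConstraint", part))
        # Words tagged IN, TO, or POS each yield a term constraint.
        for word, tag in tokens:
            if tag in FUNCTION_TAGS:
                constraints.append(("TermConstraint", part, word))
    return constraints
```

For an extraction such as (Paris; is located in; France) with the argument "Paris", this would yield a PartConstraint on arg1 and a term constraint on the preposition "in".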
Learning rules
Beam Search
For each base rule <math>r_0</math>:
- Initialize a beam set <math>B</math> as an empty priority queue with beam size <math>k=10</math>
- Add <math>r_0</math> to the beam set.
- Repeat while the beam changes:
  - For each rule <math>r\in B</math> consider all generalizations <math>r'</math>
  - Compute the confidence for <math>r'</math>
  - If <math>B</math> is not full, add <math>r'</math> to <math>B</math>
  - If <math>B</math> is full but <math>conf(r')</math> is greater than the minimum confidence in <math>B</math>, replace the minimum-confidence rule in <math>B</math> with <math>r'</math>
- Select the highest-confidence rules in the beam set as the best rules
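The search above can be sketched in Python with a min-heap as the priority queue. This is a minimal sketch, not the project's implementation: the generalize and confidence callables are placeholders supplied by the caller, rules are assumed to be hashable, and the seen set (which keeps a rule from being scored twice and guarantees termination on a finite rule space) is an addition not stated in the steps. The toy rule representation and weights at the bottom are invented purely for demonstration.

```python
import heapq
from itertools import count


def beam_search(base_rule, generalize, confidence, k=10):
    """Return (rule, confidence) pairs left in the beam, best first."""
    tie = count()  # tie-breaker so the heap never compares rules directly
    # Min-heap keyed on confidence: beam[0] is always the weakest rule.
    beam = [(confidence(base_rule), next(tie), base_rule)]
    seen = {base_rule}  # assumption: each rule is considered at most once
    changed = True
    while changed:  # repeat while the beam changes
        changed = False
        for _, _, rule in list(beam):  # snapshot, since the beam is mutated below
            for candidate in generalize(rule):
                if candidate in seen:
                    continue
                seen.add(candidate)
                conf = confidence(candidate)
                if len(beam) < k:
                    heapq.heappush(beam, (conf, next(tie), candidate))
                    changed = True
                elif conf > beam[0][0]:
                    # Beam is full: evict the minimum-confidence rule.
                    heapq.heapreplace(beam, (conf, next(tie), candidate))
                    changed = True
    ranked = sorted(beam, key=lambda entry: (-entry[0], entry[1]))
    return [(rule, conf) for conf, _, rule in ranked]


# Toy setup (hypothetical): a rule is a frozenset of constraint names, and a
# generalization drops one constraint.
WEIGHTS = {"a": 0.75, "b": 0.5, "c": 0.25}


def drop_one_constraint(rule):
    if len(rule) > 1:
        for name in sorted(rule):
            yield rule - {name}


def toy_confidence(rule):
    return sum(WEIGHTS[name] for name in rule) / len(rule)


best = beam_search(frozenset("abc"), drop_one_constraint, toy_confidence, k=3)
```

A min-heap makes both the "is the beam full" check and the eviction of the weakest rule cheap, which is why it is a natural fit for the priority queue named in the first step.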