Tagger

From Knowitall

The Tagger classes search for content in a sentence and mark it with a Type. Each tagger is responsible for deserializing its own content XML, but the descriptor field is common to all taggers. The descriptor is a string that names the tagger. For example, a tagger that looks for "knife", "sword", and "gun" might have the descriptor "AttackWeapon". In hindsight, this field could simply have been called "name".

Here is an example of a simple tagger.

   <CaseInsensitiveKeywordTagger descriptor="WaterVehicle">
       <constraint type="NounPhraseConstraint" />
       <keywords>
           <keyword>watercraft</keyword>
           <keyword>tugcraft</keyword>
           <keyword>tanker</keyword>
           <keyword>yacht</keyword>
       </keywords>
   </CaseInsensitiveKeywordTagger>

This tagger searches for the keywords, ignoring case, and tags each match. The constraint adds a further requirement: the text to be tagged must be in a noun phrase (as defined by OpenNLP). To also match inflected forms such as "yachts" and "tugcrafts", use a NormalizedKeywordTagger instead.

   <NormalizedKeywordTagger descriptor="WaterVehicle">
       <constraint type="NounPhraseConstraint" />
       <keywords>
           <keyword>watercraft</keyword>
           <keyword>tugcraft</keyword>
           <keyword>tanker</keyword>
           <keyword>yacht</keyword>
       </keywords>
   </NormalizedKeywordTagger>
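
The difference between the two taggers can be illustrated with a minimal Python sketch. This is not the library's implementation: the function names are hypothetical, `normalize` is a crude plural-stripping stand-in for whatever normalization the real NormalizedKeywordTagger applies, and the noun phrase constraint is ignored.

```python
def normalize(token):
    """Crude stand-in for a real stemmer (an assumption, not the library's
    normalizer): lowercase the token and strip a single trailing plural "s"."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def tag_case_insensitive(tokens, keywords, descriptor):
    """Tag tokens that equal a keyword when case is ignored."""
    wanted = {k.lower() for k in keywords}
    return [(i, t, descriptor) for i, t in enumerate(tokens) if t.lower() in wanted]


def tag_normalized(tokens, keywords, descriptor):
    """Tag tokens whose normalized form equals a normalized keyword."""
    wanted = {normalize(k) for k in keywords}
    return [(i, t, descriptor) for i, t in enumerate(tokens) if normalize(t) in wanted]


KEYWORDS = ["watercraft", "tugcraft", "tanker", "yacht"]
tokens = "Two Yachts passed a tanker".split()

# Each tag is (token index, token text, descriptor).
case_insensitive_tags = tag_case_insensitive(tokens, KEYWORDS, "WaterVehicle")
normalized_tags = tag_normalized(tokens, KEYWORDS, "WaterVehicle")

print(case_insensitive_tags)  # [(4, 'tanker', 'WaterVehicle')]
print(normalized_tags)        # [(1, 'Yachts', 'WaterVehicle'), (4, 'tanker', 'WaterVehicle')]
```

In this sketch the case-insensitive matcher misses "Yachts" because the plural does not equal any keyword, while the normalized matcher reduces it to "yacht" before comparing.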

There are many other types of taggers. For a complete list, see the javadoc [1].