Iarpa

From Knowitall
Revision as of 00:05, 11 January 2011 by Schmmd (talk | contribs) (Quality of Extractions)

Domain Recognizers

Rule Learning

Quality of Extractions

TODO

  • Relations that are too long (7 words)
  • Some arguments consist only of special symbols (e.g., ")
  • Remove relations that break database constraints (e.g., a token longer than 64 characters; these limits can probably be stricter)
  • Pronoun Resolution

DONE

  • Relations that are too short (2 characters)
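
The extraction-quality checks listed above could be combined into a single filter. The following is a minimal Python sketch, not the project's actual code: the `Extraction` tuple, the function names, and the exact cut-offs are assumptions made for illustration. In particular, the list does not say whether the 7-word and 2-character limits are inclusive; the sketch treats relations of 7 or more words as too long and relations of 2 or fewer characters as too short. Pronoun resolution is not covered.

```python
from typing import List, NamedTuple

# Assumed cut-offs; the notes only mention "7 words", "2 characters"
# and "token > 64 characters".
MAX_RELATION_WORDS = 6
MIN_RELATION_CHARS = 3
MAX_TOKEN_CHARS = 64


class Extraction(NamedTuple):
    """Hypothetical container for one (arg1, relation, arg2) extraction."""
    arg1: str
    relation: str
    arg2: str


def is_symbol_only(text: str) -> bool:
    """True if the text has no letter or digit (an empty string counts too)."""
    return not any(ch.isalnum() for ch in text)


def rejection_reasons(ext: Extraction) -> List[str]:
    """Return the reasons an extraction fails the checks (empty list = keep)."""
    reasons = []
    relation = ext.relation.strip()

    if len(relation) < MIN_RELATION_CHARS:
        reasons.append("relation too short")
    if len(relation.split()) > MAX_RELATION_WORDS:
        reasons.append("relation too long")
    if is_symbol_only(ext.arg1) or is_symbol_only(ext.arg2):
        reasons.append("argument is only symbols")
    # Check every whitespace-separated token in all three fields.
    if any(len(tok) > MAX_TOKEN_CHARS for field in ext for tok in field.split()):
        reasons.append("token exceeds database limit")

    return reasons
```

Returning a list of reasons rather than a boolean makes it easy to count how many extractions each rule removes, which helps when deciding whether a limit can be made stricter.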

Speed

  • Compile to native code.
  • Compare NLP libraries.

Conversion of Docs to Text

  • Detect headlines and handle separately
  • Handle sentences that consist only of single letters
  • Detect bullet points
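
The three conversion items above could start from a simple line classifier. This is a rough Python sketch under assumed heuristics, none of which come from the project: a bullet is a line starting with `•`, `*`, `-` or a number followed by `.` or `)`; a headline is a short, capitalized line without closing punctuation; and the `MAX_HEADLINE_WORDS` cut-off is arbitrary.

```python
import re

# Assumed heuristic: a bullet marker or a list number, followed by whitespace.
BULLET = re.compile(r"^\s*(?:[•*\-]|\d+[.)])\s+")
MAX_HEADLINE_WORDS = 10  # arbitrary cut-off for illustration


def classify_line(line: str) -> str:
    """Label a line as 'blank', 'bullet', 'single_letters', 'headline' or 'text'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if BULLET.match(line):
        return "bullet"

    tokens = stripped.split()
    # Lines such as "a b c d", where every token is a single character.
    if all(len(tok) == 1 for tok in tokens):
        return "single_letters"

    # Short, capitalized, and not ending like a sentence or clause.
    if (len(tokens) <= MAX_HEADLINE_WORDS
            and stripped[0].isupper()
            and stripped[-1] not in ".!?:;,"):
        return "headline"

    return "text"
```

Lines labeled `headline`, `bullet` or `single_letters` can then be routed away from the sentence extractor or handled by a separate path, while `text` lines continue through the normal pipeline.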