Difference between revisions of "Iarpa"

Revision as of 21:38, 14 March 2011

Domain Recognizers

Rule Learning

Quality of Extractions

TODO

  • Relations that are too long (7 words)
  • Some arguments consist only of special symbols (e.g., ")
  • Remove relations that break database constraints (i.e., a token longer than 64 characters; this limit could probably be stricter)
  • Pronoun resolution
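
The first three checks in this list could be sketched as one filter. This is only an illustration under stated assumptions: an extraction is taken to be an (arg1, relation, arg2) triple of strings, the `Extraction` type and `is_acceptable` function are hypothetical names, and "7 words" is read as "reject relations with more than 7 words". Pronoun resolution is not covered here.

```python
from typing import NamedTuple

MAX_RELATION_WORDS = 7   # assumed reading of "7 words" in the list above
MAX_TOKEN_CHARS = 64     # database constraint mentioned in the list above


class Extraction(NamedTuple):
    """Hypothetical container for one extracted triple."""
    arg1: str
    relation: str
    arg2: str


def has_alphanumeric(text: str) -> bool:
    """True if the text contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)


def is_acceptable(ex: Extraction) -> bool:
    """Apply the three surface checks; pronoun resolution is out of scope."""
    # Relation phrase is too long.
    if len(ex.relation.split()) > MAX_RELATION_WORDS:
        return False
    # An argument is made up only of special symbols, such as a lone quote mark.
    if not has_alphanumeric(ex.arg1) or not has_alphanumeric(ex.arg2):
        return False
    # Some whitespace-delimited token is longer than the database allows.
    for field in ex:
        if any(len(tok) > MAX_TOKEN_CHARS for tok in field.split()):
            return False
    return True
```

Keeping the thresholds as named constants makes it easy to tighten the token limit later, as the list suggests.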

DONE

  • Relations that are too short (2 characters)

Tools

  • IARPA/Pattern

Speed

  • Compile to native code.
  • Compare NLP libraries.
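
One way to compare NLP libraries on speed is a small timing harness that runs each candidate over the same sentences. The sketch below is an assumption about how such a comparison might look, not the project's benchmark: `time_tokenizers` is a hypothetical helper, and the two candidates are stand-ins built from the Python standard library rather than real NLP libraries.

```python
import re
import time
from typing import Callable, Dict, Iterable, List


def time_tokenizers(
    tokenizers: Dict[str, Callable[[str], List[str]]],
    sentences: Iterable[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds for each tokenizer."""
    corpus = list(sentences)
    results = {}
    for name, tokenize in tokenizers.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for sentence in corpus:
                tokenize(sentence)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in candidates; a real comparison would wrap each library's tokenizer.
candidates = {
    "whitespace": str.split,
    "regex": re.compile(r"\w+|\S").findall,
}
timings = time_tokenizers(candidates, ["Seattle is located in Washington ."] * 1000)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over several repeats is a common choice for microbenchmarks; a fuller comparison would also check that the libraries produce comparable output.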

Conversion of Docs to Text

  • Detect headlines and handle separately
  • Sentences with only single letters
  • Detect bullet points
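
The three cases above could be handled by classifying each line of the converted text before sentence extraction. The heuristics below are assumptions chosen for illustration (the bullet markers, the 10-word headline cutoff, and the `classify_line` name are all hypothetical); they will misclassify some lines, such as short fragments without final punctuation.

```python
import re

# A bullet line starts with a marker (•, -, *) or a number like "1." or "2)".
BULLET_RE = re.compile(r"^\s*(?:[•\-*]|\d+[.)])\s+")
MAX_HEADLINE_WORDS = 10  # assumed cutoff for a headline


def classify_line(line: str) -> str:
    """Label a line as 'blank', 'bullet', 'single_letters', 'headline' or 'text'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if BULLET_RE.match(line):
        return "bullet"
    words = stripped.split()
    # Lines such as "a b c d" often come from broken tables or spaced-out text.
    if all(len(w) == 1 and w.isalpha() for w in words):
        return "single_letters"
    # Short lines without closing punctuation are treated as headlines.
    if len(words) <= MAX_HEADLINE_WORDS and stripped[-1] not in ".!?:;,":
        return "headline"
    return "text"
```

Lines labelled "headline" or "bullet" could then be routed to separate handling, and "single_letters" lines dropped, before the remaining text is split into sentences.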