Iarpa
Domain Recognizers
Rule Learning
Quality of Extractions
TODO
- Relations that are too long (7 words)
- Some arguments consist only of special symbols (e.g., ")
- Remove relations that break database constraints (i.e., a token longer than 64 characters; these limits can probably be stricter)
- Pronoun Resolution
DONE
- Relations that are too short (2 characters)
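
The quality checks listed above could be combined into one filter pass. The following is only a rough sketch, not the project's code: the `Extraction` type, the function names, and the exact reading of the cutoffs (7 or more words is too long, 2 or fewer characters is too short, a token over 64 characters breaks the database constraint) are all assumptions made for illustration. Pronoun resolution is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extraction:
    """Hypothetical (arg1, relation, arg2) triple; not the project's actual type."""
    arg1: str
    relation: str
    arg2: str


# Assumed cutoffs, read off the list above; adjust as needed.
MAX_RELATION_WORDS = 6   # a relation of 7 or more words counts as too long
MIN_RELATION_CHARS = 3   # a relation of 2 or fewer characters counts as too short
MAX_TOKEN_CHARS = 64     # assumed database limit on the length of a single token


def has_content(text: str) -> bool:
    """True if the text contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)


def rejection_reason(ex: Extraction) -> Optional[str]:
    """Return a short reason if the extraction should be dropped, else None."""
    rel = ex.relation.strip()
    if len(rel) < MIN_RELATION_CHARS:
        return "relation too short"
    if len(rel.split()) > MAX_RELATION_WORDS:
        return "relation too long"
    if not (has_content(ex.arg1) and has_content(ex.arg2)):
        return "argument is only special symbols"
    for field in (ex.arg1, rel, ex.arg2):
        if any(len(tok) > MAX_TOKEN_CHARS for tok in field.split()):
            return "token exceeds database limit"
    return None


def filter_extractions(extractions: List[Extraction]) -> List[Extraction]:
    """Keep only the extractions that pass every check."""
    return [ex for ex in extractions if rejection_reason(ex) is None]
```

Returning a reason string rather than a plain boolean makes it easy to count how many extractions each rule removes, which helps when deciding whether a limit can be tightened.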
Tools
Speed
- Compile to native code.
- Compare NLP libraries.
Conversion of Docs to Text
- Detect headlines and handle them separately
- Sentences consisting only of single letters
- Detect bullet points
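
One possible way to approach the three items above is a per-line classifier run on the converted text before sentence splitting. This is a sketch under loud assumptions: the heuristics (a headline is a short line without closing punctuation, a bullet starts with a dash, asterisk, dot marker, or a number), the word limit, and all names are hypothetical and would need tuning against real documents.

```python
import re

# Assumed bullet markers: -, *, bullet/middle dot, or "1." / "1)" followed by whitespace.
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022\u00b7]|\d+[.)])\s+")
# Runs of letters (Unicode-aware); digits and underscores are excluded.
LETTERS_RE = re.compile(r"[^\W\d_]+")
# Assumed upper bound on headline length, in words.
MAX_HEADLINE_WORDS = 8


def is_bullet(line: str) -> bool:
    """True if the line starts with one of the assumed bullet markers."""
    return BULLET_RE.match(line) is not None


def strip_bullet(line: str) -> str:
    """Remove a leading bullet marker, if any."""
    return BULLET_RE.sub("", line, count=1)


def only_single_letters(line: str) -> bool:
    """True if every alphabetic token in the line is a single letter."""
    tokens = LETTERS_RE.findall(line)
    return bool(tokens) and all(len(tok) == 1 for tok in tokens)


def is_headline(line: str) -> bool:
    """Heuristic: a short, non-bullet line that does not end in punctuation."""
    text = line.strip()
    if not text or is_bullet(line):
        return False
    return len(text.split()) <= MAX_HEADLINE_WORDS and text[-1] not in ".!?;:,"


def classify(line: str) -> str:
    """Label a line as 'blank', 'bullet', 'noise', 'headline', or 'text'."""
    if not line.strip():
        return "blank"
    if is_bullet(line):
        return "bullet"
    if only_single_letters(line):
        return "noise"
    if is_headline(line):
        return "headline"
    return "text"
```

With labels like these, headlines can be routed to separate handling, bullet items can be passed on with the marker stripped, and single-letter lines can be dropped before extraction.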