Difference between revisions of "Iarpa"
From Knowitall
Revision as of 21:38, 14 March 2011
Domain Recognizers
Rule Learning
Quality of Extractions
TODO
- Relations that are too long (7 words)
- Some arguments consist only of special symbols (e.g., ")
- Remove relations that break database constraints (i.e., a token longer than 64 characters; these limits can probably be stricter)
- Pronoun Resolution
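The filtering items in this list could be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Extraction` tuple and the function names are hypothetical, and reading "7 words" as a maximum relation length is an assumption. Only the 64-character token limit comes from the list itself. Pronoun resolution is not covered.

```python
from typing import NamedTuple


class Extraction(NamedTuple):
    """Hypothetical (arg1, relation, arg2) triple; not the project's data model."""
    arg1: str
    relation: str
    arg2: str


MAX_RELATION_WORDS = 7  # assumed reading of "too long relations (7 words)"
MAX_TOKEN_CHARS = 64    # database constraint mentioned in the list


def is_symbols_only(text: str) -> bool:
    """True if the text has no letter or digit (e.g. a lone quote mark, or empty)."""
    return not any(ch.isalnum() for ch in text)


def passes_quality_filters(ex: Extraction) -> bool:
    """Return False if the extraction should be dropped by any filter."""
    # Relation phrase with more words than the assumed maximum.
    if len(ex.relation.split()) > MAX_RELATION_WORDS:
        return False
    # Arguments that are only special symbols.
    if is_symbols_only(ex.arg1) or is_symbols_only(ex.arg2):
        return False
    # Any whitespace-separated token that would violate the database limit.
    for field in ex:
        if any(len(token) > MAX_TOKEN_CHARS for token in field.split()):
            return False
    return True
```

Keeping each limit as a named constant makes it easy to tighten the thresholds later, as the list suggests.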
DONE
- Relations that are too short (2 characters)
Tools
- IARPA/Pattern
Speed
- Compile to native code.
- Compare NLP libraries.
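A comparison of NLP libraries could start from a small timing harness like the one below. This is only a sketch: the `benchmark` function is hypothetical, and the two tokenizers are standard-library stand-ins, since the notes do not say which libraries are under consideration. Real candidates would be wrapped as callables with the same shape.

```python
import re
import time
from typing import Callable, Dict, List


def benchmark(candidates: Dict[str, Callable[[str], List[str]]],
              sentences: List[str],
              repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time (seconds) per candidate over `repeats` runs."""
    results = {}
    for name, func in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for sentence in sentences:
                func(sentence)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in tokenizers (assumptions, not the libraries actually being compared).
WORD_RE = re.compile(r"\w+|[^\w\s]")
candidates = {
    "whitespace_split": str.split,
    "regex_tokenizer": WORD_RE.findall,
}

timings = benchmark(candidates, ["Edison invented the light bulb ."] * 1000)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from other processes; absolute numbers depend on the machine and input.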
Conversion of Docs to Text
- Detect headlines and handle separately
- Sentences with only single letters
- Detect bullet points
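The three conversion items above could be prototyped with simple line-level heuristics. The sketch below is an assumption about how such detection might work, not the project's method: all function names, the bullet pattern, and the 10-word headline cutoff are hypothetical choices.

```python
import re

# Leading "-", "*", a bullet character, or a number such as "1." or "2)".
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
MAX_HEADLINE_WORDS = 10  # assumed cutoff for a headline


def is_bullet(line: str) -> bool:
    """True if the line starts with a bullet or list-number marker."""
    return BULLET_RE.match(line) is not None


def is_single_letter_sentence(line: str) -> bool:
    """True if every alphabetic token on the line is a single letter."""
    tokens = re.findall(r"[A-Za-z]+", line)
    return bool(tokens) and all(len(token) == 1 for token in tokens)


def is_headline(line: str) -> bool:
    """Heuristic: a short line that does not end in sentence punctuation."""
    text = line.strip()
    if not text or is_bullet(text) or text[-1] in ".!?;:,":
        return False
    return len(text.split()) <= MAX_HEADLINE_WORDS


def classify_line(line: str) -> str:
    """Label a line so later stages can handle each kind separately."""
    if not line.strip():
        return "blank"
    if is_bullet(line):
        return "bullet"
    if is_single_letter_sentence(line):
        return "single_letters"
    if is_headline(line):
        return "headline"
    return "text"
```

Lines labeled `headline`, `bullet`, or `single_letters` can then be routed around the sentence extractor or handled by a separate path.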