Difference between revisions of "Iarpa"
From Knowitall
Revision as of 00:05, 11 January 2011
Contents
Domain Recognizers
Rule Learning
Quality of Extractions
TODO
- Relations that are too long (7 words)
- Some arguments consist only of special symbols (e.g., ")
- Remove relations that break database constraints (i.e., tokens longer than 64 characters; this limit can probably be stricter)
- Pronoun Resolution
DONE
- Too short relations (2 characters)
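
The filters listed in this section can be sketched as a single predicate over an extraction. This is only an illustration, not the project's code: the `Extraction` tuple, the function names, and the exact cut-offs are assumptions (the notes do not say whether 7 words and 2 characters are themselves rejected; the sketch rejects both). Pronoun resolution is not covered.

```python
from typing import NamedTuple

# Hypothetical thresholds, read off the notes above.
MAX_RELATION_WORDS = 6   # assumption: a relation of 7 or more words is "too long"
MIN_RELATION_CHARS = 3   # assumption: a relation of 2 or fewer characters is "too short"
MAX_TOKEN_CHARS = 64     # database constraint on token length


class Extraction(NamedTuple):
    """Hypothetical (arg1, relation, arg2) triple."""
    arg1: str
    relation: str
    arg2: str


def has_word_char(text: str) -> bool:
    """True if the text contains at least one letter or digit."""
    return any(ch.isalnum() for ch in text)


def is_acceptable(ex: Extraction) -> bool:
    """Apply the quality filters from the TODO and DONE lists."""
    rel = ex.relation.strip()
    if len(rel) < MIN_RELATION_CHARS:           # too short
        return False
    if len(rel.split()) > MAX_RELATION_WORDS:   # too long
        return False
    # Arguments made up only of special symbols, such as a lone quote mark.
    if not (has_word_char(ex.arg1) and has_word_char(ex.arg2)):
        return False
    # Any whitespace-separated token that would violate the database constraint.
    for field in ex:
        if any(len(tok) > MAX_TOKEN_CHARS for tok in field.split()):
            return False
    return True
```

Keeping the thresholds as module-level constants makes it easy to tighten the 64-character limit later, as the note suggests.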
Speed
- Compile to native code.
- Compare NLP libraries.
Conversion of Docs to Text
- Detect headlines and handle separately
- Handle sentences that consist only of single letters
- Detect bullet points
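
One way to approach the three conversion items above is a per-line classifier that runs before sentence splitting. The sketch below is a rough heuristic under stated assumptions: the function names, the bullet pattern, the ten-word headline limit, and the "at least half the words capitalized" rule are all hypothetical choices, not something the notes specify.

```python
import re

# Assumption: bullets start with -, *, • or a number followed by . or )
BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")
LETTER_RUN_RE = re.compile(r"[^\W\d_]+")  # runs of letters only


def is_bullet(line: str) -> bool:
    return bool(BULLET_RE.match(line))


def only_single_letters(line: str) -> bool:
    """True if every alphabetic run in the line is a single letter."""
    words = LETTER_RUN_RE.findall(line)
    return bool(words) and all(len(w) == 1 for w in words)


def is_headline(line: str) -> bool:
    """Heuristic: short, no closing punctuation, mostly capitalized words."""
    text = line.strip()
    words = [w for w in text.split() if w[0].isalpha()]
    if not words or len(text.split()) > 10:
        return False
    if text[-1] in ".!?;,":
        return False
    capitalized = sum(1 for w in words if w[0].isupper())
    return capitalized * 2 >= len(words)


def classify(line: str) -> str:
    """Label a line as blank, bullet, single_letters, headline, or text."""
    if not line.strip():
        return "blank"
    if is_bullet(line):
        return "bullet"
    if only_single_letters(line):
        return "single_letters"
    if is_headline(line):
        return "headline"
    return "text"
```

Lines labeled `headline`, `bullet`, or `single_letters` could then be routed to separate handling instead of being passed to the extractor as ordinary sentences.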