Iarpa
From Knowitall
Revision as of 21:08, 6 April 2011
TODO
Engineering Tasks
- Handle capitalized sentences.
- Handle geotagging and other oddities of the classified data.
- Address efficiency issues.
Research Tasks
- Pronoun resolution.
Tools
Speed
- Compile to native code.
- Compare NLP libraries.
Conversion of Docs to Text
- Capitalized sentences such as "Nuclear Material Seized"
- Detect headlines and handle them separately
- Sentences consisting only of single letters
- Detect bullet points
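The conversion items above could be prototyped as simple line-level heuristics before anything more elaborate is built. The sketch below is a minimal, hypothetical Python version. The function names, the bullet markers it recognizes, and the set of lowercase function words allowed inside a headline are all assumptions for illustration, not the project's actual converter.

```python
import re

# Assumed bullet markers: "-", "*", "•", or a number followed by "." or ")".
BULLET_RE = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")

# Assumed short function words that may stay lowercase inside a headline.
MINOR_WORDS = {"a", "an", "and", "as", "at", "by", "for", "in",
               "of", "on", "or", "the", "to"}


def is_capitalized_headline(line: str) -> bool:
    """True when every word is capitalized (or is a minor word), as in
    "Nuclear Material Seized". All-caps lines also count."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", line)
    if len(words) < 2:
        return False
    if not words[0][0].isupper():
        return False
    return all(w[0].isupper() or w.lower() in MINOR_WORDS for w in words)


def is_single_letter_line(line: str) -> bool:
    """True when the line has at least two tokens and every token is a
    single letter, e.g. spaced-out text such as "S E C R E T"."""
    tokens = line.split()
    return len(tokens) >= 2 and all(len(t) == 1 and t.isalpha() for t in tokens)


def classify_line(line: str) -> str:
    """Label a line as 'bullet', 'single_letters', 'headline' or 'text'.
    Bullets are checked first so that a bulleted title is not mistaken
    for a headline."""
    if BULLET_RE.match(line):
        return "bullet"
    if is_single_letter_line(line):
        return "single_letters"
    if is_capitalized_headline(line):
        return "headline"
    return "text"


if __name__ == "__main__":
    for sample in ["Nuclear Material Seized",
                   "The reactor was inspected on Monday.",
                   "S E C R E T",
                   "- Detect bullet points"]:
        print(f"{classify_line(sample):15} {sample}")
```

Lines labeled `headline` or `bullet` could then be routed to separate handling instead of being passed to the sentence-level extractor; the thresholds (for example, requiring two or more words for a headline) are guesses that would need tuning on the real documents.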