Difference between revisions of "Iarpa"

From Knowitall

Revision as of 21:12, 6 April 2011

TODO

Engineering Tasks

  • Handle capitalized, all-caps, and no-caps sentences.
  • Handle the data oddity where a person's last name appears in parentheses.
  • Handle geotagging and other oddities of the classified data.
  • Address efficiency issues.
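
As a starting point for the capitalization task, here is a minimal sketch of a case classifier. The function name `classify_case` and the four labels are assumptions made for illustration, not part of any existing code; the rule for "capitalized" (every word starts with an uppercase letter) is a simple heuristic and will miss headlines that contain lowercase function words.

```python
import re

# Hypothetical helper: the name and labels are illustrative assumptions.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def classify_case(sentence: str) -> str:
    """Return 'allcaps', 'nocap', 'capitalized', 'mixed', or 'none'."""
    words = WORD_RE.findall(sentence)
    if not words:
        return "none"  # no alphabetic words at all
    if all(w.isupper() for w in words):
        return "allcaps"
    if all(w.islower() for w in words):
        return "nocap"
    # Every word starts uppercase, e.g. a headline-style sentence.
    if len(words) > 1 and all(w[0].isupper() for w in words):
        return "capitalized"
    return "mixed"  # ordinary sentence casing


if __name__ == "__main__":
    for s in ["NUCLEAR MATERIAL SEIZED",
              "nuclear material seized",
              "Nuclear Material Seized",
              "Police seized nuclear material."]:
        print(classify_case(s), "|", s)
```

Sentences labeled anything other than "mixed" could then be routed to a case-normalization step before further processing.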

Research Tasks

  • Pronoun Resolution

Tools

Speed

  • Compile to native code.
  • Compare NLP libraries.

Conversion of Docs to Text

  • Detect headlines and handle them separately.
  • Handle sentences that consist only of single letters.
  • Detect bullet points.
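
The three checks above can be sketched as a per-line classifier run over the converted text. Everything here is an assumption for illustration: the function names, the bullet markers recognized, the 12-word headline limit, and the rule that a headline has no sentence-final punctuation and capitalizes every word. Real documents will likely need looser rules.

```python
import re

# Hypothetical heuristics; markers and thresholds are assumptions.
BULLET_RE = re.compile(r"^\s*(?:[•*-]|\d+[.)])\s+")
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def is_bullet(line: str) -> bool:
    """Line starts with a bullet marker or a numbered-list marker."""
    return bool(BULLET_RE.match(line))


def is_single_letters(line: str) -> bool:
    """Every whitespace-separated token is a single letter."""
    tokens = line.split()
    return bool(tokens) and all(len(t) == 1 and t.isalpha() for t in tokens)


def is_headline(line: str, max_words: int = 12) -> bool:
    """Short line, no final punctuation, every word capitalized."""
    text = line.strip()
    if not text or text[-1] in ".!?":
        return False
    words = WORD_RE.findall(text)
    if not words or len(words) > max_words:
        return False
    return all(w[0].isupper() for w in words)


def classify_line(line: str) -> str:
    """Apply the checks in order; the first match wins."""
    if is_bullet(line):
        return "bullet"
    if is_single_letters(line):
        return "single_letters"
    if is_headline(line):
        return "headline"
    return "text"


if __name__ == "__main__":
    for ln in ["Nuclear Material Seized", "• Detect bullet points",
               "A B C", "Police seized nuclear material."]:
        print(classify_line(ln), "|", ln)
```

Lines tagged "headline" or "bullet" can then be handled separately from ordinary sentences, and "single_letters" lines can be dropped or merged as needed.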