Document-level Open IE

From Knowitall
Revision as of 21:54, 4 November 2013 by Rbart (talk | contribs)

Goals

  • Extend sentence-based Open IE extractors to incorporate document-level reasoning, such as:
    • Coreference
    • Entity Linking
    • NER
    • Rules implemented for TAC 2013 Entity Linking
  • Define necessary data structures and interfaces by Oct-9
  • End-to-end system evaluation by Nov-11

Work Log

11-4

  • Released system output for evaluation:
    • "Rules" configuration, using rule-based best-mention disambiguation, NO Coref.
    • "Coref" configuration, using coref-assisted rule-based best-mention disambiguation. Entity Linking context also extended via coreference.
    • Entity Linking output, showing differences in entity links between each system configuration and the baseline.
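
A minimal sketch of how the entity link differences between two configurations could be computed. It assumes each configuration's output can be reduced to a mapping from an argument key to a linked entity name (or None when unlinked); the key format `(sentence_index, argument_text)` and the function name `diff_links` are hypothetical and not taken from the released output.

```python
def diff_links(baseline, system):
    """Return {key: (baseline_link, system_link)} for every argument
    whose link differs between the two configurations.

    Both inputs are assumed to map a hypothetical argument key,
    (sentence_index, argument_text), to an entity name or None.
    """
    keys = set(baseline) | set(system)
    return {
        key: (baseline.get(key), system.get(key))
        for key in sorted(keys)
        if baseline.get(key) != system.get(key)
    }


# Illustrative data only, not real system output.
baseline = {(0, "Obama"): "Barack Obama", (1, "He"): None}
coref = {
    (0, "Obama"): "Barack Obama",
    (1, "He"): "Barack Obama",
    (2, "the city"): "Chicago",
}
for key, (old, new) in diff_links(baseline, coref).items():
    print(key, old, "->", new)
```

Arguments that are linked identically in both configurations are omitted, so only the changes remain for annotators to review.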

Next: Stephen, John and I will annotate the output and analyze performance.

10-25

Met with Stephen, John, and Michael. Items:

  • Create a (very simple) webapp for doc extractor
  • Clean up arguments before submitting them to the linker.
  • Replace best-mention substrings rather than substituting best mentions for the entire argument.
  • Reformat evaluation output to show only extractions that have been annotated with additional info (diff)
  • Evaluate difference in linker performance with/without document-level info.
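
The difference between replacing a best-mention substring and replacing the entire argument can be illustrated with a short sketch. The function name and the example strings are hypothetical; the actual system presumably works on character offsets rather than string search.

```python
def substitute_best_mention(argument, mention, best_mention):
    """Replace only the first occurrence of `mention` inside `argument`,
    keeping the rest of the argument text intact.

    If the mention does not occur in the argument, the argument is
    returned unchanged.
    """
    start = argument.find(mention)
    if start < 0:
        return argument
    end = start + len(mention)
    return argument[:start] + best_mention + argument[end:]


# Hypothetical example: only "the company" is rewritten.
print(substitute_best_mention(
    "the president of the company", "the company", "Acme Corp."))
```

Replacing the entire argument with the best mention would reduce the example to "Acme Corp." and lose "the president of", which is the motivation for substituting only the substring.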

10-18

Met with Stephen and John. Discussed:

  • Evaluation systems:
    • Baseline sentence extractor with entity linker, no coreference
    • Full system with best-mention finding rules
    • Full system without coreference.
  • Evaluation data:
    • Sample of 20-30 documents from TAC 2013.
    • Moving away from QA/Query based approach, since the queries/questions will bias evaluation of the document extractor.
    • Instead, we will evaluate all extractions (or a uniform sample of them).
  • Evaluation criteria:
    • An extraction is "correct" if its arguments are as unambiguous as possible given the document text.
    • Measure precision/yield using this metric and compare systems.
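
As a sketch of the metric, assuming "yield" means the number of extractions judged correct and precision is that count divided by the number of annotated extractions, the comparison could be computed as follows. The function name and the sample judgments are illustrative only.

```python
def precision_and_yield(judgments):
    """Compute (precision, yield) from a list of boolean annotations.

    Yield is taken here to be the count of correct extractions, which
    is an assumption about the intended definition.
    """
    total = len(judgments)
    correct = sum(1 for judged_correct in judgments if judged_correct)
    precision = correct / total if total else 0.0
    return precision, correct


# Made-up annotations for one system configuration.
precision, yield_count = precision_and_yield([True, True, False, True])
print(f"precision={precision:.2f} yield={yield_count}")
```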

10-17

Completed: Integrated sentence-level Open IE and Freebase Linker, test run OK.

Next Goals:

  • Integrate best-mention finding rules.
    • First: Drop in code "as-is"
    • After: Factor out NER tagging, coref components
  • Fix issues with tracking character offsets
    • Offsets are not properly computed for Open IE extractions
    • Find a good way to retrieve document metadata by character offset.
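
One possible approach to the offset issues above, sketched under the assumption that each sentence records its starting character offset in the document: keep the starts in a sorted list and use binary search to map a document offset back to its sentence. The class and method names are hypothetical.

```python
from bisect import bisect_right


class OffsetIndex:
    """Maps between document-level and sentence-local character offsets."""

    def __init__(self, sentence_starts):
        # Document offsets at which each sentence begins.
        self.starts = sorted(sentence_starts)

    def sentence_at(self, offset):
        """Return the index of the last sentence starting at or before `offset`."""
        index = bisect_right(self.starts, offset) - 1
        if index < 0:
            raise ValueError(f"offset {offset} precedes the first sentence")
        return index

    def to_document_offset(self, sentence_index, local_offset):
        """Convert a sentence-local offset into a document offset."""
        return self.starts[sentence_index] + local_offset


index = OffsetIndex([0, 25, 60])
print(index.sentence_at(30))            # sentence containing document offset 30
print(index.to_document_offset(1, 5))   # local offset 5 within sentence 1
```

Extraction offsets produced relative to a sentence can then be shifted by the sentence start, and per-sentence metadata can be retrieved by looking up the sentence index for any document offset.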

10-9

Short-term goal: define necessary interfaces and data structures by 10-11

  • Implemented interfaces for:
    • Document
    • Sentence
    • Extraction
    • Argument/Relation
    • Coreference Mention
    • Coreference Cluster
    • Entity Link
  • Discussed interfaces at length with John and Michael
    • Interfaces to be incorporated into generic NLP tool library (nlptools):
      • Document
      • Sentence
      • CorefResolver
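
The interfaces themselves are not shown on this page. The following is only a rough sketch of how the listed data structures might relate to one another, written in Python for illustration; every class and field name is an assumption, and the actual nlptools interfaces may differ substantially.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Span:
    """Character offsets [start, end) into the document text."""
    start: int
    end: int

    def text(self, document_text: str) -> str:
        return document_text[self.start:self.end]


@dataclass
class EntityLink:
    entity_id: str   # hypothetical knowledge-base identifier
    name: str
    score: float


@dataclass
class Argument:
    span: Span
    link: Optional[EntityLink] = None


@dataclass
class Extraction:
    arg1: Argument
    relation: Span
    arg2: Argument


@dataclass
class CorefMention:
    span: Span


@dataclass
class CorefCluster:
    mentions: List[CorefMention]
    best_mention: Optional[CorefMention] = None


@dataclass
class Sentence:
    span: Span
    extractions: List[Extraction] = field(default_factory=list)


@dataclass
class Document:
    text: str
    sentences: List[Sentence] = field(default_factory=list)
    coref_clusters: List[CorefCluster] = field(default_factory=list)


# Illustrative document with one coreference cluster.
doc = Document(text="Obama spoke. He smiled.")
doc.sentences = [Sentence(Span(0, 12)), Sentence(Span(13, 23))]
obama, he = CorefMention(Span(0, 5)), CorefMention(Span(13, 15))
doc.coref_clusters = [CorefCluster([obama, he], best_mention=obama)]
print([m.span.text(doc.text) for m in doc.coref_clusters[0].mentions])
```

Storing every annotation as document-level character offsets keeps sentences, extractions, coreference mentions, and entity links in a single coordinate system.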