Difference between revisions of "Document-level Open IE"

From Knowitall

Revision as of 22:33, 18 October 2013

Goals

  • Extend sentence-based Open IE extractors to incorporate document-level reasoning, such as:
    • Coreference
    • Entity Linking
    • NER
    • Rules implemented for TAC 2013 Entity Linking
  • Define necessary data structures and interfaces by Oct-9
  • End-to-end system evaluation by Nov-11

Work Log

10-18

Met with Stephen and John. Discussed:

  • Evaluation systems:
    • Baseline sentence extractor with entity linker, no coreference
    • Full system with best-mention finding rules
    • Full system without coreference.
  • Evaluation data:
    • Sample of 20-30 documents from TAC 2013.
    • Move away from the QA/query-based approach, since the queries/questions would bias evaluation of the document extractor.
    • Instead, we will evaluate all (or a uniform sample) of extractions.
  • Evaluation criteria:
    • An extraction is "correct" if its arguments are as unambiguous as possible given the document text.
    • Measure precision and yield under this criterion and compare systems.
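
The evaluation plan above could be sketched roughly as follows. This is only an illustration: the JudgedExtraction record, the system labels, and both function names are hypothetical, and "yield" is assumed here to mean the number of extractions judged correct, which the notes do not define.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgedExtraction:
    """One extraction plus an annotator's judgment (hypothetical record)."""
    system: str    # e.g. "baseline" or "full"; labels are illustrative
    arg1: str
    rel: str
    arg2: str
    correct: bool  # True if the arguments are as unambiguous as the document allows


def uniform_sample(extractions, k, seed=0):
    """Draw a uniform sample of k extractions (all of them if k is too large)."""
    extractions = list(extractions)
    if k >= len(extractions):
        return extractions
    return random.Random(seed).sample(extractions, k)


def precision_and_yield(judged):
    """Return (precision, yield) for one system's judged extractions.

    Assumption: yield is the count of correct extractions.
    """
    judged = list(judged)
    n_correct = sum(1 for e in judged if e.correct)
    precision = n_correct / len(judged) if judged else 0.0
    return precision, n_correct


def compare_systems(judged):
    """Group judged extractions by system and score each group."""
    by_system = {}
    for e in judged:
        by_system.setdefault(e.system, []).append(e)
    return {name: precision_and_yield(group) for name, group in by_system.items()}
```

If only a sample is judged, the yield of the full output would have to be estimated, for example as sample precision times the total number of extractions.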

10-17

Completed: Integrated sentence-level Open IE and Freebase Linker, test run OK.

Next Goals:

  • Integrate best-mention finding rules.
    • First: Drop in code "as-is"
    • After: Factor out NER tagging, coref components
  • Fix issues with tracking character offsets
    • Offsets are not properly computed for Open IE extractions
    • Find a good way to retrieve document metadata by character offset.
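
One possible approach to the offset issues, sketched under assumptions: metadata is attached to non-overlapping, half-open character spans, and sentence-local offsets are converted to document offsets by adding the sentence's start position. OffsetIndex and to_document_offset are hypothetical names, not part of the existing code.

```python
from bisect import bisect_right


class OffsetIndex:
    """Maps a character offset to the metadata of the span containing it.

    Assumes spans are non-overlapping and half-open: [start, end).
    """

    def __init__(self, spans):
        # spans: iterable of (start, end, metadata) tuples
        self._spans = sorted(spans, key=lambda span: span[0])
        self._starts = [span[0] for span in self._spans]

    def lookup(self, offset):
        """Return the metadata for the span covering offset, or None."""
        # Find the last span whose start is <= offset.
        i = bisect_right(self._starts, offset) - 1
        if i < 0:
            return None
        start, end, metadata = self._spans[i]
        return metadata if start <= offset < end else None


def to_document_offset(sentence_start, local_offset):
    """Convert an offset within a sentence to an offset within the document."""
    return sentence_start + local_offset
```

Lookup is a binary search over span starts, so it stays cheap even for long documents with many annotated spans.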

10-9

Short-term goal: define necessary interfaces and data structures by 10-11

  • Implemented interfaces for:
    • Document
    • Sentence
    • Extraction
    • Argument/Relation
    • Coreference Mention
    • Coreference Cluster
    • Entity Link
  • Discussed interfaces at length with John and Michael
    • Interfaces to be incorporated into generic NLP tool library (nlptools):
      • Document
      • Sentence
      • CorefResolver
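
For orientation, the interfaces listed above might look something like the following. Only the type names come from the log; every field, the ExtractionPart stand-in for Argument/Relation, the resolve method, and the make_document helper are assumptions made for illustration, and the sketch is in Python rather than the project's own language.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class Sentence:
    text: str
    offset: int  # character offset of the sentence within the document


@dataclass(frozen=True)
class Document:
    text: str
    sentences: Tuple[Sentence, ...]


@dataclass(frozen=True)
class ExtractionPart:
    """Stands in for Argument/Relation (hypothetical shared shape)."""
    text: str
    offset: int  # document-level character offset


@dataclass(frozen=True)
class Extraction:
    arg1: ExtractionPart
    rel: ExtractionPart
    arg2: ExtractionPart
    sentence: Sentence


@dataclass(frozen=True)
class CoreferenceMention:
    text: str
    offset: int


@dataclass(frozen=True)
class CoreferenceCluster:
    best_mention: CoreferenceMention
    mentions: Tuple[CoreferenceMention, ...]


@dataclass(frozen=True)
class EntityLink:
    name: str
    entity_id: str  # e.g. a Freebase identifier (assumed field)
    score: float


class CorefResolver(Protocol):
    def resolve(self, document: Document) -> Tuple[CoreferenceCluster, ...]:
        """Return the coreference clusters found in the document."""
        ...


def make_document(text, sentence_texts):
    """Build a Document, locating each sentence's offset in order."""
    sentences, cursor = [], 0
    for s in sentence_texts:
        start = text.index(s, cursor)  # raises ValueError if s is missing
        sentences.append(Sentence(s, start))
        cursor = start + len(s)
    return Document(text, tuple(sentences))
```

Storing document-level offsets on every part keeps extractions, mentions, and links comparable by position without going back through the sentence.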