Document-level Open IE

Goals

  • Extend sentence-based Open IE extractors to incorporate document-level reasoning, such as:
    • Coreference
    • Entity Linking
    • NER
    • Rules implemented for TAC 2013 Entity Linking
  • Define necessary data structures and interfaces by Oct-9 (done)
  • Preliminary end-to-end system evaluation by Nov-11 (done)
  • Quantitatively determine how much this adds to present Open IE

Work Log

11-22

Trained and evaluated a linear classifier for best-mentions (instances of a rule application or best-mention resolution), which provides 95% precision at 90% yield over news data. Features include the rule type (person, organization, location), whether coreference info was used, and the ambiguity of the given mention. A rough sketch of the feature representation and scoring appears after the to-do list below. To do from here:

  • Polish features:
    • Include coreference info when deciding candidates for a rule (currently coreference is only considered after applying rules)
    • Improve ambiguity measures: return a value that indicates the prominence of the chosen mention relative to the other ambiguous mentions
    • Improve the location ambiguity measure: fix a technical issue reading the Tipster gazetteer, and consider city, stateOrProvince, and country ambiguity separately.
    • Debug an issue where names are resolved to something that matches only a prefix (e.g. Steven Miller -> Steven Tyler)
  • Produce a formal evaluation
    • How much does this help "Open IE"?
      • How many extractions get annotated with additional, useful information?
      • How many more links get found as a result of best mentions? Coref? Are they higher confidence links?
      • How much does it increase running time to do Document-level processing with/without Coref?
  • Code cleanup, packaging, and release.
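
As a concrete illustration of the classifier described above, here is a minimal Scala sketch of turning a best-mention candidate into a feature vector and scoring it with a linear model and a logistic output. All names (BestMention, featureVector, the weight values) are hypothetical placeholders, not the actual implementation.

object BestMentionClassifierSketch {

  sealed trait RuleType
  case object PersonRule extends RuleType
  case object OrganizationRule extends RuleType
  case object LocationRule extends RuleType

  // A candidate substitution produced by a rule application or best-mention resolution.
  case class BestMention(
    ruleType: RuleType,   // person, organization, or location rule
    usedCoref: Boolean,   // was coreference info used to find this mention?
    ambiguity: Int        // how many other candidate mentions it competes with
  )

  // Features mirror the ones listed above: rule type, coref usage, and mention ambiguity.
  def featureVector(m: BestMention): Array[Double] = Array(
    1.0,                                              // bias
    if (m.ruleType == PersonRule) 1.0 else 0.0,
    if (m.ruleType == OrganizationRule) 1.0 else 0.0,
    if (m.ruleType == LocationRule) 1.0 else 0.0,
    if (m.usedCoref) 1.0 else 0.0,
    1.0 / (1.0 + m.ambiguity)                         // less ambiguous -> closer to 1.0
  )

  // Illustrative weights only; in practice these are learned from the annotated news data.
  val weights = Array(-0.5, 1.2, 0.8, 0.6, -0.3, 2.0)

  // Logistic score in [0, 1], usable as a confidence for the substitution.
  def confidence(m: BestMention): Double = {
    val dot = featureVector(m).zip(weights).map { case (f, w) => f * w }.sum
    1.0 / (1.0 + math.exp(-dot))
  }

  def main(args: Array[String]): Unit = {
    val m = BestMention(PersonRule, usedCoref = false, ambiguity = 2)
    println(f"confidence = ${confidence(m)}%.3f")
  }
}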


11-12

Implemented serialization to allow extractor input to be saved to disk after pre-processing, saving us from redoing with each run:

  • Parsing
  • Chunking
  • Stemming
  • Coref
  • Sentence-level Open IE
  • NER tagging

This saves roughly 3 minutes per run (over 20 docs) and will greatly speed up development.
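
As a rough illustration of the caching step (PreprocessedDoc, its fields, and the cache path below are placeholders, not the real extractor-input classes), the pre-processed annotations can be written to disk with plain Java serialization and reloaded on later runs:

import java.io._

// Stand-in for the pre-processed extractor input (parses, chunk tags, coref, NER, etc.).
case class PreprocessedDoc(
  docId: String,
  parses: Seq[String],
  corefClusters: Seq[Seq[String]],
  nerTags: Seq[String]
) extends Serializable

object PreprocessCache {

  private def cacheFile(docId: String) = new File(s"cache/$docId.ser")

  def save(doc: PreprocessedDoc): Unit = {
    cacheFile(doc.docId).getParentFile.mkdirs()
    val out = new ObjectOutputStream(new FileOutputStream(cacheFile(doc.docId)))
    try out.writeObject(doc) finally out.close()
  }

  def load(docId: String): Option[PreprocessedDoc] = {
    val f = cacheFile(docId)
    if (!f.exists()) None
    else {
      val in = new ObjectInputStream(new FileInputStream(f))
      try Some(in.readObject().asInstanceOf[PreprocessedDoc]) finally in.close()
    }
  }

  // Run the (expensive) pre-processing only on a cache miss.
  def getOrCompute(docId: String)(preprocess: => PreprocessedDoc): PreprocessedDoc =
    load(docId).getOrElse {
      val doc = preprocess
      save(doc)
      doc
    }
}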

Started refactoring and cleaning up rules. Next step: get all substitution rules "on equal footing" programmatically so that a classifier can be built to rank them.

11-8

Finished annotating data and discussed results. String substitution rules need to be tightened up, and a confidence measure over them would help greatly.

Extraction-level stats

From 20 documents, there were 528 total extractions in all runs.

Rules-diff:
  • 206 extractions in diff
  • 75 baseline better
  • 99 rule-based system better
  • 33 bad extractions (neither better)

Coref-diff:
  • 280 extractions in diff
  • 105 baseline better
  • 115 coref+rule-based system better
  • 59 bad extractions (neither better)

I took a closer look at the "baseline better" cases to see where we were getting it wrong:

Rule-based system:
  • 49 strange string errors (e.g. "CDC" -> "CIENCE SLIGHTED IN")
  • 16 location errors (e.g. "Washington" [DC] -> "Washington, Georgia")
  • 8 entity disambiguation errors (e.g. "he" [Scott Peterson] => "Laci Peterson")
  • 1 incorrect link (e.g. "the theory" linked to "Theory" in FreeBase)
  • 75 total

Coref+rule-based system:
  • 49 strange string errors
  • 11 location errors
  • 13 entity disambiguation errors
  • 17 incorrect links
  • 6 coref errors (e.g. "make it clear that" -> "make the CDC clear that")
  • 105 total

Approximate running times over 20 documents:
  • Baseline: 45 sec
  • Rules: 45 sec
  • Rules+Coref: 230 sec

11-4

  • Released system output for evaluation:
    • "Rules" configuration, using rule-based best-mention disambiguation, NO Coref.
    • "Coref" configuration, using coref-assisted rule-based best-mention disambiguation. Entity Linking context also extended via coreference.
    • Entity Linking output, showing differences in entity links between each system configuration (and the baseline).

Next: Stephen, John, and I will annotate the output and analyze performance.

10-25

Met with Stephen, John, and Michael. Items:

  • Create a (very simple) webapp for the doc extractor
  • Clean up arguments before submitting them to the linker.
  • Replace best-mention substrings rather than substituting best mentions for the entire argument (see the sketch after this list).
  • Reformat evaluation output to show only extractions that have been annotated with additional info (diff)
  • Evaluate difference in linker performance with/without document-level info.
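
For the substring-replacement item above, here is a tiny, hypothetical sketch of the intent: splice the best mention into the matched span of the argument rather than overwriting the whole argument string.

object SubstringSubstitutionSketch {

  // Replace only the span of the original mention inside the argument.
  def replaceSubstring(arg: String, mention: String, bestMention: String): String = {
    val i = arg.indexOf(mention)
    if (i < 0) arg
    else arg.substring(0, i) + bestMention + arg.substring(i + mention.length)
  }

  def main(args: Array[String]): Unit = {
    val arg = "the CDC director"
    // Whole-argument substitution would discard "the ... director";
    // substring substitution preserves the surrounding words:
    println(replaceSubstring(arg, "CDC", "Centers for Disease Control and Prevention"))
    // -> "the Centers for Disease Control and Prevention director"
  }
}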

10-18

Met with Stephen and John. Discussed:

  • Evaluation systems:
    • Baseline sentence extractor with entity linker, no coreference
    • Full system with best-mention finding rules
    • Full system without coreference.
  • Evaluation data:
    • Sample of 20-30 documents from TAC 2013.
    • Moving away from a QA/query-based approach, since the queries/questions would bias evaluation of the document extractor.
    • Instead, we will evaluate all extractions (or a uniform sample of them).
  • Evaluation criteria:
    • Extractions are "correct" if their arguments are as unambiguous as possible given the document text.
    • Measure precision/yield using this metric and compare systems (a small sketch of the bookkeeping follows this list).
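
A minimal sketch of the precision/yield bookkeeping under this criterion (the annotation record and field names are made up for illustration):

object PrecisionYieldSketch {

  case class AnnotatedExtraction(correct: Boolean)

  // precision = correct / total annotated; yield = number of correct extractions.
  def precisionAndYield(extractions: Seq[AnnotatedExtraction]): (Double, Int) = {
    val numCorrect = extractions.count(_.correct)
    val precision =
      if (extractions.isEmpty) 0.0 else numCorrect.toDouble / extractions.size
    (precision, numCorrect)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq.fill(18)(AnnotatedExtraction(true)) ++
                 Seq.fill(2)(AnnotatedExtraction(false))
    val (p, y) = precisionAndYield(sample)   // "y" because "yield" is a reserved word
    println(f"precision = $p%.2f, yield = $y")
  }
}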

10-17

Completed: Integrated sentence-level Open IE and Freebase Linker, test run OK.

Next Goals:

  • Integrate best-mention finding rules.
    • First: Drop in code "as-is"
    • After: Factor out NER tagging, coref components
  • Fix issues with tracking character offsets
    • Offsets are not properly computed for Open IE extractions
    • Find a good way to retrieve document metadata by character offset (one possible approach is sketched after this list).
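
One possible approach to the offset lookup (an assumption, not the eventual fix) is to index metadata spans by their start offset in a TreeMap and find the covering span with floorEntry:

import java.util.TreeMap

object OffsetMetadataSketch {

  // A metadata region of the document, e.g. headline, dateline, or body.
  case class MetaSpan(start: Int, end: Int, label: String)

  class OffsetIndex(spans: Seq[MetaSpan]) {
    private val byStart = new TreeMap[Integer, MetaSpan]()
    spans.foreach(s => byStart.put(s.start, s))

    // Return the span covering the given character offset, if any.
    def metadataAt(offset: Int): Option[MetaSpan] = {
      val entry = byStart.floorEntry(offset)
      Option(entry).map(_.getValue).filter(s => offset < s.end)
    }
  }

  def main(args: Array[String]): Unit = {
    val index = new OffsetIndex(Seq(
      MetaSpan(0, 40, "headline"),
      MetaSpan(40, 500, "body")
    ))
    println(index.metadataAt(25))   // Some(MetaSpan(0,40,headline))
    println(index.metadataAt(600))  // None
  }
}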

10-9

Short-term goal: define the necessary interfaces and data structures by 10-11. A rough sketch of these interfaces appears at the end of this entry.

  • Implemented interfaces for:
    • Document
    • Sentence
    • Extraction
    • Argument/Relation
    • Coreference Mention
    • Coreference Cluster
    • Entity Link
  • Discussed interfaces at length with John and Michael
    • Interfaces to be incorporated into generic NLP tool library (nlptools):
      • Document
      • Sentence
      • CorefResolver
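
To make the above concrete, here is a rough Scala sketch of what these interfaces might look like. The member names are illustrative guesses, not the actual nlptools definitions.

object InterfaceSketch {

  trait Document {
    def text: String
    def sentences: Seq[Sentence]
  }

  trait Sentence {
    def text: String
    def offset: Int                  // character offset into the document
  }

  case class Argument(text: String, offset: Int)
  case class Relation(text: String, offset: Int)

  // A binary extraction over a sentence.
  case class Extraction(arg1: Argument, rel: Relation, arg2: Argument)

  case class Mention(text: String, offset: Int)
  case class CorefCluster(mentions: Seq[Mention], best: Mention)

  // A link from a mention to a knowledge-base entity.
  case class EntityLink(mention: Mention, entityId: String, score: Double)

  // Coreference resolution as a document-level component.
  trait CorefResolver {
    def resolve(doc: Document): Seq[CorefCluster]
  }
}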