Document-level Open IE
Revision as of 21:48, 25 October 2013
Goals
- Extend sentence-based Open IE extractors to incorporate document-level reasoning, such as:
- Coreference
- Entity Linking
- NER
- Rules implemented for TAC 2013 Entity Linking
- Define necessary data structures and interfaces by Oct-9
- End-to-end system evaluation by Nov-11
Work Log
10-25
Met with Stephen, John, and Michael. Items:
- Create a (very simple) webapp for doc extractor
- Clean up arguments before submitting them to the linker.
- Replace best-mention substrings rather than substituting best mentions for the entire argument.
- Reformat evaluation output to show only extractions that have been annotated with additional info (diff)
- Evaluate difference in linker performance with/without document-level info.
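The substring-replacement item above can be sketched as follows. This is an illustrative sketch only; the function and argument names are assumptions, not the extractor's actual API.

```python
# Sketch: substitute the best mention for only the coreferent substring
# of an argument, instead of overwriting the whole argument string.
# All names here are illustrative, not the project's real API.

def replace_best_mention(argument: str, mention: str, best_mention: str) -> str:
    """Replace the mention substring inside the argument with its best
    mention, leaving the rest of the argument text intact."""
    if mention not in argument:
        return argument  # mention not present; leave argument unchanged
    return argument.replace(mention, best_mention, 1)

# "his hometown" becomes "Obama's hometown" rather than being replaced
# wholesale by the best mention "Barack Obama".
print(replace_best_mention("his hometown", "his", "Obama's"))  # Obama's hometown
```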
10-18
Met with Stephen and John. Discussed:
- Evaluation systems:
- Baseline sentence extractor with entity linker, no coreference
- Full system with best-mention finding rules
- Full system without coreference.
- Evaluation data:
- Sample of 20-30 documents from TAC 2013.
- Moving away from a QA/query-based approach, since the queries/questions would bias evaluation of the document extractor.
- Instead, we will evaluate all (or a uniform sample) of extractions.
- Evaluation criteria:
- Extractions are "correct" if their arguments are as unambiguous as possible given the document text.
- Measure precision/yield using this metric and compare systems.
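Under that criterion, the precision/yield comparison can be computed as below; this is a minimal sketch assuming per-extraction boolean correctness labels, not the actual evaluation script.

```python
def precision_and_yield(labels):
    """Given per-extraction correctness labels (True = arguments judged
    as unambiguous as the document allows), return (precision, yield),
    where yield is the count of correct extractions."""
    correct = sum(1 for label in labels if label)
    precision = correct / len(labels) if labels else 0.0
    return precision, correct

# Three of four sampled extractions judged correct:
print(precision_and_yield([True, True, False, True]))  # (0.75, 3)
```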
10-17
Completed: Integrated sentence-level Open IE and Freebase Linker, test run OK.
Next Goals:
- Integrate best-mention finding rules.
- First: Drop in code "as-is"
- After: Factor out NER tagging, coref components
- Fix issues with tracking character offsets
- Offsets are not properly computed for Open IE extractions
- Find a good way for retrieving document metadata by character offset.
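One way to handle the offset issues above is to keep extraction intervals sentence-relative and shift them into document coordinates by the sentence's start offset. A sketch, with hypothetical names:

```python
from typing import NamedTuple

class Interval(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

def document_offset(sentence_start: int, span: Interval) -> Interval:
    """Shift a sentence-relative character interval into document
    coordinates by adding the sentence's start offset."""
    return Interval(sentence_start + span.start, sentence_start + span.end)

doc = "John met Mary. She smiled."
# "She" occupies sentence-relative chars [0, 3) in the second sentence,
# which begins at document offset 15.
span = document_offset(15, Interval(0, 3))
print(doc[span.start:span.end])  # She
```

Storing document-level intervals this way also gives a natural key for retrieving document metadata by character offset.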
10-9
Short-term goal: define necessary interfaces and data structures by 10-11
- Implemented interfaces for:
- Document
- Sentence
- Extraction
- Argument/Relation
- Coreference Mention
- Coreference Cluster
- Entity Link
- Discussed interfaces at length with John and Michael
- Interfaces to be incorporated into generic NLP tool library (nlptools):
- Document
- Sentence
- CorefResolver
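The data structures listed above could be sketched as follows. The real interfaces live in the Scala nlptools library; the Python classes and field names here are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Interval:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

@dataclass
class Sentence:
    text: str
    offset: int  # character offset of the sentence within the document

@dataclass
class Argument:
    text: str
    interval: Interval

@dataclass
class Extraction:
    arg1: Argument
    relation: Argument
    arg2: Argument

@dataclass
class Mention:
    text: str
    interval: Interval

@dataclass
class CorefCluster:
    mentions: List[Mention]
    best_mention: Optional[Mention] = None

@dataclass
class EntityLink:
    mention: Mention
    entity_id: str  # e.g., a knowledge-base identifier
    score: float

@dataclass
class Document:
    text: str
    sentences: List[Sentence] = field(default_factory=list)
    clusters: List[CorefCluster] = field(default_factory=list)
    links: List[EntityLink] = field(default_factory=list)
```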