Difference between revisions of "Vulcan/TextualEvidenceFinder"
From Knowitall
Revision as of 16:46, 23 August 2013
I/O
Input: A Proposition [A natural language sentence + Open IE tuples from the sentence.]
Output: A list of query/score pairs representing evidence for the proposition.
Components
Query Generator
The query generator outputs two types of queries for each proposition:
- Keyword queries -- Extract keywords from the query sentence [TBD: Stemming? Stopword removal?]
- Template queries -- A template query is a tuple (or the sentence) in which one or more words are replaced with a wild-card operator.
The system will be given a set of rules that specify how to convert a tuple into different template queries. Start with two rules:
- Keyword queries -- Remove stopwords.
- Template queries -- Take each tuple. For each field (arg1, rel and arg2), if the field contains more than one word, replace each word in turn with a wild-card, producing one template query per replaced word.
- Examples
Input:
  Sentence: Iron nail is a good conductor of electricity
  Tuples: (iron nail, is a good conductor of, electricity)
Output:
  Q1: (iron *, is a good conductor of, electricity) // Template query
  Q2: (* nail, is a good conductor of, electricity) // Template query
  Q3: (iron nail, is a * conductor of, electricity) // Template query
  Q4: iron * conductor * electricity // Template query
  Q5: iron or conductor or electricity // Keyword query
Solr/Lucene Layer
To be filled in...
- Index
  - What corpora will be indexed? [Study guide, definitions and sentences covering glossary terms]
  - What is the index structure? [Each tuple will be a Lucene document?]
  - What is in a document? [Arg1, Arg1 Norm, Rel, Rel Norm, ...]
  - ...
- Search
  - How to find tuples that match template queries? Use a bag-of-words search to retrieve candidate tuples, then keep only those that match the template?
  - How can we use synonyms or other paraphrase resources?
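One possible reading of the "search, then filter" idea from the first Search question is sketched below. It assumes the candidate tuples have already been retrieved by some bag-of-words search (no Solr or Lucene call is made here) and that a wild-card stands for exactly one word; both are assumptions, as are all function names.

```python
import re


def template_to_regex(field_pattern):
    """Compile one template field into a regex; '*' matches exactly one word (assumed)."""
    parts = [r"\S+" if w == "*" else re.escape(w) for w in field_pattern.split()]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


def matches_template(template, candidate):
    """True if every field of the candidate tuple matches the corresponding template field."""
    return all(
        template_to_regex(pattern).fullmatch(value.strip())
        for pattern, value in zip(template, candidate)
    )


def filter_candidates(template, candidates):
    """Keep only the candidate tuples that match the template query."""
    return [c for c in candidates if matches_template(template, c)]


if __name__ == "__main__":
    template = ("iron *", "is a good conductor of", "electricity")
    # Hypothetical candidates, standing in for bag-of-words search results.
    candidates = [
        ("iron nail", "is a good conductor of", "electricity"),
        ("iron", "is a good conductor of", "electricity"),
        ("copper wire", "is a good conductor of", "electricity"),
    ]
    print(filter_candidates(template, candidates))
```

Matching against normalized fields (Arg1 Norm, Rel Norm) instead of the raw strings would be a natural variation once the document structure is settled.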
Open IE 4.0
Use Open IE 4.0.