Vulcan/TextualEvidenceFinder

I/O

Input: A Proposition [A natural language sentence + Open IE tuples from the sentence.]

Output: A list of query/score pairs representing evidence for the proposition.
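
A minimal sketch of the data shapes implied by this I/O contract, assuming Python; all names here are illustrative, not an actual API.

  # Hypothetical data shapes for the I/O above; names are illustrative only.
  from collections import namedtuple

  OpenIETuple = namedtuple("OpenIETuple", ["arg1", "rel", "arg2"])
  Proposition = namedtuple("Proposition", ["sentence", "tuples"])   # input
  ScoredQuery = namedtuple("ScoredQuery", ["query", "score"])       # one output item

  prop = Proposition(
      sentence="Iron nail is a good conductor of electricity",
      tuples=[OpenIETuple("iron nail", "is a good conductor of", "electricity")],
  )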

Components

[Figure: System Architecture: Textual Evidence Finder]

Query Generator

Queries are generated from the tuple given for the proposition, for example:

(iron nail, is a good conductor of, electricity)

We'll do some preprocessing for stopword and function word removal, do something (TBD) about adjectives, and then execute different kinds of logical queries against indexed tuples that have been extracted from evidence source texts like the study guide, glossary, ClueWeb(?), etc. The logical query types are listed below (a sketch of building these query strings follows the list):

  • Whole-tuple-match template queries match the full proposition tuple against indexed tuples (tuples extracted from source texts like the study guide, glossary, ClueWeb, etc.). Results from these queries would be considered the strongest evidence. For the given example:
arg1:(iron AND nail) AND rel:(good AND conductor) AND arg2:(electricity)
  • Partial-tuple-match template queries match some of the terms in the proposition tuple against the same fields of indexed tuples. For the given example:
arg1:(iron OR nail) AND rel:(conductor) AND arg2:(electricity)
  • Keyword-match queries match any keywords from the proposition tuple against any field(s) of the indexed tuples. For the given example, where the text field is the concatenation of fields arg1, rel, and arg2:
text:(iron OR nail OR conductor OR electricity)
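
A minimal sketch of how these three query strings could be assembled from a tuple, assuming a plain whitespace tokenizer and a tiny hand-picked stopword list; the TBD adjective handling is ignored, so "good" stays in the partial and keyword queries.

  # Sketch only: build the whole-tuple, partial-tuple, and keyword query strings.
  STOPWORDS = {"is", "a", "an", "the", "of"}   # assumed minimal stopword/function-word list

  def content_words(phrase):
      """Lowercase, split on whitespace, and drop stopwords."""
      return [w for w in phrase.lower().split() if w not in STOPWORDS]

  def queries_for(arg1, rel, arg2):
      a1, r, a2 = content_words(arg1), content_words(rel), content_words(arg2)
      whole = "arg1:(%s) AND rel:(%s) AND arg2:(%s)" % (
          " AND ".join(a1), " AND ".join(r), " AND ".join(a2))
      partial = "arg1:(%s) AND rel:(%s) AND arg2:(%s)" % (
          " OR ".join(a1), " OR ".join(r), " OR ".join(a2))
      keyword = "text:(%s)" % " OR ".join(a1 + r + a2)
      return {"whole": whole, "partial": partial, "keyword": keyword}

  print(queries_for("iron nail", "is a good conductor of", "electricity"))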

We would also perform some (or all?) of the above queries with synonym expansion, and later (maybe) with hypernym expansion.
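
One possible form of the synonym expansion, sketched with WordNet via NLTK (an assumption; any paraphrase resource could be substituted). Each content word becomes an OR-group of its synonyms before being dropped into the field queries above.

  # Sketch only: expand a term into an OR-group of WordNet synonyms.
  # Assumes NLTK with the WordNet corpus downloaded (nltk.download('wordnet')).
  from nltk.corpus import wordnet as wn

  def expand(term):
      synonyms = {term.lower()}
      for synset in wn.synsets(term):
          for lemma in synset.lemma_names():
              synonyms.add(lemma.replace("_", " ").lower())
      quoted = ['"%s"' % s if " " in s else s for s in sorted(synonyms)]
      return "(%s)" % " OR ".join(quoted)

  # Example use: rel:(good AND conductor) -> "rel:(%s AND %s)" % (expand("good"), expand("conductor"))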

There are still open questions around:

  • Where and how to use normalized versions of terms (once we have both lemmatized and literal values indexed for arg1, rel, and arg2).
  • How to handle adjectives.
  • How to handle relation negation and polarity.
  • What to do about headwords.
  • How exactly to score query results. Initially we'll score results by rank from the query classes above along with term-overlap scores (see the scoring sketch after this list).
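
A sketch of that initial scoring idea, assuming fixed per-class weights (the numbers are placeholders) and Jaccard overlap between the proposition's content words and the matched tuple's words.

  # Sketch only: rank-by-query-class plus term-overlap scoring.
  CLASS_WEIGHT = {"whole": 3.0, "partial": 2.0, "keyword": 1.0}   # assumed weights

  def term_overlap(prop_terms, hit_terms):
      """Jaccard overlap between two bags of content words."""
      prop, hit = set(prop_terms), set(hit_terms)
      return len(prop & hit) / float(len(prop | hit)) if (prop or hit) else 0.0

  def score(query_class, prop_terms, hit_terms):
      return CLASS_WEIGHT[query_class] + term_overlap(prop_terms, hit_terms)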

Solr/Lucene Layer

A general outline of the process to build the Solr index for Vulcan is shown below.

[Figure: Vulcan extractions]

To be filled in...


Index
  1. What corpora will be indexed? [Study guide, definitions and sentences covering glossary terms]
  2. What is the index structure? [Each tuple will be a Lucene document?]
  3. What is in a document? [Arg1, Arg1 Norm, Rel, Rel Norm, ...] (see the indexing sketch after this list)
  4. ...
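
A sketch of what indexing one tuple as one Solr document might look like, assuming Solr's JSON update handler; the core name ("vulcan"), the field names, and the normalized values are assumptions based on the list above.

  # Sketch only: index one tuple per Solr document.
  import requests

  doc = {
      "id": "studyguide-0001-t1",                       # hypothetical document id
      "arg1": "iron nail",              "arg1_norm": "iron nail",
      "rel": "is a good conductor of",  "rel_norm": "be good conductor of",   # illustrative lemmatized form
      "arg2": "electricity",            "arg2_norm": "electricity",
      "sentence": "Iron nail is a good conductor of electricity",
      "source": "studyguide",
  }
  # A schema copyField from arg1/rel/arg2 into a "text" field would back the keyword queries.
  requests.post("http://localhost:8983/solr/vulcan/update?commit=true",
                json=[doc], timeout=10)
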
Search
  1. How to find tuples that match template queries? Use a bag-of-words search and then filter to tuples that match the template? (See the query sketch after this list.)
  2. How can we use synonyms or other paraphrase resources?
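
A sketch of running one generated query string against Solr's select handler and turning the hits into query/score pairs; the URL, core name, and field list are assumptions.

  # Sketch only: execute a generated query and keep Solr's relevance score per hit.
  import requests

  def run_query(query_string, rows=10):
      resp = requests.get("http://localhost:8983/solr/vulcan/select",
                          params={"q": query_string, "wt": "json", "rows": rows,
                                  "fl": "arg1,rel,arg2,sentence,score"},
                          timeout=10)
      return [(query_string, doc["score"]) for doc in resp.json()["response"]["docs"]]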

Open IE 4.0

Use Open IE 4.0.