Vulcan/SystemPrototype

Revision as of 19:43, 5 September 2013

Overview

The prototype is designed to work on three questions. We want the system to output the following:

  • Score for the input proposition.
  • New facts inferred.
  • Facts and rules used in scoring.
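
For concreteness, a minimal sketch of what a per-proposition output record could look like, written in Python. All class and field names here are hypothetical, not taken from the actual system:

 from dataclasses import dataclass, field

 @dataclass
 class PropositionResult:
     proposition: str    # the input proposition being scored
     score: float        # inference probability for the proposition
     new_facts: dict = field(default_factory=dict)  # inferred fact -> probability
     support: list = field(default_factory=list)    # facts and rules used in scoring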

Status

  • Ran Tuffy on the three example questions. It failed on one question.
  • Hand-generated the input evidence for the propositions (one correct and one incorrect) for each of the three questions.
  • Hand-generated the MLN rules based on Stephen's human-readable rules. The MLN rules can be found here.
  • Ran Tuffy to obtain the inference probabilities on the propositions (a sketch of the invocation appears below).
  • The system also outputs:
      • All inferred facts along with their probabilities.
      • All rules that are reachable from the query fact, i.e., the clauses in the MLN that are relevant to the inference of the query fact.
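
For reference, a sketch of how the Tuffy runs above are typically invoked, wrapped in Python. The flags (-i, -e, -queryFile, -r, -marginal) follow the Tuffy documentation, but the file names are placeholders rather than this project's actual files:

 import subprocess

 # Marginal (MC-SAT) inference over the hand-generated evidence.
 subprocess.run(
     ["java", "-jar", "tuffy.jar",
      "-i", "prog.mln",           # predicate declarations + weighted rules
      "-e", "evidence.db",        # hand-generated evidence atoms
      "-queryFile", "query.db",   # the propositions to score
      "-r", "out.txt",            # marginal probabilities are written here
      "-marginal"],               # marginal inference instead of MAP
     check=True,
 )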
  • Does it work?
      • Tuffy gets it right for all 3 questions (initially only 2 of 3), i.e., it assigns a higher probability to the correct proposition.
      • For the most part, facts inferred through a larger number of steps have a lower score than those inferred through a smaller number of steps. This depends on the weights on the inference rules (see the worked example below).
      • I need to better understand how MLN scoring works in general and how Tuffy implements the inference.
      • Tuffy's output probabilities are non-deterministic, which makes debugging harder. Need a fix for this.
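
Since MLN scoring is the open question above, here is a self-contained toy computation of exact MLN marginals by brute-force world enumeration: two chained rules, A => B and B => C, with A given as evidence. The weights and atom names are invented for illustration, and Tuffy approximates these marginals with MC-SAT rather than by enumeration:

 from itertools import product
 from math import exp

 W1, W2 = 3.0, 1.0  # arbitrary weights on A => B and B => C

 def world_weight(a, b, c):
     """exp(sum of weights of the satisfied ground clauses)."""
     s = W1 * ((not a) or b)   # A => B
     s += W2 * ((not b) or c)  # B => C
     return exp(s)

 # A is evidence (true); enumerate the four possible worlds over (B, C).
 worlds = [(True, b, c) for b, c in product([False, True], repeat=2)]
 z = sum(world_weight(*w) for w in worlds)

 p_b = sum(world_weight(*w) for w in worlds if w[1]) / z  # one-step fact B
 p_c = sum(world_weight(*w) for w in worlds if w[2]) / z  # two-step fact C
 print(f"P(B) = {p_b:.3f}, P(C) = {p_c:.3f}")  # 0.932 vs 0.715 here

With W1 == W2 the two marginals coincide in this toy model, so the lower scores on multi-step facts really do come from the rule weights rather than from chain length alone.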
  • What diagnostics do we NOT have?
      • Connections between the clauses in the MLN (a cheap approximation is sketched below).
      • A reconstruction/visualization of the MLN network. Working with the Tuffy developers on this.
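
One cheap way to get the first missing diagnostic is to treat two clauses as connected whenever they share a predicate symbol. A sketch, with invented stand-in clauses (the real rules file would need proper parsing):

 import re
 from itertools import combinations

 clauses = [
     "1   !MadeOf(x, iron) v Conducts(x)",
     "1   !Conducts(x) v CompletesCircuit(x)",
     "0.5 !MadeOf(x, plastic) v !Conducts(x)",
 ]

 def predicates(clause):
     # predicate symbols start uppercase and are immediately followed by "("
     return set(re.findall(r"[A-Z]\w*(?=\()", clause))

 # Report every pair of clauses whose predicate sets overlap.
 for a, b in combinations(clauses, 2):
     shared = predicates(a) & predicates(b)
     if shared:
         print(f"{a}  <->  {b}   via {sorted(shared)}")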
  • What next?
      • Figure out why the iron nail example fails.
      • Use automatically found evidence, i.e., actual Open IE tuples instead of hand-generated evidence (see the conversion sketch below).
      • Figure out what mismatches exist between required and available knowledge.
      • Improve diagnostics output.
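
The "automatically found evidence" step above would need a conversion from Open IE tuples to ground evidence atoms. A sketch of that mapping; the naming conventions are assumptions, not the project's actual scheme:

 def to_atom(arg1, rel, arg2):
     """Render an Open IE (arg1, rel, arg2) tuple as a ground atom."""
     camel = lambda s: s.title().replace(" ", "")
     return f"{camel(rel)}({camel(arg1)}, {camel(arg2)})"

 print(to_atom("iron nail", "is made of", "iron"))  # IsMadeOf(IronNail, Iron)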
  • What does this exercise suggest?
      • Need to figure out how the weights on the MLN rules and evidence are used. [I assigned them arbitrarily for this round.]
      • Use predicates with small arity. For example, avoid writing rules with entire nested tuples as predicates.
      • The only reason we'd need a nested tuple is for computing its score. For now we can compute this from the scores of its components: Score(nested_tuple) = Score(top tuple) * Score(nested).
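
The proposed composition rule, as a one-line helper (argument names are illustrative):

 def nested_tuple_score(top_tuple_score, nested_score):
     """Score(nested_tuple) = Score(top tuple) * Score(nested)."""
     return top_tuple_score * nested_score

 # e.g. a 0.9 top-level tuple around a 0.8 nested tuple scores 0.72
 assert abs(nested_tuple_score(0.9, 0.8) - 0.72) < 1e-9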