Vulcan/MeetingNotes/Aug16 2013

== Notes ==

* Greg will own the evidence finder.
<blockquote>
* Walk through the system architecture with Greg.
</blockquote>

* Axiomatic representations can be limiting. Figure out how to allow for entailment-type matching within Tuffy.
<blockquote>
* Store sentences with axioms.
* Figure out how to do procedural escapes in Tuffy.
* By the middle of next week, produce an estimate of when we will have a system that works on one or five examples.
</blockquote>

* Get a knowledge spec:
<blockquote>
* isa, partOf, etc.
* Don't reimplement; find existing resources.
</blockquote>

* Stephen will take the lead on the definition extractor.
<blockquote>
* Send Stephen literature and other material on definition processing.
</blockquote>

* What are the research problems?
<blockquote>
* Definition extractor.
* Reading rules from text.
* Abductive reasoning.
* Procedural escapes for textual matching.
</blockquote>

== Agenda ==

* Update
* System architecture
* Plan for Greg
<blockquote>
* Process text collections (definitions, study guides, etc.) using Open IE and import them into Solr.
* Convert WordNet and CNC to Tuffy axiom format and import into Postgres (see the conversion sketch after this list).
* Convert scored assertions into a format that is acceptable to Vulcan's evaluation framework.

Long-term plan: Greg will be responsible for the inference (online) components, and Niranjan will focus on the offline components (generating axioms and rules) and experimentation.<br/>
</blockquote>
* Experiment/Evaluation plan
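
The WordNet conversion step above is largely mechanical. Here is a minimal sketch of what it could look like, assuming NLTK's WordNet reader (NLTK 3 style, where Synset.name() is a method) and the IsA/PartOf predicate names from the knowledge spec in the Notes; the output file name is a placeholder, and the exact constant syntax should be checked against the Tuffy parser (Tuffy itself takes care of loading evidence into Postgres).

<pre>
# Rough sketch: dump WordNet hypernym/meronym pairs as ground atoms in the
# Alchemy-style syntax Tuffy reads.  Predicate names and the file name are
# our own placeholders, not part of any existing pipeline.
from nltk.corpus import wordnet as wn

def atom(pred, a, b):
    # Quote constants and strip characters that could confuse the parser.
    clean = lambda s: s.replace('"', '').replace(',', '_')
    return '%s("%s", "%s")' % (pred, clean(a), clean(b))

with open("wordnet_evidence.db", "w") as out:
    for syn in wn.all_synsets("n"):               # noun synsets only, for now
        for hyper in syn.hypernyms():
            out.write(atom("IsA", syn.name(), hyper.name()) + "\n")
        for whole in syn.part_holonyms():
            out.write(atom("PartOf", syn.name(), whole.name()) + "\n")
</pre>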
 
== Update ==

; System development ([[SystemStatus | Details on architecture and status]])

: 1. Online inference components implemented.
 
<blockquote>
* Proposition generator -- Extract tuples from the input sentence and convert them into a proposition.
* Evidence finder -- Tuple matching over Open IE ClueWeb data.
* MLN Inference -- A wrapper around Tuffy's MLN inferencer.
</blockquote>
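
A minimal sketch of the tuple-to-proposition step, purely illustrative: it assumes an Open IE style (arg1, rel, arg2) extraction and invents its own predicate naming; the actual proposition generator in the online pipeline may differ.

<pre>
# Illustrative only: turn an Open IE style (arg1, rel, arg2) extraction into
# a ground atom that can be posed as a query to the MLN.
import re
from collections import namedtuple

Extraction = namedtuple("Extraction", ["arg1", "rel", "arg2"])

def to_atom(ext):
    # Normalize the relation phrase into a predicate name, e.g.
    # "is made of" -> MadeOf, and the arguments into quoted constants.
    words = re.findall(r"[A-Za-z]+", ext.rel)
    pred = "".join(w.capitalize() for w in words
                   if w.lower() not in ("is", "are", "be"))
    return '%s("%s", "%s")' % (pred or "Rel", ext.arg1.strip(), ext.arg2.strip())

print(to_atom(Extraction("an iron nail", "is made of", "metal")))
# -> MadeOf("an iron nail", "metal")
</pre>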
 
: 2. Offline components -- axiom and rule generation -- NOT implemented.

: 3. Planning to use the Tuffy MLN inference system directly.
<blockquote>
<b>Why Tuffy and not Jena or another inference engine? Why not Alchemy?</b>
* Inference engines such as Jena/OWLIM don't directly support multiple inference paths; the community's response is to suggest Datalog/Prolog implementations.
* Tuffy supports the MLN capabilities of Alchemy but is orders of magnitude faster (what takes 6 hours in Alchemy takes 2 minutes in Tuffy).
</blockquote>
  
 
; Experiments and Evaluation

Not ready to do an evaluation yet, but here are some useful details.

: 1. Framework: Vulcan has a good evaluation interface set up. We will use this for starters. <b>([http://homes.cs.washington.edu/~niranjan/vulcan/example-results.html Example output from the evaluation framework.])</b><br/>
: 2. Data: Training/Test splits set up by Vulcan. The questions cover 4th-12th grade and AP exams. <br/>

<blockquote>
Training = <b>474</b> questions.<br/>
Test = <b>290</b> questions.<br/>

Training data distribution and Vulcan's current performance:
 
{| class="wikitable"
|-
! Grade !! All Questions !! # Mult. Choice and<br/> Non-diag. (MC-ND) !! Vulcan Performance<br/> on MC-ND
|-
| 4th grade || 249 || 108 || 55.09%
|-
| 8th grade || 476 || 125 || 55.07%
|-
| 12th grade || 446 || 160 || 25.83%
|-
| AP || 116 || 81 || 45.68%
|-
| All || 1287 || 474 ||
|}
</blockquote>
 
: 3. Method: Input the sentences that correspond to each assertion, score the assertions using our system, and submit the scores to Vulcan's web interface.<br/>
 
; Design questions

: 1. Why not use MLN directly? Why use a backward-chained inferencer (such as Jena) as an intermediate step?

<blockquote>
* It looks like a separate backward-chained inferencer won't be necessary.<br/>
* Tuffy, an MLN implementation, does KBMC (knowledge-based model construction) to scale MLN inference. Details [http://hazy.cs.wisc.edu/hazy/papers/tuffy-vldb2011-slides.pdf here].<br/>
</blockquote>

; Analysis

: 1. Selected 10 propositions that are single Open IE tuples as starting targets.

: 2. Wrote down the [http://homes.cs.washington.edu/~niranjan/vulcan/aug09/stepsinvolved.docx steps involved] in verifying these propositions.
 
 
 
 
== To Do (Copied over from previous week) ==

; System building

: 1. Implement "template matching" using the ClueWeb corpus. <b>Pending.</b>

<blockquote>
* The URL for the Open IE backend is available.
* For an assertion A, find sentences that have high overlap with it. Generate regex patterns for the proposition and score sentences by how well they match the patterns (a rough sketch follows this item).
</blockquote>
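
A rough sketch of the scoring idea, under the simplifying assumption that the "regex patterns" are word-boundary patterns over the assertion's content words; the real matcher would generate richer patterns per proposition and pull candidate sentences from the Open IE backend.

<pre>
# Score candidate sentences by how much of the assertion's content they cover.
# Purely illustrative; the stopword list and scoring are placeholders.
import re

STOP = {"a", "an", "the", "of", "is", "are", "to", "in"}

def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def score_sentence(assertion, sentence):
    words = content_words(assertion)
    if not words:
        return 0.0
    hits = sum(1 for w in words
               if re.search(r"\b" + re.escape(w) + r"\b", sentence.lower()))
    return hits / float(len(words))   # fraction of the pattern that matched

print(score_sentence("an iron nail is made of metal",
                     "Nails are usually made of steel or iron."))   # 0.5
</pre>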
 
 
: 2. Continue system building.

<blockquote>
* Create a derivation scorer stub. This will be replaced with an MLN or a BLP scorer. <b>Done.</b> (A sketch of the intended shape follows this item.)
* Test with the iron nail example.
</blockquote>
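
The stub's intended shape might look something like the following; class and method names are hypothetical, chosen only to show how an MLN (Tuffy) or BLP scorer could later drop in behind the same interface.

<pre>
# Hypothetical derivation scorer stub: a constant scorer the pipeline can use
# now, with the same interface a Tuffy-backed scorer would implement later.
class DerivationScorer(object):
    """Assigns a confidence to a derivation (a proposition plus the axioms
    and evidence sentences used to support it)."""

    def score(self, derivation):
        return 1.0          # stub: constant score so the pipeline can run

class MLNDerivationScorer(DerivationScorer):
    def __init__(self, tuffy_client):
        self.tuffy_client = tuffy_client

    def score(self, derivation):
        # Would ground the derivation, hand it to Tuffy, and read back the
        # probability of the query atom.
        raise NotImplementedError
</pre>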
 
 
: 3. The Jena API doesn't readily support multiple derivations.

<blockquote>
* Ask the Jena community whether this is possible. <b>Done. Not possible.</b>
* OWLIM as a replacement. <b>Done. Doesn't look promising; no response from the community.</b>
</blockquote>

: 4. Try out the [http://hazy.cs.wisc.edu/hazy/tuffy/ Tuffy MLN] implementation. <b>Done.</b>

<blockquote>
* Use the output of the iron nail example.
* If it is easy to use, write wrappers around Tuffy to hook it into our system (a sketch of one possible wrapper follows this item).
</blockquote>
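
One possible wrapper is simply a thin shell around the Tuffy jar. The flags below follow the examples in the Tuffy documentation (-i program, -e evidence, -queryFile queries, -r results) but are an assumption; check them against the local install.

<pre>
# Minimal, assumed wrapper around the Tuffy command line.
import subprocess

def run_tuffy(prog_mln, evidence_db, query_db, result_file,
              tuffy_jar="tuffy.jar", marginal=True):
    cmd = ["java", "-jar", tuffy_jar,
           "-i", prog_mln,        # MLN program: predicates + weighted rules
           "-e", evidence_db,     # ground evidence atoms
           "-queryFile", query_db,
           "-r", result_file]
    if marginal:
        cmd.append("-marginal")   # marginal inference rather than MAP
    subprocess.check_call(cmd)
    # In marginal mode the result file lists query atoms with probabilities.
    with open(result_file) as f:
        return f.read()
</pre>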
 
 
: 5. Write evaluation code. <b>Vulcan already has a good interface set up.</b>

<blockquote>
* Check with Peter.
</blockquote>

: 6. Create a [[Vulcan/SystemArchitecture| system architecture]] page with a figure and an overview of the main components.

<b>Created a [[Vulcan/SystemStatus| System status page]] instead.</b>

<blockquote>
* Created a figure and added it to the system design document.
* Still need to create a wiki page for the system architecture and overview.
</blockquote>

; Experiments <b>Pending</b>

: 1. Run the template matching approach as a baseline.

: 2. Run the inference system as a baseline.
 
