Vulcan/TextualEvidence/Plan


9/5/2013

The plan from here is:

  1. Get a simple system running which
    1. will take an input proposition triple (arg1, rel, arg2) and return a ranked list of tuples from the texts already indexed. Ranking will still be crude (either Lucene or extraction confidence scores). See Scoring (TODO: not a wiki page yet) for the plan on how to score eventually. (done 9/6)
    2. will take a keyword query and return indexed tuples matching the keyword(s) queried. This is mostly a tool for exploring the indexed data. (should be done by end of day 9/6)
    3. has a minimal web frontend to 1. and 2. above for human exploration (est. 1 day, aiming for Tues 9/10)
  2. Get examples of the queries generated from proposition tuples for the kinds of query templates we expect to use to find evidence, along with results for some example queries. See QueryGeneration Example Results for the kinds of queries we'll generate and how, as well as results for a set of example queries; a rough query-generation sketch also appears after this list. (done for first pass 9/6, but expect to keep updating this as the system matures)
  3. Define the Tuple representation the system will use in code. This will be widely shared, so we want to settle on a definition early so people can build against it; one possible shape is sketched after this list. (est. 1 day, aiming for Thurs 9/12)
  4. Add support for working with lexical variants of tuple terms where appropriate in the indexed tuple store and in the runtime query generation, execution, and scoring layer. This includes
    1. lemmatization - this is primarily an indexing change; at query time we'll probably query lemmas. If we want to score perfect matches differently from lemma matches, we'll probably do that when scoring; an index-time sketch appears after this list. (est. 0.5 days to update the indexing process; we'll also have to reindex everything, but probably not until other indexing changes, like head words, are fleshed out)
    2. head word extraction - extract and index head words when indexing. Not sure yet what happens with query generation and scoring. TBD.
    3. polarity extraction and negation handling - probably all done when scoring (Niranjan has code to check polarity in the results, and negations are already indexed). TBD.
    4. passive handling - Niranjan suggested flipping arg1 and arg2 if OpenIE identifies the relation as passive. This becomes a change in query generation (bigger queries). TBD.
    5. synonym expansion - the plan is to use Solr's built-in synonym query expansion. TBD.
  5. Process the new definitions dictionary. We still have to figure out how to process the definitions given their structure. TBD.
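
As a rough illustration of the query forms described in items 1 and 2 (exact match, lemma match, passive flipping, and plain keyword exploration), here is a sketch in Scala that builds Solr query strings from a proposition triple. The field names (arg1, rel, arg2 and their *_lemma variants) and the quoting helper are assumptions for illustration only, not the actual index schema.

 object QueryGeneration {
   // Hypothetical field names; the real Solr schema may differ.
   private def quote(s: String): String = "\"" + s.replace("\"", "\\\"") + "\""
 
   // Exact-match query over the surface-form fields of the indexed tuples (item 1.1).
   def exactQuery(arg1: String, rel: String, arg2: String): String =
     s"arg1:${quote(arg1)} AND rel:${quote(rel)} AND arg2:${quote(arg2)}"
 
   // Lemma-level query, assuming lemmatized copies of each field are indexed (item 4.1).
   def lemmaQuery(arg1: String, rel: String, arg2: String): String =
     s"arg1_lemma:${quote(arg1)} AND rel_lemma:${quote(rel)} AND arg2_lemma:${quote(arg2)}"
 
   // Passive handling (item 4.4): also try the triple with arg1 and arg2 swapped.
   def withPassiveVariant(arg1: String, rel: String, arg2: String): Seq[String] =
     Seq(exactQuery(arg1, rel, arg2), exactQuery(arg2, rel, arg1))
 
   // Keyword query for exploring the index (item 1.2), over an assumed catch-all text field.
   def keywordQuery(keywords: Seq[String]): String =
     keywords.map(quote).mkString("text:(", " OR ", ")")
 }

For example, QueryGeneration.exactQuery("the heart", "pumps", "blood") yields a Lucene-syntax query like arg1:"the heart" AND rel:"pumps" AND arg2:"blood"; synonym expansion (item 4.5) would then happen inside Solr via its analyzer configuration rather than in this layer.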
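Item 3 calls for a shared Tuple representation; below is a minimal sketch of one possible shape. Every field name and type here is an assumption about what the system will likely need (surface strings, lemmas, flags, confidence, provenance), not the settled definition.

 // Sketch of a shared tuple representation (item 3); fields are illustrative assumptions.
 case class Tuple(
   arg1: String,          // surface form of the first argument
   rel: String,           // surface form of the relation phrase
   arg2: String,          // surface form of the second argument
   arg1Lemma: String,     // lemmatized arg1 (item 4.1)
   relLemma: String,      // lemmatized relation phrase
   arg2Lemma: String,     // lemmatized arg2
   negated: Boolean,      // negation flag from the extractor (item 4.3)
   passive: Boolean,      // true if OpenIE marked the relation as passive (item 4.4)
   confidence: Double,    // extraction confidence, used for the crude ranking in item 1.1
   sentence: String,      // source sentence, for display in the web frontend
   docId: String)         // identifier of the source document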
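Item 4.1 is primarily an index-time change. The sketch below, using SolrJ's SolrInputDocument, shows the idea of writing both surface and lemma fields for each tuple; the lemmatize placeholder and the field names are assumptions, and the real system would plug in an actual lemmatizer.

 import org.apache.solr.common.SolrInputDocument
 
 object TupleIndexer {
   // Placeholder: stands in for a real lemmatizer; lowercasing is NOT real lemmatization.
   def lemmatize(phrase: String): String = phrase.toLowerCase
 
   // Build a Solr document carrying both surface and lemma fields for one tuple.
   def toSolrDoc(t: Tuple): SolrInputDocument = {
     val doc = new SolrInputDocument()
     doc.addField("docId", t.docId)
     doc.addField("sentence", t.sentence)
     doc.addField("arg1", t.arg1)
     doc.addField("rel", t.rel)
     doc.addField("arg2", t.arg2)
     doc.addField("arg1_lemma", lemmatize(t.arg1))
     doc.addField("rel_lemma", lemmatize(t.rel))
     doc.addField("arg2_lemma", lemmatize(t.arg2))
     doc.addField("confidence", t.confidence)
     doc
   }
 }

Scoring perfect vs. lemma matches differently, as item 4.1 suggests, would then be a matter of querying both sets of fields and weighting them at scoring time.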