Difference between revisions of "Bobs Multir Updates"

Revision as of 23:37, 21 April 2014

April 20 2014 Update

Bob's Update

  • This Week
    1. Did preliminary planning, code exploration, etc., to begin implementation of the MultiR output module for populating the Information Omnivore Database.
    2. Preliminary exploratory implementation included building a trial MultiR system with minimal modifications (to see whether the build process is working properly).
    3. After fixing a minor bug (missing import), the trial build worked. Ran the system on a repeat of the baseline vs. generalized-nonpartitioned vs. partitioned experiment to test functionality; it worked OK. Ran only on the Test corpus subset, to save time.
  • Next Week
    1. Will build new version of MultiR with two new classes, initially copies of ManualEvaluation and MultiModelManualEvaluation, as prototypes for new functionality needed for writing output files to populate Info Omnivore Database.
    2. The new classes will perform essentially the same function (and have access to the same data) as the ManualEvaluation and MultiModelManualEvaluation classes but will write three new files (rather than dumping to Standard Output):
      1. Extractions file: data to populate most Info Omnivore DB tables, containing information on extractions, sentences, and document-relative offsets. These offsets must be computed from information currently held in some MultiR data structures and some WebWare6 files but not currently printed out by MultiR.
      2. VoteRelation file containing information on extractions with associated experiment names and dates, voter IDs, etc.
      3. Sentences file containing a list of all sentences (full original text strings) contained within the 300 documents in the Test subset of the full corpus, with associated DocIDs (of the doc containing the sentence) and offsets (document-relative start and end for each sentence).
    3. Will run this new output module on two cases: the baseline system, and the partitioned-model system with generalized features. The associated six output files (three per case) will serve as inputs for Omnivore DB population.
    4. Next step (hopefully started this week) will be actual population of Omnivore DB from the six just-described files.
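
The document-relative sentence offsets planned for the Sentences file could be computed along the following lines. This is only a minimal sketch under stated assumptions, not MultiR code: the class name `SentenceOffsets`, the `Span` type, the tab-separated row layout, and the end-exclusive offset convention are all hypothetical, and it assumes each sentence string appears verbatim in the document text in reading order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of MultiR): finds where each sentence sits in its document.
final class SentenceOffsets {

    // One row of a Sentences-style file: DocID, start (inclusive), end (exclusive), text.
    static final class Span {
        final String docId;
        final int start;
        final int end;
        final String text;

        Span(String docId, int start, int end, String text) {
            this.docId = docId;
            this.start = start;
            this.end = end;
            this.text = text;
        }

        // Assumed layout: DocID, start, end, sentence, separated by tabs.
        String toTsv() {
            return docId + "\t" + start + "\t" + end + "\t" + text;
        }
    }

    // Scans left to right so that a repeated sentence maps to its next occurrence,
    // not to the first one again.
    static List<Span> locate(String docId, String docText, List<String> sentences) {
        List<Span> spans = new ArrayList<>();
        int cursor = 0;
        for (String sentence : sentences) {
            int start = docText.indexOf(sentence, cursor);
            if (start < 0) {
                throw new IllegalArgumentException(
                        "Sentence not found at or after offset " + cursor + ": " + sentence);
            }
            int end = start + sentence.length();
            spans.add(new Span(docId, start, end, sentence));
            cursor = end;
        }
        return spans;
    }

    public static void main(String[] args) {
        String doc = "MultiR ran. It worked OK.";
        List<String> sentences = Arrays.asList("MultiR ran.", "It worked OK.");
        for (Span span : locate("doc-001", doc, sentences)) {
            System.out.println(span.toTsv());
        }
    }
}
```

If the stored sentences differ from the raw document text (for example through tokenization or whitespace normalization), an exact-match scan like this would fail, and the offsets would instead have to come from the existing MultiR data structures and WebWare6 files mentioned above.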

April 13 2014 Update

Bob's Update

  • This Week
    1. Studied MultiR, Git, etc.
    2. Ran complete Baseline(Unpartitioned-Model) versus Baseline-Features(Partitioned-Model) versus Generalized-Features with-and-without model partitioning. All look reasonable except baseline-unpartitioned, which may have been compromised by a bad distant-supervision output file. John replaced the file; earlier runs with a different file and later runs with the fixed file looked OK, so I may have used a temporarily bad file. The other three look OK - in general, generalizing the features seems to help a little but partitioning the model helps a lot. Results are in three subdirectories of "/projects/WebWare6/Multir/Evaluations":
      1. Baseline_v_Generalized_and_Partitioned_v_Non/
      2. GeneralizedFeatures_NonPartitioned/
      3. GeneralizedFeatures_Partitioned/
    3. Prepared for Database design for Information Omnivore, including a meeting with Stephen and Lydia Chilton on Friday.
  • Next Week
    1. Main activity will focus on implementation of the Information Omnivore Database.
    2. May rerun some MultiR experiments if needed to resolve the ambiguities mentioned above.

April 4 2014 Update

Bob's Update

  • This Week
    1. Continued familiarization with MultiR codebase (and Git, and Eclipse, etc.).
    2. Ran part of 2-by-2 experiment: the Baseline+Generalized Features, No-Partitioning case.
  • Next Week
    1. Will finish (if needed) and write up results of 4-way (2-by-2, Generalized-added/Baseline-only versus Partitioned/No-Partitioned) experiment.
    2. Highest priority: Starting discussions with users and preliminary design of Information Omnivore Database to hold results/data/etc for all experiments in the NLP/Crowdsourcing groups. This will extend some preliminary design work by Stephen and will start with consultations with potential users (probably Lydia Chilton first). Will lead to implementation over next month or so.
    3. Lower priority (i.e., back-burner): Design of a tracing mechanism for the MultiR project which (hopefully) will elucidate the role of features in the training process, so we can track the cause of apparent overgeneralizations. This may lead to a general tool that can be used to track the internal behavior of MultiR and other learning systems.

March 21 2014 Update

  • Bob's update
    1. See document [1]