NewMultir Aggregate Comparison

From Knowitall
Revision as of 20:14, 12 November 2013 by Jgilme1

Results Comparison

After implementing the new distant supervision / feature generation interface and generating the data needed for the Multir algorithm, I ran the aggregate scoring algorithm to validate the current performance against the old performance. The relations that the model was trained and tested on remained the same, but the corpora, feature generation, and knowledge base representation all differed. It is hard for me to determine why the results are so different, but the likely cause is a difference in the distribution of relation instances across the two tests.


Aggregate Scorer

The aggregate scorer is different from the sentential scorer. It runs the extraction inference algorithm on every instance in the test data, then checks whether any of the labels on each test instance match the extraction output at test time. This is more forgiving than the sentential scorer, since a pair of entities may have several relations. On the other hand, a test-time extraction can only be marked correct if it was labelled as such in the test data, which means it has to be part of the knowledge base.
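The scoring procedure described above can be sketched roughly as follows. This is a minimal illustration, not the actual Multir scorer: the function name `aggregate_score`, the dictionary-of-sets representation, and the treatment of `NA` as "no relation" are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

EntityPair = Tuple[str, str]


def aggregate_score(
    predicted: Dict[EntityPair, Set[str]],
    gold: Dict[EntityPair, Set[str]],
    na_label: str = "NA",
) -> Tuple[float, float]:
    """Return (precision, recall) over entity-pair level extractions.

    Hypothetical sketch: `predicted` maps each test entity pair to the
    relations output at test time; `gold` maps each pair to the relations
    it is labelled with in the test data (i.e. found in the knowledge base).
    """
    num_predicted = 0
    num_correct = 0
    for pair, relations in predicted.items():
        gold_relations = gold.get(pair, set()) - {na_label}
        for relation in relations - {na_label}:
            num_predicted += 1
            # An extraction is correct only if the test data carries that label.
            if relation in gold_relations:
                num_correct += 1

    # Every non-NA gold label counts toward the recall denominator.
    num_gold = sum(len(relations - {na_label}) for relations in gold.values())

    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    gold = {
        ("Washington", "Seattle"): {"/location/location/contains"},
        ("Ballard", "Seattle"): {"/location/neighborhood/neighborhood_of"},
        ("Seattle", "Portland"): {"NA"},
    }
    predicted = {
        ("Washington", "Seattle"): {"/location/location/contains"},
        ("Ballard", "Seattle"): {"NA"},
        ("Seattle", "Portland"): {"/location/location/contains"},
    }
    print(aggregate_score(predicted, gold))
```

In the toy data, the extraction for the third pair is counted as wrong simply because the test data labels that pair `NA`, which is the knowledge-base limitation noted above.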


Aggregate Extraction Precision/Recall Table at Highest Recall Level
Algorithm                   Precision   Recall
Mihai's Reimplementation    .328        .183
Original Multir             .372        .180
New Multir                  .494        .324


Original Multir Riedel-Training Data Profile
Relation                                  Count     Percentage
NA                                        91373     76%
/location/location/contains               13164     10.9%
/location/neighborhood/neighborhood_of    2966      2.5%
Total                                     120290    100%


Original Multir Riedel-Test Data Profile
Relation                                  Count     Percentage
NA                                        166004    97%
/location/location/contains               2085      1.2%
/location/neighborhood/neighborhood_of    68        .04%
Total                                     171360    100%
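To make the difference in relation distribution between the two profiles easier to see, the percentages can be recomputed directly from the counts. The sketch below uses only the rows and totals listed in the two tables above; the helper name `relation_shares` is made up for the example, and the relations not listed in the tables are left out.

```python
from typing import Dict


def relation_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Return each relation's share of `total`, as a percentage."""
    return {relation: 100.0 * count / total for relation, count in counts.items()}


# Counts copied from the Riedel training and test profiles above.
train_counts = {
    "NA": 91373,
    "/location/location/contains": 13164,
    "/location/neighborhood/neighborhood_of": 2966,
}
train_total = 120290

test_counts = {
    "NA": 166004,
    "/location/location/contains": 2085,
    "/location/neighborhood/neighborhood_of": 68,
}
test_total = 171360

train_shares = relation_shares(train_counts, train_total)
test_shares = relation_shares(test_counts, test_total)

# Print the two profiles side by side, one relation per line.
for relation in train_counts:
    print(f"{relation}: train {train_shares[relation]:.2f}%  test {test_shares[relation]:.2f}%")
```

The same helper could be applied to the data generated for the new Multir run to check whether its relation distribution differs from these profiles.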