Pattern Learning

From Knowitall
Revision as of 20:52, 3 January 2012 by Schmmd (talk | contribs) (Slots)


Further Work

Bootstrapping

Patterns

  • reduce the POS tag restriction (collapse VB, VBZ, VBN, etc. into a single verb tag)
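An illustrative sketch of the collapse above, assuming the fine-grained tags are standard Penn Treebank verb tags (the function name is hypothetical):

```python
# Map fine-grained Penn Treebank verb tags onto one coarse tag so that
# learned patterns generalize across verb inflections.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def collapse_postag(tag):
    return "VB" if tag in VERB_TAGS else tag
```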

Building the bootstrapping data

Determining target relations

  1. Restrict high quality set of ClueWeb extractions to have proper noun arguments
  2. Choose the most frequent relations from this set
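The two steps above can be sketched as follows; the (arg1_tags, relation, arg2_tags) representation is an assumption, not the actual ClueWeb extraction format:

```python
from collections import Counter

# Keep extractions whose argument tokens are all proper nouns (NNP/NNPS),
# then rank the surviving relations by frequency.
def frequent_relations(extractions, top_n=2):
    proper = {"NNP", "NNPS"}
    counts = Counter(
        rel for arg1_tags, rel, arg2_tags in extractions
        if set(arg1_tags) <= proper and set(arg2_tags) <= proper
    )
    return [rel for rel, _ in counts.most_common(top_n)]
```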

Determining target extractions (seeds)

  1. Start with the clean, chunked dataset of ReVerb extractions from ClueWeb.
  2. Apply Jonathan Berant's relation string normalization.
  3. Filter relations so each relation's normalized relation string matches a target relation and the arguments contain only DT, IN, NNP, and NNPS tags.
  4. Filter extractions
    1. Remove extractions that occur a single time.
    2. Remove extractions with single or double letter arguments, optionally ending with a period.
  5. Filter arguments
    1. Remove inc, ltd, vehicle, turn, page, site
    2. Remove arguments that are 2 or fewer characters
  6. Measure the occurrence of the arguments.
  7. Keep extractions from the target relations whose arguments occur commonly (at least 20 times).
  8. Remove target relations that have fewer than 15 seeds.
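The argument filters in steps 4–7 can be sketched in one predicate; the stoplist and thresholds come from the list above, while the function name and data shapes are hypothetical:

```python
STOP_ARGS = {"inc", "ltd", "vehicle", "turn", "page", "site"}

def keep_argument(arg, arg_counts, min_count=20):
    a = arg.lower().rstrip(".")
    if a in STOP_ARGS or len(a) <= 2:
        return False  # stoplisted or too short (steps 4.2 and 5)
    return arg_counts.get(arg, 0) >= min_count  # must occur commonly (step 7)
```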

Lemma grep

  1. Search corpus for all sentences that contain the lemmas in a target extraction.
  2. Remove duplicate sentences (sentence*extraction pairs must be unique).
  3. For each sentence*extraction pair, search for a pattern that connects the lemmas.
    1. The pattern must start at arg1.
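Steps 1–2 above can be sketched as follows; the (sentence text, lemma list) input shape is an assumption, not the actual corpus format:

```python
# Return every sentence containing all lemmas of a seed extraction,
# deduplicating (sentence, extraction) pairs.
def lemma_grep(sentences, seed_lemmas):
    target, seen, hits = set(seed_lemmas), set(), []
    for text, lemmas in sentences:
        key = (text, tuple(sorted(target)))
        if target <= set(lemmas) and key not in seen:
            seen.add(key)
            hits.append(text)
    return hits
```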

Reducing the patterned results

  1. Don't allow patterns that contain punct edges or edges with non-word ([^\w]) characters
  2. Remove patterns that occur less than 10 times.
  3. Remove extractions whose (extraction, pattern) pair occurs anomalously frequently.
    1. There was a single such case: (hotel reservation, be make, online) occurred 32k times, while the next most frequent occurred 8k times.
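Steps 1–2 of the reduction can be sketched as below, assuming (hypothetically) that a pattern is represented as a tuple of dependency edge labels:

```python
from collections import Counter

# Drop any pattern containing a punct edge, then drop patterns attested
# fewer than min_count times.
def prune_patterns(pattern_occurrences, min_count=10):
    counts = Counter(pattern_occurrences)
    return {p for p, c in counts.items() if c >= min_count and "punct" not in p}
```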

Extracting

Collapsing Dependencies

Collapsing noun edges

In the following sentences, we want only the nn edge within "Barack Obama" to be collapsed.

  1. US president Barack Obama declared victory yesterday.
  2. US president Barack Obama likes to drink beer.
  3. Barack Obama, the president of the US, has a wife.
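One heuristic consistent with the examples above (an assumption, not the documented Knowitall rule): collapse an nn edge only when its governor and dependent are adjacent proper nouns, so "Barack Obama" merges but "US ... Obama" and "president ... Obama" do not.

```python
# gov and dep are token indices; pos_tags is the list of POS tags by index.
def collapsible_nn(gov, dep, pos_tags):
    adjacent = abs(gov - dep) == 1
    proper = pos_tags[gov].startswith("NNP") and pos_tags[dep].startswith("NNP")
    return adjacent and proper
```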

Executing the Extractor

Generalized Extractor

  1. Apply pattern to sentence
  2. Remove matches with an adjacent `neg` edge
  3. Convert the match into an extraction
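Step 2 can be sketched as follows; representing dependency edges as (gov, dep, label) triples over token indices is an assumption:

```python
# Discard a pattern match if any matched node governs a `neg` edge
# (e.g. "Obama did not declare ...").
def has_adjacent_neg(match_nodes, edges):
    return any(label == "neg" and gov in match_nodes for gov, dep, label in edges)
```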

Specific Extractor

  1. Run the generalized extractor with the pattern from a (pattern, relation) pair
  2. Keep any extractions where the relations match

LDA Extractor

  1. Run the generalized extractor with a pattern
  2. Remove extractions with "relation strings" that don't match any target relation
  3. Keep best associated target relation by maximizing P(p | r)
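Steps 2–3 can be sketched as below; the probability-table representation is hypothetical:

```python
# Among target relations matching the extraction's relation string, keep
# the one maximizing P(pattern | relation).
def best_relation(pattern, candidate_relations, p_pattern_given_rel):
    if not candidate_relations:
        return None  # relation string matched no target relation (step 2)
    return max(candidate_relations,
               key=lambda r: p_pattern_given_rel.get((pattern, r), 0.0))
```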

Extractions

Slots

prepc

  1. After winning the Superbowl, the Saints are top dogs of the NFL.
  2. He purchased it without paying a premium.
  3. After winning the lottery, James becomes an Epicurean.
  4. Two months after joining the European Union, Bulgaria began attracting increasing interest towards local real estate.

partmod/advcl

  1. Having won the lottery, James becomes an Epicurean.