Difference between revisions of "IARPA/Pattern"
From Knowitall
Latest revision as of 22:04, 14 March 2011
Regex Expressions:
- alternation: |
- option: ?
- Kleene-star: *
- plus: +
- start assertion: ^
- end assertion: $
- matching group: ()
- non-matching group: (?:)
- named group: (<name>:)
Token Expressions:
- string: takes a case-insensitive regular expression
- stringcs: takes a case-sensitive regular expression
- lemma: takes a case-insensitive regular expression for the lemma
- pos: takes a case-insensitive regular expression for the pos tag
- chunk: takes a case-insensitive regular expression for the chunk tag
- type: takes a case-insensitive string for any type that spans the token
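As a rough illustration of the token expressions above, the following Python sketch evaluates a single expression such as <pos="NNP"> against one token. The Token class, its field names, the token_matches helper, and the choice to require the regular expression to match the whole field value are all assumptions made for this example; none of them are taken from the pattern library itself. The type expression is left out because it tests annotations that span a token rather than a per-token field.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record; the field names mirror the expression names."""
    string: str
    lemma: str
    pos: str
    chunk: str


# One token expression of the form <field="regex">.
TOKEN_EXPR = re.compile(r'<(\w+)="([^"]*)">')


def token_matches(expr: str, token: Token) -> bool:
    """Return True if the token satisfies a single token expression."""
    m = TOKEN_EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a token expression: {expr!r}")
    field, pattern = m.groups()
    if field == "stringcs":
        # Case-sensitive match on the surface string.
        value, flags = token.string, 0
    elif field in ("string", "lemma", "pos", "chunk"):
        # All other supported fields are matched case-insensitively.
        value, flags = getattr(token, field), re.IGNORECASE
    else:
        raise ValueError(f"unsupported field in this sketch: {field!r}")
    # Assumption: the regular expression must cover the whole field value.
    return re.fullmatch(pattern, value, flags) is not None


tok = Token(string="President", lemma="president", pos="NNP", chunk="B-NP")
print(token_matches('<string="president">', tok))    # case-insensitive
print(token_matches('<stringcs="president">', tok))  # case-sensitive
```

The only difference between string and stringcs in this sketch is whether re.IGNORECASE is passed, which is what makes the capitalized "President" acceptable to one and not the other.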
Examples:
<string="an?|the">? <pos="JJ">* <pos="NNP">+ <pos="NN">+ <pos="NNP">+
- matches: The incredible U.S. president Barack Obama
- matches: famed UW professor Oren Etzioni
<pos="NNP">+ <stringcs="president">+ <pos="NNP">+
- matches: U.S. president Barack Obama
- does not match: U.S. President Barack Obama
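To make the behaviour of these examples concrete, here is a minimal Python sketch of a backtracking matcher for flat sequences of token expressions with the ?, * and + quantifiers. It is not the library's implementation: the function names (parse, token_ok, match_here, search, tagged), the dictionary token representation, whole-value matching, and the hand-assigned part-of-speech tags in the usage lines are all assumptions for illustration. Groups, alternation between token expressions, and the ^ and $ assertions are not handled.

```python
import re

# One element: <field="regex"> followed by an optional quantifier.
ELEMENT = re.compile(r'<(\w+)="([^"]*)">([?*+]?)')


def parse(pattern):
    """Split a pattern into (field, regex, quantifier) triples."""
    if ELEMENT.sub("", pattern).strip():
        raise ValueError(f"unsupported or malformed pattern: {pattern!r}")
    return [m.groups() for m in ELEMENT.finditer(pattern)]


def token_ok(field, regex, token):
    """Test one token; only stringcs is case-sensitive."""
    if field == "stringcs":
        return re.fullmatch(regex, token["string"]) is not None
    return re.fullmatch(regex, token[field], re.IGNORECASE) is not None


def match_here(elems, tokens, e, t):
    """Match elems[e:] starting at tokens[t]; return the end index or None."""
    if e == len(elems):
        return t
    field, regex, quant = elems[e]
    lo = 1 if quant in ("", "+") else 0                 # minimum repetitions
    hi = 1 if quant in ("", "?") else len(tokens) - t   # maximum repetitions
    n = 0
    while n < hi and t + n < len(tokens) and token_ok(field, regex, tokens[t + n]):
        n += 1
    # Greedy: try the longest run first, then back off one token at a time.
    for count in range(n, lo - 1, -1):
        end = match_here(elems, tokens, e + 1, t + count)
        if end is not None:
            return end
    return None


def search(pattern, tokens):
    """Return (start, end) of the first match in tokens, or None."""
    elems = parse(pattern)
    for start in range(len(tokens)):
        end = match_here(elems, tokens, 0, start)
        if end is not None:
            return start, end
    return None


def tagged(text):
    """Build tokens from 'word/POS' pairs (tags here are assigned by hand)."""
    pairs = (item.rsplit("/", 1) for item in text.split())
    return [{"string": word, "pos": pos} for word, pos in pairs]


pattern = '<pos="NNP">+ <stringcs="president">+ <pos="NNP">+'
print(search(pattern, tagged("U.S./NNP president/NN Barack/NNP Obama/NNP")))
print(search(pattern, tagged("U.S./NNP President/NNP Barack/NNP Obama/NNP")))
```

Under these assumptions the first call finds a match spanning all four tokens, while the second finds none because stringcs rejects the capitalized "President".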