Difference between revisions of "Vulcan/XWordNet"

From Knowitall
Revision as of 21:35, 9 September 2013

by Stephen (Sept 09, 2013).

Overview

XWordNet (eXtended WordNet) is a project from the University of Texas at Dallas that runs a POS tagger and parser tuned to WordNet glosses, with the uncertain parses corrected by hand. It consists of four files:

  • adj.xml
  • adv.xml
  • noun.xml
  • verb.xml

Each entry contains the synsetID, the synonyms, the text of the gloss, and three analyses:

  • POS tags for each word in the gloss, along with their WN senses
  • syntactic parse of the gloss
  • the LFT (Logical Form Transformation) of the definition from the gloss


There may be an advantage to skipping the LFT and starting directly from the POS tags or the syntactic parse: this would avoid maintaining a separate set of rules just for WordNet definitions.


The XML files are in WebWare6\niranjan\vulcan\data\definitions\XWordNet\XWN2.0-1.1. The noun file is too large for the Windows XML reader to open, so I saved its first 1000 lines as noun_1000.txt.
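Because the full noun file is too large to open in a viewer, one option is to stream it entry by entry instead of loading it whole. The following is a minimal sketch using Python's standard `xml.etree.ElementTree.iterparse`. The element and attribute names (`gloss`, `synonymSet`, `text`, `lft`, `synsetID`, `quality`) are taken from the example entry on this page; the function name `iter_glosses`, the `<sample>` root element, and the abridged sample data are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_glosses(source):
    """Yield one dict per <gloss> element from a file path or file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "gloss":
            continue  # wait until the whole entry has been read
        yield {
            "synset_id": elem.get("synsetID"),
            "pos": elem.get("pos"),
            "synonyms": (elem.findtext("synonymSet") or "").strip(),
            "text": " ".join((elem.findtext("text") or "").split()),
            # One (quality, LFT string) pair per <lft> child.
            "lfts": [(lft.get("quality"), " ".join((lft.text or "").split()))
                     for lft in elem.findall("lft")],
        }
        elem.clear()  # drop the entry's children so memory use stays small


# Assumption: an abridged stand-in for noun.xml; the <sample> root is a placeholder.
SAMPLE = b"""<sample>
<gloss pos="NOUN" synsetID="00004824">
  <synonymSet>cell</synonymSet>
  <text>(biology) the basic structural and functional unit of all organisms</text>
  <lft quality="SILVER">
   cell:NN(x1) -> basic:JJ(x1) structural:JJ(x1) functional:JJ(x1) unit:NN(x1) of:IN(x1, x2) all:JJ(x2) organism:NN(x2)
  </lft>
</gloss>
</sample>"""

for entry in iter_glosses(io.BytesIO(SAMPLE)):
    print(entry["synset_id"], entry["synonyms"], entry["lfts"][0][0])
```

Passing a file path instead of the `BytesIO` object should work the same way, since `iterparse` accepts either.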

Logical Form Transformation

The LFT is a logical form that closely mirrors the information in the syntactic parse. The LFT has a predicate for each noun, verb, adjective, adverb, preposition, and conjunction, with arguments represented as x<n> or e<n> (for events). Each verb has an argument e<n> that serves as a pointer to the event, followed by arguments for the subject, object, and optional indirect object. There is always an x<n> for the subject and for the object, although it may point nowhere if the subject or object is not specified in the definition. The intransitive verb exist:VB(e1, x1, x26) has such a dummy argument, x26.
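To make the predicate structure concrete, here is a small sketch that splits an LFT string into predicates and flags dummy verb arguments such as x26. The regular expression and the names `Predicate`, `parse_lft`, and `dummy_arguments` are assumptions for illustration, not part of XWordNet; "dummy" is approximated here as a non-event verb argument that no other predicate in the body mentions.

```python
import re
from typing import NamedTuple


class Predicate(NamedTuple):
    lemma: str
    pos: str
    args: tuple


# lemma:POS(arg, arg, ...)
PRED_RE = re.compile(r"([^\s:()]+):([A-Z]+)\(([^)]*)\)")


def _predicates(text):
    return [Predicate(m.group(1), m.group(2),
                      tuple(a.strip() for a in m.group(3).split(",")))
            for m in PRED_RE.finditer(text)]


def parse_lft(lft):
    """Return (head predicate, list of body predicates) for 'head -> body'."""
    head_text, _, body_text = lft.partition("->")
    return _predicates(head_text)[0], _predicates(body_text)


def dummy_arguments(body):
    """Verb arguments (after the event pointer) that appear nowhere else."""
    counts = {}
    for pred in body:
        for arg in pred.args:
            counts[arg] = counts.get(arg, 0) + 1
    return [arg for pred in body if pred.pos.startswith("VB")
            for arg in pred.args[1:] if counts[arg] == 1]


# The NORMAL-quality LFT from the "cell" entry on this page.
NORMAL_LFT = """
cell:NN(x1) -> cell:NN(x1) exist:VB(e1, x1, x26) as:IN(e1, x2) independent:JJ(x2) unit:NN(x2) of:IN(x2, x3) life:NN(x3)
               or:CC(e3, e1, e2) form:VB(e2, x1, x8) colony:NN(x4) or:CC(x8, x4, x5) tissue:NN(x5)
               as:IN(x8, x9) in:IN(x8, x9) higher:JJ(x9) plant:NN(x6) and:CC(x9, x6, x7) animal:NN(x7)
"""

head, body = parse_lft(NORMAL_LFT)
print(head, dummy_arguments(body))
```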

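If a verb or noun has an attached PP, the attachment is handled in the predicate for the preposition, whose first argument is the head it attaches to. For example, the noun phrase “unit of all organisms” is represented as unit:NN(x1) of:IN(x1, x2) all:JJ(x2) organism:NN(x2). A verb attachment looks like exist:VB(e1, x1, x26) as:IN(e1, x2), where e1 is the event of the verb.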
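The attachment can be recovered by following the preposition's two arguments back to the nouns or verbs that introduce them. The sketch below assumes that a noun or verb "owns" the variable in its first argument position; the function name `pp_attachments` and the regular expression are hypothetical helpers, not part of XWordNet.

```python
import re

# lemma:POS(arg, arg, ...)
PRED_RE = re.compile(r"([^\s:()]+):([A-Z]+)\(([^)]*)\)")


def pp_attachments(lft_fragment):
    """Return (head lemmas, preposition, object lemmas) for each IN predicate."""
    preds = [(m.group(1), m.group(2), [a.strip() for a in m.group(3).split(",")])
             for m in PRED_RE.finditer(lft_fragment)]
    # Map each variable to the nouns/verbs whose first argument it is.
    owners = {}
    for lemma, pos, args in preds:
        if pos in ("NN", "VB"):
            owners.setdefault(args[0], []).append(lemma)
    return [(owners.get(args[0], []), lemma, owners.get(args[1], []))
            for lemma, pos, args in preds
            if pos == "IN" and len(args) == 2]


noun_pp = "unit:NN(x1) of:IN(x1, x2) all:JJ(x2) organism:NN(x2)"
verb_pp = "exist:VB(e1, x1, x26) as:IN(e1, x2) independent:JJ(x2) unit:NN(x2)"
print(pp_attachments(noun_pp))
print(pp_attachments(verb_pp))
```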

The LFT uses stemmed lexemes, ignores tense and modals, ignores parentheticals, and omits usage examples.

Example entry for the definition of “cell”:

Note that this gloss is treated as two separate definitions, with “Cell is …” prepended to each sentence:
“Cell is the basic structural and functional unit of all organisms” and
“Cell is cells may exist as independent units of life or may form colonies or tissues as in higher plants and animals”.

<gloss pos="NOUN" synsetID="00004824">
  <synonymSet>cell</synonymSet>
 <text>
   (biology) the basic structural and functional unit of all organisms; 
   cells may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals  
 </text>
  <wsd>
      <punc>(</punc>
      <wf pos="NN" lemma="biology" quality="normal" wnsn="1" >biology</wf>
      <punc>)</punc>
      <wf pos="DT" >the</wf>
      <wf pos="JJ" lemma="basic" quality="gold" wnsn="1" >basic</wf>
            …
      <wf pos="NNS" lemma="animal" quality="silver" wnsn="1" >animals</wf>
  </wsd>
<parse quality="NORMAL">
(TOP (S (NP (NN cell) ) 
        (VP (VP (VBZ is) 
                (NP (NNS cells) ) ) 
            (VP (MD may) 
                (VP (VP (VB exist) 
                        (PP (IN as) 
                            (NP (NP (JJ independent) (NNS units) ) 
                                (PP (IN of) 
                                    (NP (NN life) ) ) ) ) ) 
                    (CC or) 
                    (VP (MD may) 
                        (VP (VB form) 
                            (NP (NP (NNS colonies) (CC or) (NNS tissues) ) 
                                (PP (IN as) 
                                    (PP (IN in) 
                                        (NP (JJR higher) (NNS plants) (CC and) (NNS animals) ) ) ) ) ) ) ) ) ) 
        (. .) ) ) 
</parse>
<parse quality="SILVER">
(TOP (S (NP (NN cell) ) 
        (VP (VBZ is) 
            (NP (NP (DT the) (JJ basic) 
                    (ADJP (JJ structural) (CC and) (JJ functional) ) 
                    (NN unit) ) 
                (PP (IN of) 
                    (NP (DT all) (NNS organisms) ) ) ) ) 
        (. .) ) ) 
</parse>
 <lft quality="SILVER">
  cell:NN(x1) -> basic:JJ(x1) structural:JJ(x1) functional:JJ(x1) unit:NN(x1) of:IN(x1, x2) all:JJ(x2) organism:NN(x2)
 </lft>
 <lft quality="NORMAL">
  cell:NN(x1) -> cell:NN(x1) exist:VB(e1, x1, x26) as:IN(e1, x2) independent:JJ(x2) unit:NN(x2) of:IN(x2, x3) life:NN(x3) 
                 or:CC(e3, e1, e2) form:VB(e2, x1, x8) colony:NN(x4) or:CC(x8, x4, x5) tissue:NN(x5) 
                 as:IN(x8, x9) in:IN(x8, x9) higher:JJ(x9) plant:NN(x6) and:CC(x9, x6, x7) animal:NN(x7)
 </lft>
</gloss>
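
The split into two definitions noted before the entry can be approximated from the gloss text alone. The sketch below drops parentheticals, splits on semicolons, and prepends the headword plus "is". This is only a guess that reproduces the two sentences for “cell”; it is not XWordNet's actual preprocessing, and the function name `definition_sentences` is hypothetical.

```python
import re


def definition_sentences(headword, gloss_text):
    """Approximate the sentences that the parses in an entry start from."""
    # Remove parentheticals such as "(biology)" and "(as in monads)".
    without_parens = re.sub(r"\([^)]*\)", " ", gloss_text)
    # Each semicolon-separated part becomes its own definition.
    parts = [" ".join(part.split()) for part in without_parens.split(";")]
    return [f"{headword.capitalize()} is {part}" for part in parts if part]


GLOSS_TEXT = """
   (biology) the basic structural and functional unit of all organisms;
   cells may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals
"""

for sentence in definition_sentences("cell", GLOSS_TEXT):
    print(sentence)
```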