RelXML

RelEx can produce several types of output, including two different XML formats: RelExML, which this page describes, and OpenCogXML.

RelExML output

The goal of RelExML is to output the RelEx relationship graph in the form of a structured XML document.

This page documents the current structure of the RelEx XML output, focusing in particular on the relationship between the RelEx graph structure and the XML structure. This page is current and accurate as of 4 February 2008.

Index of Words in Sentence

All markup refers to the words of the sentence by their index. For example:

 1 = The
 2 = commie
 3 = was
 4 = red
 5 = .

The index is always an integer. Indexes are usually, but not always, consecutive. Gaps typically appear when two or more words are melded into a single entity: e.g. "New York" becomes one word, and the numbering skips an index.

The corresponding XML output is:

<relex_xml>
  <words>
     <word id="1">The</word>
     <word id="2">commie</word>
     <word id="3">was</word>
     <word id="4">red</word>
     <word id="5">.</word>
  </words>
</relex_xml>
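
As an illustration of how a consumer might read this section, here is a minimal Python sketch that loads the word list into a dictionary keyed by index. It uses only the standard library; the function name read_words is hypothetical, and the sketch assumes the attribute values are quoted so that the document is well-formed XML.

```python
import xml.etree.ElementTree as ET

# Sample input, with quoted attribute values (an assumption of this sketch).
SAMPLE = """<relex_xml>
  <words>
     <word id="1">The</word>
     <word id="2">commie</word>
     <word id="3">was</word>
     <word id="4">red</word>
     <word id="5">.</word>
  </words>
</relex_xml>"""


def read_words(xml_text):
    """Return a dict mapping each integer word index to its word (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {int(w.get("id")): w.text for w in root.findall("./words/word")}


words = read_words(SAMPLE)

# Indexes may have gaps (merged entities), so iterate over the keys that
# actually exist instead of assuming a contiguous range.
for index in sorted(words):
    print(index, "=", words[index])
```

A dictionary is used instead of a list because the indexes are not guaranteed to be consecutive.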

Phrase Structure

The Link Grammar parser can generate a Penn tree-bank style phrase markup. For example:

(S (NP The commie) (VP was (ADJP red)) .)

The XML form of this representation is:

<relex_xml>
  <phrases>
     <phrase type="noun" id="1001" head="2" parts="1 2">The commie</phrase>
     <phrase type="adjective" id="1003" head="4" parts="4">red</phrase>
     <phrase type="verb" id="1002" head="4" parts="3 1003">was red</phrase>
     <phrase type="root" id="1000" parts="1001 1002 5">The commie was red .</phrase>
  </phrases>
</relex_xml>

The XML format provides some additional information that the Penn tree-bank style markup does not: it indicates which word should be considered the "head" of the phrase. All non-head words in a phrase are modifiers of the head word. Thus, for example, "commie" is the head of "The commie", with the word "the" being merely a modifier of "commie". The head word is indicated with the "head" XML attribute.
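
The "parts" attribute mixes word indexes and phrase ids, so a phrase has to be expanded recursively to recover the words it covers. The following Python sketch shows one way to do that and to look up the head word. The helper names leaf_words and head_word are hypothetical, and the sketch assumes that the words and phrases sections appear in the same document and that phrase ids never collide with word indexes (in the examples here, phrase ids start at 1000).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<relex_xml>
  <words>
     <word id="1">The</word>
     <word id="2">commie</word>
     <word id="3">was</word>
     <word id="4">red</word>
     <word id="5">.</word>
  </words>
  <phrases>
     <phrase type="noun" id="1001" head="2" parts="1 2">The commie</phrase>
     <phrase type="adjective" id="1003" head="4" parts="4">red</phrase>
     <phrase type="verb" id="1002" head="4" parts="3 1003">was red</phrase>
     <phrase type="root" id="1000" parts="1001 1002 5">The commie was red .</phrase>
  </phrases>
</relex_xml>"""

root = ET.fromstring(SAMPLE)
words = {int(w.get("id")): w.text for w in root.findall("./words/word")}
phrases = {int(p.get("id")): p for p in root.findall("./phrases/phrase")}


def leaf_words(node_id):
    """Expand a word index or phrase id into the word indexes it covers."""
    if node_id in words:
        return [node_id]
    covered = []
    for part in phrases[node_id].get("parts").split():
        covered.extend(leaf_words(int(part)))
    return covered


def head_word(phrase_id):
    """Return the head word of a phrase, or None if it has no head attribute."""
    head = phrases[phrase_id].get("head")
    return words[int(head)] if head is not None else None


print(leaf_words(1002), head_word(1001))
```

The root phrase in the example carries no "head" attribute, which is why head_word returns None in that case.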

There are a number of different possible phrase types:

XML type     Penn type  Example
adjective    ADJP
adverb       ADVP
clause       S          (S (NP (NP The commie) , (SBAR (WHNP who) (S (VP was (ADJP red)))) ,) (VP faded (PRT away)) .)
inverted     SINV
noun         NP
particle     PRT        (S (NP (NP The commie) , (SBAR (WHNP who) (S (VP was (ADJP red)))) ,) (VP faded (PRT away)) .)
preposition  PP         (S (NP There) (VP were (NP (NP three) (PP of (NP them)))) .)
quantifier   QP
root         S
subordinate  SBAR       (S (NP (NP The commie) , (SBAR (WHNP who) (S (VP was (ADJP red)))) ,) (VP faded (PRT away)) .)
verb         VP
wh-adverb    WHADVP     (S (NP The bridge) (VP was (ADJP unfinished)) (SBAR (WHADVP when) (S (NP it) (VP collapsed))) .)
wh-noun      WHNP       (S (NP (NP The commie) , (SBAR (WHNP who) (S (VP was (ADJP red)))) ,) (VP faded (PRT away)) .)
wh-prep      WHPP       (S (NP The commie (SBAR (WHPP to (WHNP whom)) (S (NP I) (VP was))) speaking) (VP was (ADJP red)) .)
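
For code that has to translate between the two notations, the table can be written as a simple lookup. This is a sketch only: the dictionary is copied from the table above, and the names XML_TO_PENN and xml_types_for are hypothetical. Note that the reverse direction is ambiguous, since both "clause" and "root" correspond to the Penn tag S.

```python
# XML phrase type -> Penn tree-bank tag, copied from the table above.
XML_TO_PENN = {
    "adjective": "ADJP",
    "adverb": "ADVP",
    "clause": "S",
    "inverted": "SINV",
    "noun": "NP",
    "particle": "PRT",
    "preposition": "PP",
    "quantifier": "QP",
    "root": "S",
    "subordinate": "SBAR",
    "verb": "VP",
    "wh-adverb": "WHADVP",
    "wh-noun": "WHNP",
    "wh-prep": "WHPP",
}


def xml_types_for(penn_tag):
    """Return every XML phrase type that corresponds to the given Penn tag."""
    return sorted(t for t, tag in XML_TO_PENN.items() if tag == penn_tag)


print(XML_TO_PENN["subordinate"])
print(xml_types_for("S"))
```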

Word Properties

Words may have a number of different properties, such as number, tense, or part of speech. Thus, for example, one has the plain-English markup:

PartOfSpeech (red[4], adjective)
PartOfSpeech (was[3], verb)
Number (commie[2], singular)

This becomes (edited for brevity):

<relex_xml>
  <relex_words>
     <word id="2001">adjective</word>
     <word id="2011">verb</word>
     <word id="2009">singular</word>
  </relex_words>
  <relations>
     <relation id="3003" label="pos"><id1>4</id1><id2>2001</id2></relation>
     <relation id="3012" label="pos"><id1>3</id1><id2>2011</id2></relation> 
     <relation id="3009" label="number"><id1>2</id1><id2>2009</id2></relation>
  </relations>
</relex_xml>

See the article word properties for an authoritative list.
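
To show how the two levels of indirection fit together, the Python sketch below collects the properties of each word into a dictionary. It assumes, as in the example above, that id1 holds the word index and id2 points into the relex_words section. The words section is added back into the sample so that it is self-contained, and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<relex_xml>
  <words>
     <word id="2">commie</word>
     <word id="3">was</word>
     <word id="4">red</word>
  </words>
  <relex_words>
     <word id="2001">adjective</word>
     <word id="2011">verb</word>
     <word id="2009">singular</word>
  </relex_words>
  <relations>
     <relation id="3003" label="pos"><id1>4</id1><id2>2001</id2></relation>
     <relation id="3012" label="pos"><id1>3</id1><id2>2011</id2></relation>
     <relation id="3009" label="number"><id1>2</id1><id2>2009</id2></relation>
  </relations>
</relex_xml>"""

root = ET.fromstring(SAMPLE)

# Property values live in their own section and are referenced by id.
values = {w.get("id"): w.text for w in root.findall("./relex_words/word")}

# word index -> {property label -> property value}
properties = defaultdict(dict)
for rel in root.findall("./relations/relation"):
    word_index = int(rel.findtext("id1"))
    value_id = rel.findtext("id2")
    properties[word_index][rel.get("label")] = values[value_id]

for index in sorted(properties):
    print(index, properties[index])
```

Keying the result by word index, not by the word text, keeps repeated words in a sentence apart.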

Binary Relations

Binary relations connect pairs of words or phrases, and name the relationship between these parts. Thus, for example, in the sentence "John threw the ball", "John" is the subject who is doing the throwing, and "the ball" is the object being thrown. This is denoted as

  Subject (threw[2], John[1])
  Object (threw[2], the ball[1003])

The XML markup for this follows the same pattern as the unary relations (the word properties above); thus (edited for brevity):

<relex_xml>
  <words>
     <word id="1">John</word>
     <word id="2">threw</word>
     <word id="3">the</word>
     <word id="4">ball</word>
     <word id="5">.</word>
  </words>
  <relex_words>
     <word id="2003">past</word>
     <word id="2001">throw</word>
  </relex_words>
  <phrases>
     <phrase type="noun" id="1001" head="1" parts="1">John</phrase>
     <phrase type="noun" id="1003" head="4" parts="3 4">the ball</phrase>
     <phrase type="verb" id="1002" head="2" parts="2 1003">threw the ball</phrase>
     <phrase type="root" id="1000" parts="1001 1002 5">John threw the ball .</phrase>
  </phrases>
  <relations>
     <relation id="3001" label="infinitive"><id1>2</id1><id2>2001</id2></relation>
     <relation id="3002" label="subject"><id1>2</id1><id2>1</id2></relation>
     <relation id="3003" label="object"><id1>2</id1><id2>1003</id2></relation>
     <relation id="3004" label="tense"><id1>2</id1><id2>2003</id2></relation>
  </relations>
</relex_xml>

The full set of relations is documented on the binary relations page.
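
Because a relation's id1 and id2 may point at a word, a relex_words entry, or a phrase, a reader of this format needs a single lookup table that spans all three sections. The Python sketch below builds such a table and renders each relation in a form close to the plain-English notation used on this page. The names names and render are hypothetical, and the sketch assumes ids are unique across the three sections, as they are in the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<relex_xml>
  <words>
     <word id="1">John</word>
     <word id="2">threw</word>
     <word id="3">the</word>
     <word id="4">ball</word>
     <word id="5">.</word>
  </words>
  <relex_words>
     <word id="2003">past</word>
     <word id="2001">throw</word>
  </relex_words>
  <phrases>
     <phrase type="noun" id="1001" head="1" parts="1">John</phrase>
     <phrase type="noun" id="1003" head="4" parts="3 4">the ball</phrase>
     <phrase type="verb" id="1002" head="2" parts="2 1003">threw the ball</phrase>
     <phrase type="root" id="1000" parts="1001 1002 5">John threw the ball .</phrase>
  </phrases>
  <relations>
     <relation id="3001" label="infinitive"><id1>2</id1><id2>2001</id2></relation>
     <relation id="3002" label="subject"><id1>2</id1><id2>1</id2></relation>
     <relation id="3003" label="object"><id1>2</id1><id2>1003</id2></relation>
     <relation id="3004" label="tense"><id1>2</id1><id2>2003</id2></relation>
  </relations>
</relex_xml>"""

root = ET.fromstring(SAMPLE)

# One table for every kind of node a relation can refer to.
names = {}
for path in ("./words/word", "./relex_words/word", "./phrases/phrase"):
    for node in root.findall(path):
        names[node.get("id")] = node.text


def render(rel):
    """Format a relation element as 'label (text1[id1], text2[id2])'."""
    a, b = rel.findtext("id1"), rel.findtext("id2")
    return f"{rel.get('label')} ({names[a]}[{a}], {names[b]}[{b}])"


lines = [render(r) for r in root.findall("./relations/relation")]
print("\n".join(lines))
```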

Special cases

ToDo: describe/document the following: