OpenCog

RelEx

RelEx is an English-language semantic relationship extractor, built on the Carnegie Mellon Link Grammar parser. It identifies subject, object, indirect object and many other dependency relationships between the words of a sentence, and generates dependency trees. In this sense, it implements some of the ideas of Hudson's Word Grammar. It can also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection.

RelEx also provides semantic relationship framing, similar to that of FrameNet.

RelEx is a part of OpenCog, an open-source artificial general intelligence project. See also OpenCog technical information.

Overview

Perhaps the easiest way to get a flavor of RelEx is to show its output. Below is a parse of two sentences: "Alice looked at the cover of Shonen Jump. She decided to buy it." The first part of each parse, the constituent tree and the link diagram, will be familiar to users of the Link Grammar Parser. The second part is generated by RelEx, and provides specific semantic markup for the words and for the relationships between the words in the sentence. Notice, for example, that it is the cover that is being looked at, and that the subject doing the looking is Alice. Part-of-speech tagging is suppressed in this example so as not to clutter the output; verb tense and noun-number tagging are enabled. Finally, at the end, notice the listing of antecedent candidates: "She" refers to "Alice", and "it" refers either to "Shonen Jump" or to "cover". This listing is generated by an implementation of the Hobbs algorithm for pronoun (anaphora) resolution.

Alice looked at the cover of Shonen Jump.

====

Parse 1 of 2

====


(S (NP Alice) (VP looked (PP at (NP (NP the cover) (PP of (NP Shonen Jump))))) .)



    +--------------------------------Xp-------------------------------+
    |                          +---Js---+     +--------Jp-------+     |
    +----Wd----+----Ss---+-MVp-+  +--Ds-+--Mp-+      +-----A----+     |
    |          |         |     |  |     |     |      |          |     |
LEFT-WALL Alice[?].n looked.v at the cover.n of Shonen[?].a Jump[?].n . 


======
at(look, cover)
_subj(look, Alice )
tense(look, past)
of(cover, Jump)
DEFINITE-FLAG(cover, T)
noun_number(cover, singular)
_amod(Shonen, Jump)
DEFINITE-FLAG(Shonen, T)
noun_number(Shonen, singular)
DEFINITE-FLAG(Jump, T)
noun_number(Jump, singular)
DEFINITE-FLAG(Alice , T)
gender(Alice , feminine)
noun_number(Alice , singular)
person-FLAG(Alice , T)

======
She decided to buy it.

====

Parse 1 of 2

====

(S (NP She) (VP decided (S (VP to (VP buy (NP it))))) .)

    +---------------Xp--------------+
    +--Wd--+--Ss--+--TO--+-I-+-Ox-+ |
    |      |      |      |   |    | |
LEFT-WALL she decided.v to buy.v it . 

======

_subj(decide, she_2)
_to-do(decide, buy)
tense(decide, past)
_subj(buy, she_2)
_obj(buy, it_1)
tense(buy, infinitive)
HYP(buy, T)
DEFINITE-FLAG(it_1, T)
gender(it_1, neuter)
PRONOUN-FLAG(it_1, T)
DEFINITE-FLAG(she_2, T)
gender(she_2, feminine)
noun_number(she_2, singular)
PRONOUN-FLAG(she_2, T)


======


Antecedent candidates:
_ante_candidate(it_1, cover) {0}
_ante_candidate(it_1, Jump) {1}
_ante_candidate(she_2, Alice ) {0}

The markup used above is discussed in greater detail on the RelXML page.
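The relation lines in the output above all share the form `name(arg1, arg2)`, which makes them easy to read back into a program. The following is a minimal Python sketch of such a reader, not part of RelEx itself; the names `Relation`, `parse_relation` and `parse_relations` are hypothetical, and the regular expression is an assumption based only on the lines shown on this page. Anything after the closing parenthesis, such as the `{0}` on the antecedent lines, is ignored.

```python
import re
from typing import List, NamedTuple, Optional


class Relation(NamedTuple):
    name: str    # e.g. "_subj", "tense", "DEFINITE-FLAG"
    first: str   # first argument, e.g. "look"
    second: str  # second argument, e.g. "Alice" or "past"


# Assumed line shape: name(arg1, arg2), tolerating stray spaces as in "Alice )".
_RELATION_RE = re.compile(r"^\s*([\w-]+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_relation(line: str) -> Optional[Relation]:
    """Return a Relation for a relation line, or None for any other line."""
    match = _RELATION_RE.match(line)
    return Relation(*match.groups()) if match else None


def parse_relations(text: str) -> List[Relation]:
    """Collect every relation line in a block of output, in order."""
    parsed = (parse_relation(line) for line in text.splitlines())
    return [rel for rel in parsed if rel is not None]


# A few lines copied from the output above, including a separator line.
SAMPLE = """\
======
at(look, cover)
_subj(look, Alice )
tense(look, past)
_ante_candidate(it_1, cover) {0}
"""

if __name__ == "__main__":
    for rel in parse_relations(SAMPLE):
        print(rel)
```

Separator lines, the constituent tree and the link diagram do not match the pattern and are skipped, so the whole output can be passed in unfiltered.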

Semantic Frame Output

Semantic framing, or normalization, provides a higher-level, more abstract, but semantically more tractable description of the parsed sentence. An example of the semantic frame output, for the sentence "Alice looked at the cover of Shonen Jump.", is shown below. Note, for example, that "the cover of Shonen Jump" is identified as a part-whole relationship: the cover is the part, and Shonen Jump is the whole. The goal of such framing is to assist cognitive reasoning. Rather than requiring a large common-sense database to deduce that the cover of a magazine is a part of the magazine, this information can be inferred directly from the sentence itself, based on its linguistic structure and a relatively small set of framing rules.

^1_Seeking:Cognizer_agent(look,Alice)
^1_Perception_active:Place(look,cover)
^1_Appearance:Ground(look,cover)
^1_Seeking:Ground(look,cover)
^1_Perception_active:Perceiver_agentive(look,Alice)
^1_State_of_entity:Parameter(Alice,cover)
^1_State_of_entity:State(Alice,look)
^1_Partitive:Subset(of_3,cover)
^1_Partitive:Subset(of,cover)
^1_Partitive:Group(of_2,Shonen_Jump)
^1_Partitive:Group(of_1,Shonen_Jump)
^1_Partitive:Group(of_3,Shonen_Jump)
^1_Partitive:Subset(of_2,cover)
^1_Partitive:Subset(of_1,cover)
^1_Partitive:Group(of,Shonen_Jump)
^1_Physical_entity:Constituents(of,Shonen_Jump)
^1_Physical_entity:Entity(of,cover)
^2_Inheritence:Group(of,Shonen_Jump)
^2_Inheritence:Instance(of,cover)
^1_Being_named:Entity(Alice)
^1_Seeking:Place(look,cover)
^1_Part_whole:Whole(of_1,Shonen_Jump)
^1_Part_whole:Part(of_1,cover)
^1_Part_whole:Part(of,cover)
^1_Part_whole:Whole(of,Shonen_Jump)
^2_Inheritence:Inheritor(of_3,Shonen_Jump)
^2_Inheritence:Inheritor(of_2,Shonen_Jump)
^2_Inheritence:Quality(of_3,cover)
^2_Inheritence:Quality(of_1,cover)
^2_Inheritence:Inheritor(of,Shonen_Jump)
^2_Inheritence:Quality(of_2,cover)
^2_Inheritence:Inheritor(of_1,Shonen_Jump)
^2_Inheritence:Quality(of,cover)
^1_Temporal_colocation:Event(past,look)
^1_Temporal_colocation:Time(past,look)
^1_Filling:Subregion(cover,Shonen_Jump)
^1_Appearance:Phenomenon(look,Alice)
^1_Transitive_action:Agent(look,Alice)
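Each line of the frame output above appears to have the shape `^N_Frame:Element(args)`. The following minimal Python sketch groups such lines by frame name, which makes the long listing easier to scan. It is an illustration only: `FrameElement`, `parse_frame_line` and `group_by_frame` are hypothetical names, the pattern is an assumption drawn from the lines shown here, and the numeric prefix is stored without being interpreted.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class FrameElement(NamedTuple):
    prefix: int              # the number after '^'; kept as-is, not interpreted
    frame: str               # e.g. "Part_whole"
    element: str             # e.g. "Whole"
    args: Tuple[str, ...]    # e.g. ("of", "Shonen_Jump") or ("Alice",)


# Assumed line shape: ^N_Frame:Element(arg[,arg...])
_FRAME_RE = re.compile(r"^\s*\^(\d+)_([^:()]+):([^:()]+)\(([^()]*)\)\s*$")


def parse_frame_line(line: str) -> Optional[FrameElement]:
    """Return a FrameElement for a frame line, or None for any other line."""
    match = _FRAME_RE.match(line)
    if match is None:
        return None
    prefix, frame, element, raw_args = match.groups()
    args = tuple(a.strip() for a in raw_args.split(",") if a.strip())
    return FrameElement(int(prefix), frame, element, args)


def group_by_frame(lines: Iterable[str]) -> Dict[str, List[FrameElement]]:
    """Group parsed frame elements under their frame name."""
    grouped: Dict[str, List[FrameElement]] = defaultdict(list)
    for line in lines:
        parsed = parse_frame_line(line)
        if parsed is not None:
            grouped[parsed.frame].append(parsed)
    return dict(grouped)


# Three lines copied from the frame output above.
SAMPLE_FRAMES = """\
^1_Part_whole:Whole(of,Shonen_Jump)
^1_Part_whole:Part(of,cover)
^1_Being_named:Entity(Alice)
"""

if __name__ == "__main__":
    for frame, elements in group_by_frame(SAMPLE_FRAMES.splitlines()).items():
        print(frame, [(e.element, e.args) for e in elements])
```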


The framing rules are specified as simple IF..THEN rules, and are evaluated by a simple forward-chaining reasoner. The forward-chaining code is home-grown; it might make sense to replace it with a generic forward-chaining engine, ideally one of the smallest and simplest open-source rule engines that is still widely supported and popular. Anything implementing JSR 94 (the Java Rule Engine API) seems like a good bet.
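To make the idea of forward chaining over IF..THEN rules concrete, here is a small self-contained Python sketch. It is not the RelEx reasoner and does not use the RelEx rule syntax: the tuple-based fact format, the `$variable` convention and the two rules in `RULES` are all invented for illustration. Rules are applied repeatedly until no new facts appear.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Fact = Tuple[str, ...]                 # e.g. ("of", "cover", "Shonen_Jump")
Bindings = Dict[str, str]              # variable name -> bound value
Rule = Tuple[List[Fact], List[Fact]]   # (IF patterns, THEN patterns)


def unify(pattern: Fact, fact: Fact, bindings: Bindings) -> Optional[Bindings]:
    """Match one pattern against one fact; terms starting with '$' are variables."""
    if len(pattern) != len(fact):
        return None
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):
            if result.setdefault(p, f) != f:   # already bound to something else
                return None
        elif p != f:
            return None
    return result


def match_all(conditions: Sequence[Fact], facts: Set[Fact],
              bindings: Bindings) -> List[Bindings]:
    """Return every set of bindings that satisfies all IF patterns."""
    if not conditions:
        return [bindings]
    matches: List[Bindings] = []
    for fact in facts:
        extended = unify(conditions[0], fact, bindings)
        if extended is not None:
            matches.extend(match_all(conditions[1:], facts, extended))
    return matches


def forward_chain(facts: Set[Fact], rules: Sequence[Rule]) -> Set[Fact]:
    """Apply the rules until a full pass adds no new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusions in rules:
            for bindings in match_all(conditions, known, {}):
                for conclusion in conclusions:
                    new_fact = tuple(bindings.get(t, t) for t in conclusion)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known


# Hypothetical rules, loosely modeled on the frame output shown earlier.
RULES: List[Rule] = [
    # IF of($part, $whole) THEN Part_whole:Part(of, $part), Part_whole:Whole(of, $whole)
    ([("of", "$part", "$whole")],
     [("Part_whole:Part", "of", "$part"), ("Part_whole:Whole", "of", "$whole")]),
    # IF Part_whole:Part(of, $x) THEN Physical_entity:Entity(of, $x)
    ([("Part_whole:Part", "of", "$x")],
     [("Physical_entity:Entity", "of", "$x")]),
]

if __name__ == "__main__":
    for fact in sorted(forward_chain({("of", "cover", "Shonen_Jump")}, RULES)):
        print(fact)
```

The second rule fires only on a fact produced by the first, which is the chaining step. A generic rule engine would replace `forward_chain` and `match_all` while keeping the rules as data.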

Source Code and Development

Source code, development coordination, and bug reporting for RelEx are available at the RelEx Launchpad Site. Developers should join discussions at #opencog on IRC.freenode.net.

All RelEx discussion should take place on the Link Grammar mailing list.

Source code is written in the Java programming language. The code is released under the Apache v2 license.

The RelExIdeas page discusses possible future projects.

Performance profiling

An ongoing project is to determine where the CPU time goes:

"YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler."

Documentation

Existing RelEx documentation includes: