
RelEx Dependency Relationship Extractor


RelEx, a narrow-AI component of OpenCog, is an English-language semantic dependency relationship extractor, built on the Carnegie-Mellon Link Grammar parser. It can identify subject, object, indirect object and many other syntactic dependency relationships between words in a sentence; it generates dependency trees, resembling those of dependency grammars, and specifically, those of Dekang Lin's MiniPar and the Stanford parser. It accomplishes this by applying a sequence of rules, based on the local context, and thus resembles constraint grammar in its implementation. In this sense, it implements some of the ideas of Hudson's Word Grammar.

However, unlike other dependency parsers, RelEx attempts a greater degree of semantic normalization for questions, comparatives, entities, and prepositional relationships, whereas other parsers (such as the Stanford parser) stick to a literal presentation of the syntactic structure of the text. For example, RelEx pays special attention to determining when a sentence is hypothetical, and to isolating the query variables from a question. Both of these aspects are intended to make RelEx well-suited for question-answering and semantic comprehension/reasoning systems. In addition, RelEx uses feature tagging to tag words with part-of-speech, noun number, verb tense, gender, etc. As of this writing, RelEx parses text nearly four times faster than the Stanford parser, and it now provides a "compatibility mode" in which it can generate the same relations as the Stanford parser.

RelEx also includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection. RelEx also provides semantic relationship framing, similar to that of FrameNet. The SegSim and NLGen2 projects aim to reverse the flow: to generate natural-language output, based on dependency-grammar-like input.

RelEx is a part of OpenCog, an open-source artificial general intelligence project. See also OpenCog technical information.


Overview

Perhaps the easiest way to get a flavor of RelEx is to show its output. Below is a parse of two sentences: "Alice looked at the cover of Shonen Jump. She decided to buy it." The first part of each parse (the constituent tree and the link diagram) will be familiar to users of the Link Grammar parser. The second part is generated by RelEx, and provides feature markup for single words and the dependency relationships between the words in the sentence. Notice, for example, that it is the cover that is being looked at, and that the subject doing the looking is Alice. This example has part-of-speech tagging suppressed, so as not to clutter the output; verb tense and noun-number tagging are enabled. Finally, at the end, notice the listing of antecedent candidates: "She" refers to "Alice", and "it" refers either to "Shonen Jump" or to "cover". This listing is generated by an implementation of the Hobbs algorithm for pronoun (anaphora) resolution.

Alice looked at the cover of Shonen Jump.

====

Parse 1 of 2

====


(S (NP Alice) (VP looked (PP at (NP (NP the cover) (PP of (NP Shonen Jump))))) .)



    +--------------------------------Xp-------------------------------+
    |                          +---Js---+     +--------Jp-------+     |
    +----Wd----+----Ss---+-MVp-+  +--Ds-+--Mp-+      +-----A----+     |
    |          |         |     |  |     |     |      |          |     |
LEFT-WALL Alice[?].n looked.v at the cover.n of Shonen[?].a Jump[?].n . 


======
at(look, cover)
_subj(look, Alice )
tense(look, past)
of(cover, Jump)
DEFINITE-FLAG(cover, T)
noun_number(cover, singular)
_amod(Shonen, Jump)
DEFINITE-FLAG(Shonen, T)
noun_number(Shonen, singular)
DEFINITE-FLAG(Jump, T)
noun_number(Jump, singular)
DEFINITE-FLAG(Alice , T)
gender(Alice , feminine)
noun_number(Alice , singular)
person-FLAG(Alice , T)

======
She decided to buy it.

====

Parse 2 of 2

====

(S (NP She) (VP decided (S (VP to (VP buy (NP it))))) .)

    +---------------Xp--------------+
    +--Wd--+--Ss--+--TO--+-I-+-Ox-+ |
    |      |      |      |   |    | |
LEFT-WALL she decided.v to buy.v it . 

======

_subj(decide, she_2)
_to-do(decide, buy)
tense(decide, past)
_subj(buy, she_2)
_obj(buy, it_1)
tense(buy, infinitive)
HYP(buy, T)
DEFINITE-FLAG(it_1, T)
gender(it_1, neuter)
PRONOUN-FLAG(it_1, T)
DEFINITE-FLAG(she_2, T)
gender(she_2, feminine)
noun_number(she_2, singular)
PRONOUN-FLAG(she_2, T)


======


Antecedent candidates:
_ante_candidate(it_1, cover) {0}
_ante_candidate(it_1, Jump) {1}
_ante_candidate(she_2, Alice ) {0}

The markup used above is discussed in greater detail on the RelXML page.
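Because each relation is printed in the form name(first, second), the output shown above is easy to consume from a script. The following is a minimal Python sketch, not part of RelEx: the function name parse_relations and the regular expression are assumptions based only on the sample output above, not on a format specification.

```python
import re

# Matches lines such as "_subj(look, Alice )" or "tense(look, past)".
# The pattern is inferred from the sample output only (an assumption).
RELATION_RE = re.compile(r"^\s*([\w-]+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_relations(text):
    """Return a (name, first, second) tuple for each relation line in text."""
    relations = []
    for line in text.splitlines():
        match = RELATION_RE.match(line)
        if match:
            relations.append(match.groups())
    return relations


sample = """\
======
at(look, cover)
_subj(look, Alice )
tense(look, past)
_ante_candidate(it_1, cover) {0}
"""

for name, first, second in parse_relations(sample):
    print(name, first, second)
```

Lines that are not relations (separators, the constituent tree, the link diagram) are skipped, and any trailing annotation such as the {0} after an antecedent candidate is ignored.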

Semantic Frame Output

Semantic framing or normalization provides a higher-level, more abstract, but semantically more tractable description of the parsed sentence. An example of the semantic frame output, for the sentence "Alice looked at the cover of Shonen Jump.", is shown below. Note, for example, that "the cover of Shonen Jump" is identified as a part-whole relationship: the cover is the part, and Shonen Jump is the whole. The goal of such framing is to assist cognitive reasoning; rather than requiring a large common-sense database to deduce that the covers of magazines are a part of a magazine, this information can be directly inferred from the sentence itself, based on the linguistic structure, and a relatively small set of framing rules.

^1_Seeking:Cognizer_agent(look,Alice)
^1_Perception_active:Place(look,cover)
^1_Appearance:Ground(look,cover)
^1_Seeking:Ground(look,cover)
^1_Perception_active:Perceiver_agentive(look,Alice)
^1_State_of_entity:Parameter(Alice,cover)
^1_State_of_entity:State(Alice,look)
^1_Partitive:Subset(of_3,cover)
^1_Partitive:Subset(of,cover)
^1_Partitive:Group(of_2,Shonen_Jump)
^1_Partitive:Group(of_1,Shonen_Jump)
^1_Partitive:Group(of_3,Shonen_Jump)
^1_Partitive:Subset(of_2,cover)
^1_Partitive:Subset(of_1,cover)
^1_Partitive:Group(of,Shonen_Jump)
^1_Physical_entity:Constituents(of,Shonen_Jump)
^1_Physical_entity:Entity(of,cover)
^2_Inheritence:Group(of,Shonen_Jump)
^2_Inheritence:Instance(of,cover)
^1_Being_named:Entity(Alice)
^1_Seeking:Place(look,cover)
^1_Part_whole:Whole(of_1,Shonen_Jump)
^1_Part_whole:Part(of_1,cover)
^1_Part_whole:Part(of,cover)
^1_Part_whole:Whole(of,Shonen_Jump)
^2_Inheritence:Inheritor(of_3,Shonen_Jump)
^2_Inheritence:Inheritor(of_2,Shonen_Jump)
^2_Inheritence:Quality(of_3,cover)
^2_Inheritence:Quality(of_1,cover)
^2_Inheritence:Inheritor(of,Shonen_Jump)
^2_Inheritence:Quality(of_2,cover)
^2_Inheritence:Inheritor(of_1,Shonen_Jump)
^2_Inheritence:Quality(of,cover)
^1_Temporal_colocation:Event(past,look)
^1_Temporal_colocation:Time(past,look)
^1_Filling:Subregion(cover,Shonen_Jump)
^1_Appearance:Phenomenon(look,Alice)
^1_Transitive_action:Agent(look,Alice)


The framing rules are specified as simple IF..THEN rules, and are evaluated using a simple forward-chaining reasoner. The forward-chaining code is home-grown; it might make sense to replace it with a generic forward-chaining engine, ideally one of the smallest and simplest open-source rule engines that is still widely supported. Anything implementing JSR 94 (the Java Rule Engine API) seems like a good bet.
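To illustrate what evaluating IF..THEN rules by forward chaining means, here is a minimal Python sketch. It is not the RelEx reasoner, and the three rules are hypothetical, written only to resemble the frame output above; the "$" variable syntax and all function names are likewise assumptions of this sketch.

```python
def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact; return None on failure."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("$"):          # "$x" is a variable
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(premises, facts, bindings):
    """Yield every binding that satisfies all premises against the facts."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(rules, facts):
    """Apply the rules repeatedly until no rule derives a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # list() finishes matching before the fact set is modified.
            for b in list(match_all(premises, facts, {})):
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts


# Hypothetical rules: (list of IF patterns, THEN pattern).
rules = [
    ([("of", "$part", "$whole")], ("Part_whole:Part", "of", "$part")),
    ([("of", "$part", "$whole")], ("Part_whole:Whole", "of", "$whole")),
    ([("Part_whole:Part", "of", "$p"), ("Part_whole:Whole", "of", "$w")],
     ("Filling:Subregion", "$p", "$w")),
]

result = forward_chain(rules, {("of", "cover", "Jump"), ("_subj", "look", "Alice")})
for fact in sorted(result):
    print(fact)
```

The third rule fires only on facts produced by the first two, which is the chaining step: derived facts are fed back in as input until nothing new appears.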

RelEx Based Language Generation

The overall approach to language generation based on RelEx is named SegSim; it is implemented in two systems, NLGen and NLGen2. The latter is described in greater detail in the paper by Blake Lemoine, NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System (2009).

Source Code and Development

Source code, development coordination, and bug reporting for RelEx are available at the RelEx Launchpad Site. Developers should join discussions at #opencog on IRC.freenode.net.

All RelEx discussion should take place on the Link Grammar mailing list.

Source code is written in the Java programming language. The code is released under the Apache v2 license.

The RelExIdeas page discusses possible future projects.

Documentation, Algorithm Overview

RelEx borrows some algorithmic ideas from constraint grammars, but applies them in a more abstract setting. Each incoming sentence is represented as a graph, with the words of the sentence as the vertices of the graph. Labeled edges represent the features of the words and the structure of the sentence. Initially, the graph is merely a list of words, with edges (labeled "left" and "right") indicating the sequence of the words. The sentence is then parsed with the Link Grammar parser; the output of the parse is a set of labeled edges indicating the syntactic relationships between words.
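The graph described above can be pictured with a small Python sketch. This is not the RelEx data structure (RelEx is written in Java); the class name SentenceGraph, its methods, and the choice to store each link as an edge from the left word to the right word are all assumptions made for illustration.

```python
from collections import defaultdict


class SentenceGraph:
    """Labeled directed graph; vertices are (position, word) pairs."""

    def __init__(self, words):
        self.vertices = list(enumerate(words))
        # edges[source][label] is the set of target vertices.
        self.edges = defaultdict(lambda: defaultdict(set))
        # Initially the graph is just the word sequence, chained by
        # "left" and "right" edges.
        for prev, nxt in zip(self.vertices, self.vertices[1:]):
            self.add_edge(prev, "right", nxt)
            self.add_edge(nxt, "left", prev)

    def add_edge(self, source, label, target):
        self.edges[source][label].add(target)

    def targets(self, source, label):
        return self.edges.get(source, {}).get(label, set())


words = ["LEFT-WALL", "Alice", "looked", "at", "the", "cover",
         "of", "Shonen", "Jump", "."]
graph = SentenceGraph(words)

# Two link labels copied by hand from the link diagram in the Overview.
alice, looked, at = graph.vertices[1], graph.vertices[2], graph.vertices[3]
graph.add_edge(alice, "Ss", looked)
graph.add_edge(looked, "MVp", at)

print(graph.targets(alice, "right"))
print(graph.targets(alice, "Ss"))
```

Using (position, word) pairs as vertices keeps repeated words in a sentence distinct.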

From this point on, the main rule engine in RelEx takes over. The engine applies a set of pattern-matching rules to the graph; if the predicate part of a rule matches, then the graph is transformed according to the implicand of the rule. Thus, for example, one rule states that if there is an edge labeled "SFI" (obtained from Link Grammar), then the word on the left is to be tagged as a verb. The tagging is done by adding an edge called "POS" to a vertex called "verb". After each step, the graph (usually) becomes richer, and more adorned with feature labels and relationship structures, although some rules can also prune the graph. This process of applying a sequence of rules resembles the process used in constraint grammars; it differs from constraint grammars in that it operates on a graphical representation, rather than on simple sets of tags. This difference allows RelEx to apply progressively more abstract transformations in analyzing text. The general idea of performing pattern recognition and using it to transform (hyper-)graphs is one of the central concepts within OpenCog; this is why RelEx is a part of OpenCog. The page on sentence algorithms provides a more detailed description of the operation of the rule engine.
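The "SFI" rule mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the RelEx rule engine: the graph is reduced to a set of (source, label, target) triples, the source of a link edge is taken to be the left-hand word, and the function names are invented for this sketch.

```python
def tag_verb_on_sfi(graph):
    """IF an "SFI" edge exists, THEN add a "POS" edge from its left word to "verb"."""
    return {(left, "POS", "verb")
            for left, label, _right in graph
            if label == "SFI"}


def apply_rules(graph, rules):
    """Apply each rule once, in order, merging the edges it produces."""
    graph = set(graph)
    for rule in rules:
        graph |= rule(graph)
    return graph


# Hand-written input for illustration only, not actual parser output.
graph = {("is", "SFI", "there"), ("is", "right", "there")}
result = apply_rules(graph, [tag_verb_on_sfi])
print(("is", "POS", "verb") in result)
```

Each rule sees the edges added by the rules before it, so later rules can match on features such as "POS" that earlier rules introduced. Rules that prune the graph are not covered by this sketch.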

The current graph-transformation rules are hand-generated (i.e. designed by linguists). A focus of research interest is to automatically learn and refine these rules by means of corpus statistics. The LexAt project takes baby steps in this direction. An alternate direction is being taken via OpenCog, using feedback from deductive reasoning to refine parsing. Some early, positive results indicate that very high-speed word-sense disambiguation (WSD) is possible; see the blog entry for details.

Relations and Features

The relation and feature markup generated by RelEx resembles that of other projects that generate dependency trees, including Dekang Lin's MiniPar, and the Stanford parser.

These relations are encoded in the main RelEx 'feature graph', as a set of labeled edges and vertices. A variety of output modules traverse this graph and generate different styles of output. It is presumed that users of RelEx will create their own custom output module; this can easily be done by copying an existing output module and tailoring it to suit one's needs.
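The idea of several output modules walking one graph can be illustrated with a short Python sketch. The actual RelEx output modules are Java classes; everything below (the triple representation of the graph, the function names, and the JSON layout) is an assumption made for illustration.

```python
import json


def relation_view(edges):
    """Render edges in the name(first, second) style used in the Overview."""
    return "\n".join(f"{label}({head}, {dep})"
                     for head, label, dep in sorted(edges))


def json_view(edges):
    """Render the same edges as a JSON list of objects."""
    return json.dumps([
        {"relation": label, "head": head, "dependent": dep}
        for head, label, dep in sorted(edges)
    ])


# A tiny stand-in for the feature graph: (head, label, dependent) triples.
edges = {
    ("look", "_subj", "Alice"),
    ("look", "at", "cover"),
    ("cover", "of", "Jump"),
}

print(relation_view(edges))
print(json_view(edges))
```

Both functions read the same graph and differ only in presentation, so a custom output style can start as a copy of an existing one.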

Other documentation

Documentation for other output formats, etc.


History, Authors

The original version of RelEx was created by Mike Ross in 2005. As of 2008, it is maintained by Linas Vepstas. The following folks have made significant contributions (in historical order):

  • Murilo Queiroz
  • Borislav Iordanov
  • Evgenii Philippov
  • Fabricio Silva

Performance profiling

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.

The YourKit Java Profiler (YJP) was used to identify performance bottlenecks in RelEx. One example is morphological analysis, whose performance was improved by adding a cache.
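The effect of such a cache can be illustrated with a small Python sketch. It does not reproduce the RelEx code, which is written in Java: the function analyze_morphology is a toy stand-in for an expensive analysis, and the cache size is an arbitrary assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=10000)
def analyze_morphology(word):
    """Toy stand-in for an expensive morphological analysis (hypothetical)."""
    for suffix in ("ed", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)], suffix
    return word, ""


# The second call with the same word is answered from the cache.
print(analyze_morphology("looked"))
print(analyze_morphology("looked"))
print(analyze_morphology.cache_info())
```

Since the same words recur constantly in running text, repeated lookups are served from memory instead of repeating the analysis.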