Ideas
See also StudentApplicationTemplate, Development, Projects, Publications, and GSoCProjects2008 for background reading.
Ideas listed here may be taken up as projects by anyone with the necessary skill and motivation (the ideas here are not just for GSoC students!).
Wikipedia good-article-quality edits to any page on the entire wiki are most welcome!
Google Summer of Code 2009 students: Accepted projects will be listed at GSoCProjects2009. Please create a WikiWord for your accepted project and use the new wiki page to keep a record of your work, e.g. a collection of brief weekly summaries; linking to a blog is also okay.
OpenCog projects are inter-related in many ways, although they vary considerably in the sub-fields of computer science they explore, the programming languages used, their size and scope, and other properties. OpenCog teams overlap and complement each other to varying degrees. The formation of particular projects and teams is influenced primarily by the goals and needs of OpenCog as an integrated and coherent system, and by specific larger project goals, such as building a natural language conversational system.
Most tasks are given difficulty labels:
- (1) Relatively straightforward AI R&D, though not easy
- (2) Pretty difficult
- (3) Rather difficult
(Note that these labels refer only to the AI aspects of the task; there may also be difficult software engineering or systems integration aspects, which haven't been labeled.)
If you add an idea as a Blueprint in Launchpad (under the respective project), please append a link to the blueprint after the idea text on this page.
OpenCog Framework
- Language: C++ (uses Boost and templates)
- Code: https://code.launchpad.net/opencog
- List: http://groups.google.com/group/opencog-developers?hl=en
- License: AGPLv3 + linking exception
The framework projects require no prior experience with AI, NLP or robotics; instead, they require strong coding skills, and a working knowledge of system architecture principles.
Distributed and persistent AtomSpace
Provide AtomSpace persistence by hooking to an open-source BigTable equivalent such as Hypertable or HBase. The AtomTable (the implementation of the AtomSpace) would then be relegated to an in-memory AtomSpace cache of the distributed versions.
Lack of a completed, tested back-end capable of supporting large-scale usage is the most important and immediate roadblock to using OpenCog in a large-scale, distributed manner.
This project requires little or no acquaintance with general AI or NLP concepts; instead, it requires strong knowledge of distributed processing, network, locking, transactional and kernel-type operating system principles. This is an ideal project for students with strong coding skills and theoretical comp-sci knowledge.
It should be relatively easy to just "hook things up". It is considerably harder to set things up so that one gets high-performance, and gets a solution that could scale to thousands of connections at a time (although we'll settle for dozens to begin with!). This is where the need for system architecture/distributed processing skills kicks in.
- The Launchpad blueprint will soon provide additional details.
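The cache arrangement described above can be illustrated with a minimal sketch. It is written in Python for brevity, although the framework itself is C++. The class names `DictBackend` and `CachedAtomStore` are hypothetical, and the dictionary-based backend is only a stand-in for a real store such as HBase or Hypertable. Locking, transactions and network access, which are where the real difficulty lies, are deliberately left out.

```python
from collections import OrderedDict


class DictBackend:
    """Hypothetical stand-in for a persistent store such as HBase or Hypertable."""

    def __init__(self):
        self._rows = {}

    def put(self, handle, atom):
        self._rows[handle] = atom

    def get(self, handle):
        return self._rows.get(handle)


class CachedAtomStore:
    """In-memory LRU cache in front of a persistent backend (write-through)."""

    def __init__(self, backend, capacity=2):
        self.backend = backend
        self.capacity = capacity
        self.cache = OrderedDict()

    def add(self, handle, atom):
        self.backend.put(handle, atom)  # persist first, then cache
        self._remember(handle, atom)

    def get(self, handle):
        if handle in self.cache:
            self.cache.move_to_end(handle)  # mark as recently used
            return self.cache[handle]
        atom = self.backend.get(handle)  # cache miss: fall back to the backend
        if atom is not None:
            self._remember(handle, atom)
        return atom

    def _remember(self, handle, atom):
        self.cache[handle] = atom
        self.cache.move_to_end(handle)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used atom


store = CachedAtomStore(DictBackend(), capacity=2)
store.add(1, ("ConceptNode", "cat"))
store.add(2, ("ConceptNode", "dog"))
store.add(3, ("ConceptNode", "bird"))  # handle 1 is evicted from the cache only
```

After the third insertion, handle 1 is no longer cached, but `store.get(1)` still returns the atom because it is read back from the backend.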
Python Language Bindings
The creators of OpenCog are interested in providing bindings for a host of languages, so that MindAgents can be written in numerous languages. Python is extremely popular, so it should be one of the first to be tackled. If SWIG is used, it may not take much extra effort to provide bindings for other languages as well.
Implement Python language bindings for the OpenCog Framework's API.
AtomSpace Visualizer
It can be hard to understand what is going on within OpenCog, and visualizing the hypergraph can be a big help. Unfortunately, real-world hypergraphs run from 100K to 10M or more atoms, and thus overwhelm most graphing/visualization software. Tulip has been tried, and although it claims to visualize graphs with up to 1M nodes, in practice, it seems to choke on graphs with more than 5K nodes.
Ubigraph is close to being useful, but it seems to be an essentially dead project and is also NOT open source. See the existing ubigraph dynamic library written by Jared Wigmore and Joel Pitt to see how the AtomSpace is currently connected to Ubigraph.
Guess has not been tried.
JSON-RPC Bindings
Provide JSON-RPC calls to the AtomSpace methods and server requests system. It should be designed so that the system can be extended relatively easily for XML-RPC and perhaps even Google's Protocol Buffers. Requests to support should include:
- Create and delete atoms.
- Retrieve sequences of atoms based on search criteria (getHandleSet equivalent).
- Get the incoming and outgoing sets.
- Modify importance and truth values.
- All the cogserver requests (means dynamically adding new JSON methods when modules that provide new requests are loaded).
- Request for all available methods.
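One possible shape for such a dispatcher is sketched below in Python, using only the standard `json` module. The method names `atomspace.createAtom` and `system.listMethods`, the `JsonRpcDispatcher` class and the toy `atoms` dictionary are all assumptions made for illustration; none of them is an existing OpenCog API. The sketch handles single requests only (no batches or notifications).

```python
import json


class JsonRpcDispatcher:
    """Minimal JSON-RPC 2.0 dispatcher with run-time method registration."""

    def __init__(self):
        # Introspection method, so clients can ask for all available methods.
        self.methods = {"system.listMethods": self.list_methods}

    def register(self, name, func):
        """Add a method, e.g. when a module providing new requests is loaded."""
        self.methods[name] = func

    def list_methods(self):
        return sorted(self.methods)

    def handle(self, request_text):
        try:
            request = json.loads(request_text)
        except ValueError:
            return json.dumps(self._error(None, -32700, "Parse error"))
        if not isinstance(request, dict):
            return json.dumps(self._error(None, -32600, "Invalid Request"))
        req_id = request.get("id")
        func = self.methods.get(request.get("method"))
        if func is None:
            return json.dumps(self._error(req_id, -32601, "Method not found"))
        params = request.get("params", [])
        result = func(**params) if isinstance(params, dict) else func(*params)
        return json.dumps({"jsonrpc": "2.0", "result": result, "id": req_id})

    @staticmethod
    def _error(req_id, code, message):
        return {"jsonrpc": "2.0",
                "error": {"code": code, "message": message},
                "id": req_id}


atoms = {}  # hypothetical stand-in for the AtomSpace


def create_atom(atom_type, name):
    handle = len(atoms) + 1
    atoms[handle] = (atom_type, name)
    return handle


rpc = JsonRpcDispatcher()
rpc.register("atomspace.createAtom", create_atom)
reply = rpc.handle('{"jsonrpc": "2.0", "method": "atomspace.createAtom", '
                   '"params": ["ConceptNode", "cat"], "id": 1}')
```

Keeping the method table separate from the wire format is what would make an XML-RPC or Protocol Buffers front end a matter of adding another `handle`-style entry point.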
Remote shell
Connected to the above (possibly in the same project if there is sufficient time). Create a shell environment for working with OpenCog. The Python shell (or a variant such as IPython) seems ideal for this, as Python already has XML-RPC support and undoubtedly JSON-RPC support as well.
Performance measurement suite
Many contemplated changes to the OpenCog infrastructure have a real impact on execution time and on the amount of memory used. A performance suite would collect into one place a large variety of different test cases, instrument them properly so as to measure speed and memory usage, and then report the results in a completely automated fashion.
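The measure-and-report loop could look roughly like the following Python sketch, which uses the standard `time` and `tracemalloc` modules. The test case `build_atoms` is hypothetical, and a real suite for the C++ framework would need native instrumentation instead. Note that tracing allocations slows execution, so in practice timing and memory runs would probably be separated.

```python
import time
import tracemalloc


def measure(func, *args, repeat=3):
    """Run func(*args) `repeat` times; return best wall time and peak memory."""
    best = float("inf")
    peak = 0
    for _ in range(repeat):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        _, run_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best = min(best, elapsed)
        peak = max(peak, run_peak)
    return {"seconds": best, "peak_bytes": peak}


def build_atoms(n):
    """Hypothetical test case: allocate n small tuples standing in for atoms."""
    return [("ConceptNode", i) for i in range(n)]


def run_suite(cases):
    """cases maps a name to (function, args); returns one result per case."""
    return {name: measure(func, *args) for name, (func, args) in cases.items()}


report = run_suite({
    "build_1k": (build_atoms, (1000,)),
    "build_10k": (build_atoms, (10000,)),
})
```

The resulting `report` dictionary can be written to a file after each run, so that results from different revisions can be compared automatically.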
OpenCog modules
The ideas listed in this section are all related to parts of OpenCog which are to some extent projects in their own right.
Natural Language Processing (NLP)
The overall OpenCog NLP pipeline is described here. It has many pieces, including the Link Grammar parser, and RelEx as pre-processing stages. Output is fed to OpenCog for reasoning. The projects below address various aspects of the pipeline. Almost all parts of the pipeline merit improvement.
- Code: https://code.launchpad.net/relex/
- List: http://groups.google.com/group/link-grammar (mentors: Linas and Murilo)
- License: BSD, Apache 2.0
- Languages: C, C++, Java, Scheme, Perl
RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It can also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection. RelEx also provides semantic relationship framing, similar to that of FrameNet.
The output from RelEx is a hypergraph of Nodes and Links, which are input into OpenCog, and may then be processed in various ways. In addition, the LexAt (lexical attraction) package provides many scripts and database tools to collect statistical information on parsed sentences.
PLN Inference on extracted semantic relationships
Currently RelEx takes in a sentence, and outputs a set of logical relationships expressing the semantics of the sentence. It is possible to take the logical relationships extracted from multiple sentences, and combine them using a logical reasoning engine, to see what conclusions can be derived. Thus, for example, the English-language input "Aristotle is a man. Men are mortal." should allow the deduction of "Aristotle is mortal."
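The Aristotle deduction can be sketched with a crisp (non-probabilistic) forward chainer in Python. This is not PLN and not RelEx's actual output format: the `(subject, "isa", object)` triples are a hypothetical normalized form, and the sketch assumes that an earlier stage has already reduced "Men" to the lemma "man".

```python
def forward_chain(facts):
    """Close a set of (subject, 'isa', object) triples under transitivity."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(known):
            for (c, _, d) in list(known):
                # If a isa b and b isa d, conclude a isa d.
                if b == c and (a, "isa", d) not in known:
                    known.add((a, "isa", d))
                    changed = True
    return known


# Hypothetical normalized relations for "Aristotle is a man. Men are mortal."
extracted = {
    ("Aristotle", "isa", "man"),
    ("man", "isa", "mortal"),
}
conclusions = forward_chain(extracted) - extracted
```

Here `conclusions` contains the single derived triple `("Aristotle", "isa", "mortal")`. An uncertain reasoner such as PLN would additionally attach and propagate truth values rather than treating each relation as simply true.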
Some prototype experiments along these lines were performed in 2006, using sentences contained in PubMed abstracts. See paper. But no systematic software approach was ever implemented.
These experiments could be done in a variety of rule engines, including the Probabilistic Logic Networks engine, as well as more standard crisp rule engines such as SWIRLS.
This project is appropriate for a student who is interested in both computational linguistics and logical inference, and has some knowledge of predicate logic.
A simple, detailed example of language-based inference using RelEx and PLN is given here: Image:RelEx PLN Example Inference.pdf.
Related working notes can be found in the OpenCog source code tree, in the files opencog/nlp/seme/README and opencog/nlp/triples/README.
See also NLP-PLN-NLGen pipeline.
Making RelEx output Stanford Dependency relationships
The output given by RelEx as you can see at
http://174.129.121.70:8180/disco-web
is somewhat similar to the output given by SD as demonstrated in
http://nlp.stanford.edu/pubs/dependencies-coling08.pdf
but there are numerous technical differences. It would be interesting to modify RelEx to give it the option of outputting parses in SD mode. Basically this would enable using the link parser as a SD parser.
It appears that the Genia tagger already outputs part-of-speech tags that are in the same format as that of the Stanford parser; it could be used as a source of tags which are then fine-tuned/fixed by RelEx.
This would let us use the Penn Treebank as a training corpus for parse ranking, since the Treebank has been mapped into SD format already; and would have some other advantages also in terms of letting us use the Stanford NL group's tools to help with our own NLP work.
It would also require us to carefully search-and-replace through the RelEx2Frame rules to make them work with SD format, and to tweak NLGen.
We might wind up with an extended version of SD as the link parser may handle some phenomena that SD does not.
Statistically similar expressions for question answering
Proper language-based inference requires that similar expressions be recognized as such. For example, "X is the author of Y" and "X wrote Y" are more or less synonymous phrases. Lin and Pantel describe how to automatically discover such similar phrases. (See Dekang Lin and Patrick Pantel. 2001. "Discovery of Inference Rules for Question Answering." Natural Language Engineering 7(4):343-360.)
The goal of this task would be to not only collect the statistical information needed to identify such similar phrases, but also to integrate them appropriately into the OpenCog reasoning framework. The first step provides excellent hands-on experience in corpus linguistics; the second step shows how such ideas interface with reasoning systems.
Improved reference resolution
Reference resolution (anaphora resolution) is the problem of determining what words like "it", "him" and "her" refer to. There are several ways in which this problem might be attacked.
1) Statistical methods: The current RelEx implementation uses the Hobbs algorithm for this, which is an intelligent but crude mechanism that achieves about 60% accuracy. By combining the Hobbs algorithm with statistical measures (one of which is sometimes called the Hobbs score), it should be possible (according to literature results) to get up to 85% accuracy or so. Appropriate for anyone who is interested in getting some hands-on experience with statistical corpus linguistics.
2) Reasoning: Sentences commonly make assertions, which can be checked for accuracy. So, for example: "Jules went to Paris. He saw many things there." Does "he" refer to Jules, or to Paris? Can Jules see things? Can Paris see things? Does "there" refer to Paris, or to Jules? Can one see many things in Jules? Can one see many things in Paris? This approach is more powerful than a statistical approach, but requires establishing a large knowledge base, and performing reasoning thereon. It is expected that PLN would be used for reasoning.
Classifying words by grammatical usage, learning new grammar rules.
One of the datasets associated with OpenCog is a large database associating words with their grammatical usage; another database contains triples of (word, grammatical usage, word sense). These databases contain millions of entries, and are largely unexplored. They could be data-mined for clusters of words that are used in grammatically similar ways. This project would require evaluating and selecting appropriate open-source clustering software. The result of applying clustering would include identifying how new, unknown words should be treated grammatically (is the unknown word being used as a noun, or a verb?)
Clusters might distinguish between types of words that link-grammar treats as being the same (such as most adjectives, adverbs), but are, in fact, used quite differently by English speakers. For example, the parser treats "acetylene" and "adroitness" as nouns belonging to the same class; yet clearly one would never say "the adroitness exploded". Perhaps clustering could help split this class of words into smaller, more refined classes. Perhaps clustering might also distinguish different semantic senses: "I tend to sheep" and "I tend to agree" use the word "tend" in two dramatically different senses. Can grammatical clustering distinguish between these senses?
Some words may be in classes that are too narrow: parses fail because the parser does not know that the word can be used in a broader grammatical context. By examining how a word is used, perhaps clustering could identify such cases as well.
Learning simple grammars
RelEx uses the CMU Link Grammar as its underlying English-language parser. Link Grammar's weak point is short sentences, simple commands, directives and the like; sentences which typically occur in chat rooms. The project is to tinker with the automatic acquisition of language via some learning mechanism or another.
Jianfeng Gao and Hisami Suzuki (2003), "Unsupervised Learning of Dependency Structure for Language Modeling," describe a method for learning dependency grammars. The parser currently used by OpenCog, the Link-Grammar parser, has many similarities to dependency grammars. The paper by John Lafferty, Daniel Sleator, and Davy Temperley, "Grammatical Trigrams: A Probabilistic Model of Link Grammar," Proceedings of the AAAI Conference on Probabilistic Approaches to Natural Language, October 1992, describes statistical learning algorithms to generate Link Grammar rules. See also the link grammar bibliography for details.
Yuret's (1998) algorithm tries to create a dependency tree by computing mutual information (MI) between word pairs. The tree is discovered by computing the maximum spanning tree of the MI between all word pairs. There is an alternate approach: true, hierarchical clustering. In hierarchical clustering, one creates an MI-based metric (MI alone is not a metric), and applies cluster analysis techniques. To get hierarchy, one also computes metric measures between clusters, i.e. one computes the MI between a third word and a word-pair, instead of just between the third word and the head-word of the pair phrase.
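The MI-plus-spanning-tree idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reimplementation of the published algorithm: MI is estimated pointwise from sentence-level co-occurrence in a tiny made-up corpus, and the tree is built with Prim's algorithm without any constraints on word order or link crossings. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pair_mi(sentences):
    """Pointwise mutual information for word pairs co-occurring in a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for words in sentences:
        unique = sorted(set(words))
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))  # keys are sorted pairs
    n = len(sentences)
    return {
        (a, b): math.log2(count * n / (word_counts[a] * word_counts[b]))
        for (a, b), count in pair_counts.items()
    }


def max_spanning_tree(words, mi):
    """Prim's algorithm over the words of one sentence, maximizing total MI."""
    def weight(a, b):
        return mi.get((a, b) if a < b else (b, a), float("-inf"))

    words = list(dict.fromkeys(words))  # drop repeated words, keep order
    in_tree = [words[0]]
    edges = []
    while len(in_tree) < len(words):
        # Pick the highest-MI edge joining the tree to a word outside it.
        a, b = max(((u, v) for u in in_tree for v in words if v not in in_tree),
                   key=lambda edge: weight(*edge))
        edges.append((a, b))
        in_tree.append(b)
    return edges


corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["the", "cat", "purrs"],
    ["a", "dog", "barks"],
]
mi = pair_mi(corpus)
tree = max_spanning_tree(["the", "cat", "purrs"], mi)
```

In this toy corpus "cat" and "purrs" have the highest MI among the three words, so the tree links them directly rather than linking "purrs" to the very frequent "the".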
It is not clear how adaptable these algorithms would be to the short, frequently ungrammatical sentences seen in chatrooms; and what mechanisms they suggest for keeping such learning from garbaging up the correct parses of more complex sentences. It is quite possible that link grammar needs new link types which would be used only in short sentences. (!) The idea of learning new link types seems unexplored.
The project is then to learn new Link Grammar links and rules, suitable for parsing the kind of speech often seen in chat rooms.
Natural Language Generation
We have a Java software system called NLGen that, given a set of RelEx-like relations, generates a syntactically correct English-language sentence.
It works OK on short sentences but needs to be improved significantly to be able to handle sentences with complex phrasal or clausal structure. Also, some statistical linguistics should be inserted to allow it to use observed word frequencies to guide its sentence generation.
The basic idea underlying NLGen's algorithm is described here: SegSim
Other language generation approaches include Markus Guhe's (2003) "Incremental conceptualisation for language production"
Statistical, semantically guided parse ranking
Right now, the Link Grammar parser spits out between 1 and 100+ parses for each sentence you feed it. While all are syntactically correct, many are semantically wacky. The goal of parse ranking is to assign scores, so as to identify the most likely parse.
Several approaches are possible:
- Likelihood of seeing certain link-grammar disjuncts
- Likelihood of seeing certain word-pair,link combinations
- Likelihood of seeing certain dependency relations.
- Least-squares method described by Tsivtsivadze et al.
- Locality convolution method of Tsivtsivadze et al.
Large databases of statistics for the first three have been collected. Some basic code for hooking up the first two, directly into link-grammar, has been created. Hooking these databases into the SAT solver remains unfinished. The third database would need to be hooked into RelEx; this has not been done. The quality of the resulting parse ranking has not been evaluated. Is one approach better than the other? How should the quality of the results be evaluated? Part of the project would be creating a framework for measuring and evaluating the quality of each ranking technique.
A very brief mathematical review of the idea behind ranking is given on page 5 of Dekang Lin and Patrick Pantel. 2001. "Discovery of Inference Rules for Question Answering." Natural Language Engineering 7(4):343-360. The equations there express ranking in terms of conditional probabilities.
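As a rough illustration of ranking by conditional probabilities, the Python sketch below scores each parse by the sum of the log-probabilities of its links given the linked word pair. The probability table, the link labels and the function names are invented for the example and are not drawn from the collected databases; the `floor` value used for unseen combinations is likewise an arbitrary assumption.

```python
import math


def parse_score(parse, cond_prob, floor=1e-6):
    """Sum of log P(link | word pair) over the links of one parse."""
    return sum(math.log(cond_prob.get(link, floor)) for link in parse)


def rank_parses(parses, cond_prob):
    """Return parses sorted from most to least likely."""
    return sorted(parses, key=lambda p: parse_score(p, cond_prob), reverse=True)


# Hypothetical statistics: (left word, link label, right word) -> probability.
cond_prob = {
    ("saw", "O", "man"): 0.6,
    ("saw", "MVp", "with"): 0.3,
    ("man", "Mp", "with"): 0.1,
}

# Two candidate parses of "... saw the man with ...", differing in attachment.
verb_attachment = [("saw", "O", "man"), ("saw", "MVp", "with")]
noun_attachment = [("saw", "O", "man"), ("man", "Mp", "with")]

ranked = rank_parses([noun_attachment, verb_attachment], cond_prob)
```

With these made-up numbers the verb-attachment parse ranks first. An evaluation framework would compare such rankings against a hand-annotated gold standard.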
Other ranking techniques may be possible as well; these remain unexplored. A tool that can evaluate the quality of a parse should allow different techniques to be compared -- there is room for experimentation, as well as plenty of implementation work to be done. Appropriate for anyone who is interested in getting some hands-on experience with statistical corpus linguistics.
All programming would be in C, Java, SQL (SQLite, Postgres).
Link Grammar Parser improvements
A number of improvements to the Link Grammar parser would be useful.
- Implement left-to-right, limited-window search through parse space. The parser currently examines sentences as a whole, which means that the parsing of long sentences becomes very slow (approximately as N^3, with N the number of words in the sentence). Implementing a "window" to limit searches for connections between distant words should dramatically improve parse performance.
- Addition of a spelling checker. If words are not recognized, or a sentence doesn't parse, it is perhaps because words are misspelled. This is a relatively small project, and should be combined with other work.
Web-Scale Semantic Parsing
With the goal of creating an open semantic map and a web-scale corpus of natural-language-parsed text, extend an existing crawler with RelEx (perhaps the Apache Nutch crawler, Grub distributed web crawler or the Droids Framework) to advance the design & engineering of a large-scale system to extract and process the textual content of web pages and other content, and upload the results to an open and shared repository. The flexible RelEx_compact_output is already designed for representing NL-parses; the semantic map data format is TBD.
Part of the project could include the integration of a spelling checker with link-grammar -- alternate spellings are tried, if a sentence cannot be parsed.
Explore Landmark Transitivity in Link Grammar
The Link Grammar uses a constraint of "planar graphs" (i.e. no link crossings) to rule out unreasonable parses. It seems that it might be possible to replace this rule by the notion of "Landmark Transitivity" taken from Hudson's Word Grammar. The basic idea is this:
Each Link Grammar link is given a parent-child relationship: one end of the link is the parent, the other the child. Thus, for example, given a noun to noun-modifier link, the noun is the parent of the link. Then, parents are landmarks for children. Transitivity (in the mathematical sense of "transitive relation") is applied to these parent-child relationships. Specifically, the no-links-cross rule is replaced by two landmark transitivity rules:
- If B is a landmark for C, then A is also a type-L landmark for C
- If A is a landmark for C, then B is also a landmark for C
where type-L means either a right-going or left-going link.
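For reference, the no-links-cross constraint that these rules are meant to replace is easy to state in code. The Python sketch below represents each link as a pair of word positions; this representation and the function names are assumptions made for illustration, and do not reflect how the Link Grammar parser stores links internally.

```python
def links_cross(a, b):
    """True if links a=(i, j) and b=(k, l) over word positions cross."""
    i, j = sorted(a)
    k, l = sorted(b)
    # Two links cross when exactly one endpoint of one lies strictly inside the other.
    return i < k < j < l or k < i < l < j


def is_planar(links):
    """True if no pair of links in the parse crosses."""
    links = list(links)
    return not any(
        links_cross(links[x], links[y])
        for x in range(len(links))
        for y in range(x + 1, len(links))
    )
```

For example, `is_planar([(0, 1), (1, 3), (2, 3)])` holds, while `is_planar([(0, 2), (1, 3)])` does not, because the second link starts inside the first and ends outside it.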
Ben hypothesizes that adding Landmark transitivity might be able to eliminate most or all of Link Grammar's post-processing rules. See Ben's PROWL grammar for details. See also Natural Language Processing, below.
Word Grammar Parsing
Implement a Word Grammar parser that utilizes the link grammar dictionary, and utilizes PLN inference for semantic biasing of the parsing process. This gives a strong impression of being an approach to NL comprehension that is suitable for general intelligence.
Inferring Semantic Mapping Rules
Use PLN to combine the hand-coded semantic normalization rules that exist in the RelExToFrame rule-base, to form new rules. Also generalize the rules to other words not currently covered by them, by combining the rules with semantically-based concept-similarity measures. (1)
MOSES
- Language: advanced C++
- Code: https://code.launchpad.net/opencog (opencog/learning/moses)
- Earlier versions: http://code.google.com/p/moses/
- List: http://groups.google.com/group/moses-users (mentors: Moshe and Nil)
- License: Apache 2.0
Meta-optimizing semantic evolutionary search (MOSES) is a new approach to program evolution, based on representation-building and probabilistic modeling. MOSES has been successfully applied to solve hard problems in domains such as computational biology, sentiment evaluation, and agent control. Results tend to be more accurate, and require fewer objective function evaluations, in comparison to other program evolution systems. Best of all, the result of running MOSES is not a large nested structure or numerical vector, but a compact and comprehensible program written in a simple Lisp-like mini-language. More at http://metacog.org/doc.html.
Extend MOSES to encompass primitive recursive functions using fold
A good GSoC project would be to implement in Reduct/MOSES the algebra of foldl, as Moshe hints at the bottom of page 5 of
http://www.agi-09.org/papers/paper_69.pdf
and as explained in detail in the paper
G. Hutton. A tutorial on the universality and expressiveness of fold. Journal of Functional Programming, 1999.
which is at
http://www.cs.nott.ac.uk/~gmh/fold.pdf
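To give a feel for why fold is attractive as a single recursion primitive, the Python sketch below expresses several familiar list functions through one left fold, built on the standard `functools.reduce`. It only illustrates the expressiveness of fold; how fold would be represented and reduced inside Reduct/MOSES is a separate design question not addressed here.

```python
from functools import reduce


def foldl(step, initial, items):
    """Left fold: step(...step(step(initial, x0), x1)..., xn)."""
    return reduce(step, items, initial)


def length(items):
    return foldl(lambda acc, _: acc + 1, 0, items)


def reverse(items):
    return foldl(lambda acc, x: [x] + acc, [], items)


def map_list(f, items):
    return foldl(lambda acc, x: acc + [f(x)], [], items)


def factorial(n):
    return foldl(lambda acc, k: acc * k, 1, range(1, n + 1))
```

Each function differs only in its step function and initial value, which is the kind of regular structure that a program-evolution system could search over.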
Higher-order Programmatic Constructs
A very important project, appropriate for a student with some functional programming background, is to extend MOSES to handle higher-order programmatic constructs, including variable expression. This can be done in many ways, including using combinatory logic or lambda calculus; the route that seems best at the moment, and the one our design follows, is Sinot's formalism of "director strings as combinators." There is opportunity for the student to assist with working out the details of the design as well as the implementation. Much of the work here is in Reduct and representation-building, which would be useful for both MOSES and Pleasure. (2)
Pleasure
Complete implementation and then test and explore the Pleasure Algorithm for program learning started last year by Alesis Novik. http://opencog.org/wiki/MOSES:_the_Pleasure_Algorithm
Transfer Learning
Cause MOSES to generalize across problem instances, so that what it has learned across multiple problem instances can be used to help prime its learning in new problem instances. This can be done by extending the probabilistic model building step to span multiple generations, but this poses a number of subsidiary problems, and requires integration of some sort of sophisticated attention allocation method into MOSES to tell it which patterns observed in which prior problem instances to pay attention to. (2)
Arbitrarily Complex Program Learning
More on the previous project suggestion: The motivation for the above is to allow MOSES to learn arbitrarily complex programs. For instance, we would like it to be able to easily learn n log n sorting algorithms without any fancy data preparation or other "cheating." It is possible that integrating Sinot's formalism into MOSES will allow effective learning of moderately complex programs using recursive control, which is something no one has achieved before and which is of critical importance in automated program learning.
Action-Sequences Handling
The current version of MOSES does not elegantly or efficiently handle the learning of programs involving long sequences of actions. This is problematic for applications involving the control of robots or virtual agents. So, an important project is the extension of the Reduct and representation-building components of MOSES to effectively handle action-sequences. This work will be testable via using MOSES to control agents in virtual worlds such as Multiverse or CrystalSpace.
Improved hBOA
MOSES consists of four critical aspects: deme management, program tree reduction, representation-building, and population modeling. For the latter, the hBOA algorithm (invented by Martin Pelikan in his 2002 PhD thesis) is currently used, but we've found it not to be optimal in this context. So there is room for experimentation in replacing hBOA with a different algorithm; for instance, a variant of simulated annealing has been suggested, as has been a pattern-recognition approach similar to LZ compression. A student with some familiarity with evolutionary learning, probability theory and machine learning may enjoy experimenting with alternatives to hBOA so as to help turn MOSES into a super-efficient automated program learning framework. It already works quite well, dramatically outperforming GP, but we believe that with some attention to improving the hBOA component it can be improved dramatically.
Dimensional Embedding for Improved Program Learning
Suppose one is using MOSES (or some related technique) to learn a program tree that contains nodes referring to semantic knowledge in a large knowledge base (e.g. a program tree that contains terms like "cat", "walk" etc. that represent concepts in OpenCog's AtomTable).
Then, mutating these nodes (for "representation building", in MOSES lingo) requires some special mechanism -- for instance, one wants to mutate "cat" into "some other concept that is drawn from a Gaussian of specified variance, centered around 'cat'". One straightforward way to do this is to embed the concepts in the semantic knowledge base in an n-dimensional space, and then use Gaussian distributions in this dimensional space to do mutation.
We have identified some good algorithms for dimensional embedding, but the coding needs to be done, and a bunch of fiddling will likely be required!
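The mutation step itself is simple once an embedding exists, as the Python sketch below suggests. The two-dimensional coordinates in `EMBEDDING` are made up for illustration (a real embedding would be produced by one of the embedding algorithms and would have many more dimensions), and `mutate_concept` is a hypothetical name. The sketch samples a point from a Gaussian centered on the concept and returns the nearest other concept.

```python
import math
import random

# Hypothetical 2-D embedding; real coordinates would come from an embedding algorithm.
EMBEDDING = {
    "cat": (0.0, 0.0),
    "dog": (1.0, 0.0),
    "tiger": (0.5, 0.5),
    "walk": (10.0, 10.0),
}


def mutate_concept(concept, sigma, rng):
    """Sample a Gaussian point around `concept`; return the nearest other concept."""
    center = EMBEDDING[concept]
    point = tuple(rng.gauss(c, sigma) for c in center)
    others = (name for name in EMBEDDING if name != concept)
    return min(others, key=lambda name: math.dist(point, EMBEDDING[name]))


rng = random.Random(0)
samples = {mutate_concept("cat", 0.1, rng) for _ in range(100)}
```

With a small standard deviation the mutations stay among the near neighbors of "cat", and the distant concept "walk" is never chosen; increasing `sigma` widens the search. A linear scan over all concepts would be too slow for a large knowledge base, where a spatial index would be needed.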
MOSES Evolution of Recurrent Neural Nets
In principle one can use MOSES to evolve recurrent neural nets; but, in practice, the ComboReduct library used to reduce program trees to an elegant, hierarchical normal form will probably need some tweaking in order to give nice normalizations for recurrent NNs.
This should allow substantially better results than existing methods for GA evolution of neural nets.
PLN
- Language: advanced C++
- Code: TBD
- List: TBD
- License: TBD
Probabilistic Logic Networks (PLN) are a novel conceptual, mathematical and computational approach to uncertain inference. In order to carry out effective reasoning in real-world circumstances, AI software must robustly handle uncertainty. However, previous approaches to uncertain inference do not have the breadth of scope required to provide an integrated treatment of the disparate forms of cognitively critical uncertainty as they manifest themselves within the various forms of pragmatic inference. Going beyond prior probabilistic approaches to uncertain inference, PLN is able to encompass within uncertain logic such ideas as induction, abduction, analogy, fuzziness and speculation, and reasoning about time and causality.
Intensional reasoning
Implement and test intensional inference in PLN, and compare the results to data regarding human intensional inference. (2)
Spatial and temporal reasoning
Implement spatial and temporal reasoning in PLN, using the Region Connection Calculus (for space) and Allen's interval algebra (for time). (3)
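As a starting point for the temporal side, the thirteen basic relations of Allen's interval algebra can be computed directly from interval endpoints. The Python sketch below assumes crisp intervals given as `(start, end)` pairs with `start < end`; how these relations would be represented as Atoms and combined with PLN truth values is not addressed. The function name and the relation spellings are choices made for this example.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a=(start, end) to b=(start, end)."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only proper overlap remains: the intervals intersect and share no endpoint.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

For instance, `allen_relation((1, 5), (3, 8))` is "overlaps" and `allen_relation((2, 3), (1, 5))` is "during". Reasoning over chains of intervals additionally requires the algebra's composition table, which is omitted here.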
Combining PLN and MOSES
Integrate MOSES-based supervised categorization into PLN, so that when PLN chaining hits a confusing point, it can launch MOSES to learn patterns in the members of the Atoms at the current end of the chain (which may then provide additional information useful in pruning). (1)
History-Guided Inference
Cause PLN's backward and forward chaining inference to utilize history -- so that an inference step is more likely to be taken if similar steps have been taken in similar instances. (1)
HypergraphDB
- Language: Java and C++
- Code: http://code.google.com/p/hypergraphdb/ (Java version)
- Code: https://launchpad.net/hypergraphdb (C++ version; no code yet)
- List: http://groups.google.com/group/hypergraphdb/ (mentors: Boris)
- License: LGPL (Java version), TBD (C++ port)
Originally HypergraphDB was intended to be the underlying representation and storage of the AtomSpace, but utilising a BigTable equivalent may be preferable to using the current version of HGDB. HypergraphDB would still be beneficial and possibly more efficient, but it would be a more involved project (and thus take longer) as it needs to be ported to C++ and made distributed.
OpenCog and BigTable integration
HGDB should be integrated with OpenCog as a persistent store.
The BerkeleyDB back end of HGDB should be replaced with BigTable or an open-source equivalent.
OpenCog Prime
Projects related to building and testing the OpenCogPrime design for an AGI.
- Language: advanced C++
- Code: TBD
- List: TBD
- License: AGPLv3
Concept Formation
Blending
Implement conceptual blending as a heuristic for combining concepts, with a fitness function for a newly formed concept incorporating the quality of inferential conclusions derived from the concepts, and the quality of the MOSES classification rules learned using the concept. (1)
Map formation
Implement "map formation," a process that uses frequent subgraph mining methods to find frequently co-occurring subnetworks of the AtomTable, and then embodies these as new nodes. This requires some extension and adaptation of current algorithms for frequent subgraph mining. It also requires functional attention allocation. (1)
Context formation
Implement context formation, wherein an Atom C is defined as a context if it is important, and restricting other Atoms A to the context of C leads to conclusions that are significantly different from A's default truth value. (2)
OpenCog Applications
Projects relating to using OpenCog Prime or OpenCog Collective components for specific applications.
- Language: advanced C++ (until bindings are created for other languages, such as Java, Python, Ruby, Lisp, etc.)
- License: any approved FOSS license
Rex Proxy and OpenPetBrain
A proxy currently exists connecting OpenCog to the RealXTend virtual world, enabling OpenCog to control virtual dogs learning tricks in the virtual world.
However, there are various quirks related to the interaction between OpenCog and RealXTend, partly to do with fixable shortcomings in RealXTend itself.
So, this project is good for someone who wants to tinker with RealXTend as well as the OpenCog/RealXTend proxy.
Sokoban
Sokoban would be a good toy domain for experimenting with various OpenCog methods. So hooking up OpenCog to a simple Sokoban server would seem worthwhile. MOSES and PLN could be used for Sokoban on their own, but it's more interesting of course to take an integrative approach. (1)
Robotics
Extend OpenCog to communicate with physical robots, using a toolkit like PyRo, or the Player-Stage Framework, or something similar. As mechanisms for communicating with agents in virtual worlds will be provided, this is not a huge conceptual leap, but will doubtless lead to many practical complexities. (1-3 depending on how far you go)
OpenBiomind
- Language: Java
- Code: http://code.google.com/p/openbiomind
- List: http://groups.google.com/group/openbiomind (mentors: Lucio and Ben)
- License: GPLv2
OpenBiomind contains code for applying genetic programming to analyze gene expression microarray data and SNP data. This approach has been successfully used to learn diagnostic rules for cancer, Alzheimer's, Parkinson's and other diseases, as reflected in several publications.
Neurobiological data analysis
Add necessary data types preprocessors (e.g. MEG, fMRI, EEG, PET, etc.), analysis algorithm tweaks, documented methodologies and other facilities for analysis of neurobiological data. Many public neurobiological databases exist (falling under the umbrella of the Human Cognome Project), for example the Allen Brain Atlas and others.

