Check it out — thanks to Shujing Ke, we now have a “frequent and surprising pattern miner” for OpenCog, which runs either on a single machine or distributed across multiple machines!
This is a “greedy algorithm” for exhaustively finding patterns in an Atomspace. In the big picture, it’s intended to complement smarter but even-slower non-greedy algorithms.
This obsoletes “Fishgram”, the previous attempt at OpenCog pattern mining, which was much less scalable and (unlike the current pattern miner) did not use the Atomspace for its internal, incremental pattern store.
A tutorial with examples is here:
http://wiki.opencog.org/wikihome/index.php/Pattern_Miner
and the code is here:
https://github.com/opencog/opencog/tree/master/opencog/learning/PatternMiner
We had to make up some new math for measuring surprisingness:
http://wiki.opencog.org/wikihome/index.php/Measuring_Surprisingness
(though the stuff about “coherence” is not in the current code). In practice these new measures seem to work better than using the (closely related, but different) interaction information…
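For readers who haven't met interaction information, here is a minimal Python sketch of the classical three-variable quantity that the new measures are being compared against. To be clear, this is not the surprisingness measure used in the Pattern Miner (see the wiki page above for that); the function names and the toy XOR distribution are my own illustration, and I am assuming the sign convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y), which some authors flip.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} mapping."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, keep):
    """Marginalize a joint distribution onto the variable indices in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out


def interaction_information(joint):
    """Three-variable interaction information, I(X;Y|Z) - I(X;Y), in bits.

    Expanded into entropies:
    -[H(X)+H(Y)+H(Z)] + [H(XY)+H(XZ)+H(YZ)] - H(XYZ)
    """
    def h(*keep):
        return entropy(marginal(joint, keep))

    return (-(h(0) + h(1) + h(2))
            + (h(0, 1) + h(0, 2) + h(1, 2))
            - h(0, 1, 2))


# Toy example (an assumption for illustration): Z = X xor Y, X and Y fair coins.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}

# Three independent fair coins, for contrast.
independent_joint = {(x, y, z): 0.125
                     for x in (0, 1) for y in (0, 1) for z in (0, 1)}

print(interaction_information(xor_joint))
print(interaction_information(independent_joint))
```

In the XOR case no pair of variables tells you anything, yet all three together are fully determined, so the interaction comes out non-zero; for the independent coins it vanishes.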
In order to make performance even moderately acceptable, it’s necessary to add a bunch of “filters” to rule out patterns that seem very unlikely to be interesting. To use this as a narrow-AI pattern miner for specific domains, one will probably need to create domain-specific filters. In the current version, the filters are written directly in the C++ code. An interesting potential future improvement would be to make the filters Atoms, and let the filtering be done by the URE, OpenCog’s Unified Rule Engine. I suppose this would be needed for pattern mining in a real AGI context….
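To make the idea of filters concrete, here is a rough Python sketch of filters as composable predicates applied to candidate patterns. This is purely illustrative: the real filters live in the Pattern Miner's C++ code, and everything here (the `Pattern` type, `max_size`, `requires_variable`, `apply_filters`, and the toy patterns) is a hypothetical stand-in rather than anything taken from that code.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a mined pattern: a flat tuple of terms,
# where terms starting with "$" are variables.
Pattern = Tuple[str, ...]

# A filter returns True if the pattern should be kept.
Filter = Callable[[Pattern], bool]


def max_size(limit: int) -> Filter:
    """Build a filter that rejects patterns with more than `limit` terms."""
    return lambda pattern: len(pattern) <= limit


def requires_variable(pattern: Pattern) -> bool:
    """Reject fully grounded patterns, which generalize nothing."""
    return any(term.startswith("$") for term in pattern)


def apply_filters(candidates: Iterable[Pattern],
                  filters: List[Filter]) -> List[Pattern]:
    """Keep only the candidates that pass every filter."""
    return [p for p in candidates if all(f(p) for f in filters)]


candidates = [
    ("$x", "likes", "$y"),
    ("Alice", "likes", "Bob"),
    ("$x", "likes", "$y", "and", "$y", "likes", "$x"),
]

kept = apply_filters(candidates, [max_size(3), requires_variable])
print(kept)
```

A domain-specific miner would swap in its own predicates here; representing each filter as a small independent object is also roughly what moving the filters into Atoms would buy you, since they could then be inspected and combined at runtime.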
Anyway this is an exciting and major addition to the arsenal of OpenCog tools and I hope others will now take the opportunity to improve it and apply it! Many thanks are due to Shujing; this was a lot of work over a long period of time….
Currently the documentation consists of code docs plus the Wiki page linked above. Scientific publications describing the underlying theory and some particular applications will be forthcoming “soon”….
It utterly blows me away and suddenly has opened my mind in a significant way to realise that I, after learning C, C++ and Python for a little under 6 weeks, understand (at least) the most primitive symbols and structures in the above code, and in that sense, there is nothing included within it that I might understand.
I love computer science!
*might NOT understand. d’oh.