Archive for October, 2009

RAKIS – A Semantic Information System

Posted in Maths & Science, Software on October 14th, 2009 by Noldorin

Over the past few weeks my interest in conceptual information systems has very much been rekindled. This is largely thanks to reading and talking about a certain paper by a developer friend of mine, which discusses the ideas and implementation for his MrMoe framework. Although the project has only recently been announced, and the paper won’t be released publicly for a bit of time yet, many of the ideas are already quite well-formed, and along with my ongoing curiosity relating to the Semantic Web and RDF, have stimulated me to write something of my own thoughts on the topic. (A couple of overly-ambitious projects involving RDF are likely what caused me to put aside the subject for a while.)

Discussing RDF or MrMoe is not the purpose of this post (I’m sure the author, Jan, will however do so further in the near future), so I shall leave these aside for the moment. I will however (somewhat boldly) state that the system I am proposing is significantly more abstract and, I hope, more powerful than either, at least in theory. Saying that, I think all these systems had slightly different applications in mind when designed. Enough preamble though – I don’t wish to create any (more) premature hype. Let me now introduce you to my incipient system.

RAKIS – or the RAKIS Abstract Knowledge and Information System – is my name for the model and associated theory I have devised with the aim of representing information in what I foresee to be a most abstract, context-free, and yet highly structured manner. Do note the use of the word “information” here; this system is designed to represent not simply data, but meaning. It is what one might term a “semantic model”, though this falls well short of giving any true impression of its potency and overall utility.

My aim in this article is to begin with a very simple model and build it up in small steps, eventually reaching the full RAKIS model. I am hoping that this incremental approach will allow readers to gradually build up understanding of the concepts and the “flavour” of the system, and to appreciate its great flexibility and potential by the end of the article. Although many with a higher-level education in mathematics will observe that much of the model can be described in terms of graph theory, I will try to stay clear of its formalities and jargon (or at least not rely on them).

Let us first construct a set of nodes that represent our basic concepts. These starting concepts might loosely be described as the “nouns” of the model. For want of a good simple example, I will use the Bach family (as you will see later in this and coming posts, this will serve well because of its many associations). Do not let the simplicity of the example fool you – subsequent ones will give you a better understanding of the capability of RAKIS.

The nodes in this diagram represent various members of the Bach family.

Now, there’s nothing very interesting about this diagram (or graph, to use the technical term), though there is one crucial subtlety to point out at this stage. The labels attached to each of the nodes are not truly part of the model, and must only be considered as aids to the viewer. (To throw in some jargon, this graph may be termed an unlabelled graph, despite appearing to be otherwise.) Visualise the same diagram without any text, and you might ask: what would such a model without labels possibly mean? How could it even vaguely represent any form of useful information? To be frank, it does not, as it is. The graph certainly represents data, but has little (if any) inherent meaning. You will see shortly that the meaning is introduced naturally, rather than being directly imposed, as is almost ubiquitous in traditional systems that organise data/information, and even in more modern ones such as RDF.
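To make the point about labels concrete, here is a minimal Python sketch (my own illustration, not part of any RAKIS implementation). It assumes a hypothetical `Concept` class with no data at all, and keeps the viewer's labels in a separate dictionary outside the model. The particular names are only my guess at what the diagram shows.

```python
class Concept:
    """An opaque node: it has identity, but carries no label or other data."""
    __slots__ = ()


# The model itself: nothing more than a handful of anonymous concepts.
concepts = [Concept() for _ in range(4)]

# Viewer aids, kept outside the model. The names are illustrative only.
display_labels = dict(zip(concepts, [
    "Johann Sebastian",
    "Maria Barbara",
    "Anna Magdalena",
    "Carl Philipp Emanuel",
]))


def describe(concept, labels=None):
    """Return the viewer's label if one is supplied, else an anonymous marker."""
    if labels and concept in labels:
        return labels[concept]
    return "<unlabelled concept>"
```

Calling `describe(concepts[0])` without the label dictionary gives only the anonymous marker, which is the sense in which the graph on its own carries data but no meaning.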

The following diagram incorporates relationships (what I term links) between the various nodes (or concepts) of the model. The way I have chosen these relationships is somewhat arbitrary, but also quite deliberate. It should be evident from the diagram that these links simply represent parent-child and husband-wife relationships, with the parent-child links drawn in the direction of child to parent. (Note that J. S. Bach was married twice during his life, and had a number of musician children by each wife.)

The nodes represent members of the Bach family, and the arrows the relationships between them. The direction chosen for the arrows is arbitrary, but must be semantically consistent.

Now you might be beginning to see the inklings of meaning in this model, given of course the pre-defined knowledge that it relates to the most famous members of the Bach family. Even so, it gives a strong idea as to the structure of this particular family.
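As a rough sketch of this step (again my own illustration, with hypothetical names), links can be held as ordered pairs of anonymous concepts. The variable names stand in for family members purely for readability; the direction of the husband-wife links is an arbitrary choice made here, applied consistently.

```python
class Concept:
    """An anonymous node; variable names below are only aids to the reader."""


# Five concepts: J. S. Bach, his two wives, and one child by each wife.
js, maria, anna, cpe, jc = (Concept() for _ in range(5))

# A link is an ordered pair (source, target).
links = {
    (cpe, js), (cpe, maria),   # child -> parent
    (jc, js), (jc, anna),      # child -> parent
    (maria, js), (anna, js),   # wife -> husband (arbitrary, but consistent)
}


def targets(links, source):
    """All concepts that `source` points to."""
    return {t for s, t in links if s is source}


def sources(links, target):
    """All concepts that point to `target`."""
    return {s for s, t in links if t is target}
```

Notice that nothing in `links` distinguishes a parent-child pair from a husband-wife pair. At this stage the structure is there, but the kind of relationship is still implicit.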

Our next step is what truly starts to make this system interesting. At the moment, the model may simply be described as a graph with the edges representing the links between concepts. By treating the set of edges as the nodes for a higher-level graph, we can extend the model to represent “relationships of relationships”. Note that other nodes (or concepts) can be introduced at this second level that do not correspond to links of the first level, such as the “Family Relationship” node in the following diagram.

The nodes represent members of the Bach family, the solid arrows the relationships between them, and the dashed arrows the relationships of relationships, or "meta-relationships" (note that they are still directed).

The real power of this model stems from the recursive nature of links (relationships). One need not stop at meta-links, but rather one can proceed to create higher and higher levels which progressively represent more and more abstract information. Nor must (meta-)links necessarily join concepts in adjacent levels; they can link any concepts, provided that the arrow is in the direction of the higher-level concept. Unsurprisingly, the base level simply represents data, or in other words, the most concrete concepts within the system. Although it is not necessarily apparent here, the highest levels should theoretically be capable of describing very abstract ideas, making the system much more effective at relating seemingly disparate pieces of information, and possibly even performing some forms of logical inference and reasoning. I hope this will be demonstrated explicitly in my next post on RAKIS.
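One possible way to picture the recursion is to let a link be a concept in its own right, so that it can serve as the endpoint of another link. The Python sketch below is only my interpretation: the `level` attribute, and the rule that a link sits one level above its higher endpoint, are assumptions made for illustration rather than a definition of RAKIS.

```python
class Concept:
    """A node at a given level of abstraction (0 is the base level)."""

    def __init__(self, level=0):
        self.level = level


class Link(Concept):
    """A directed link that is itself a concept, one level above its endpoints."""

    def __init__(self, source, target):
        # Assumed rule: the arrow must point towards the higher-level concept.
        if source.level > target.level:
            raise ValueError("a link must point towards the higher-level concept")
        super().__init__(level=max(source.level, target.level) + 1)
        self.source = source
        self.target = target


child, parent = Concept(), Concept()        # base-level concepts
child_of = Link(child, parent)              # an ordinary link (level 1)
family_relationship = Concept(level=1)      # a second-level concept
meta = Link(child_of, family_relationship)  # a meta-link (level 2)
skip = Link(child, family_relationship)     # links may skip levels
```

Because a `Link` is a `Concept`, nothing stops a further link from pointing at `meta`, which gives the "meta-meta-links" and beyond.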

The final point to stress is the implicit manner in which this model represents information. Given the appropriate context or “hints”, one can deduce easily enough what the graph represents (the significance of the concepts and relationships) without need for any labels on the nodes. Clearly, if you were presented with the above diagram stripped of its labels, it would be virtually impossible to figure out that it corresponds to the Bach family (and hard enough to realise it corresponds to a family, even). However, if the graph were to be extended (it is not designed to exist in its isolated form) so that it interfaces with something outside of the system, the set of possible interpretations quickly diminishes until it is quite obvious what the model represents.

To give a fairly blatant example of an interface, each concept in the graph shown above could be linked to two nodes, one from a set of forenames, the second from a set of surnames. These sets of names would exist external to the system, and would simply map to what you might call “interface nodes”. The advantage of utilising this design is that the model still retains its implicit meaning (no need of labels), when put into an appropriate context. In addition, this context-free nature of the model allows arbitrary (albeit relevant) isomorphisms, which only serves to increase the power of the model to associate, at a deeper level of abstraction, concepts that appear different on the surface. This is highly important if your goal is to perform any sort of formal reasoning using the system, which is much of the aim of RAKIS.
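The forename/surname interface might be sketched as follows. This is a simplification of my own: the class `NameInterface` and its methods are hypothetical, and the links to interface nodes are reduced to a plain mapping held outside the model.

```python
class Concept:
    """An anonymous concept inside the model (no label of its own)."""


# External sets of names: these live outside the model.
FORENAMES = {"Johann Sebastian", "Carl Philipp Emanuel"}
SURNAMES = {"Bach"}


class NameInterface:
    """Bridges anonymous concepts and the external name sets."""

    def __init__(self, forenames, surnames):
        self.forenames = forenames
        self.surnames = surnames
        self._names = {}  # concept -> (forename, surname)

    def attach(self, concept, forename, surname):
        """Tie a concept to one forename and one surname from the external sets."""
        if forename not in self.forenames or surname not in self.surnames:
            raise ValueError("name is not in the external sets")
        self._names[concept] = (forename, surname)

    def interpret(self, concept):
        """Return the attached names, or None if the concept has no interface."""
        return self._names.get(concept)


father, son = Concept(), Concept()
interface = NameInterface(FORENAMES, SURNAMES)
interface.attach(father, "Johann Sebastian", "Bach")
```

Here `interface.interpret(father)` yields a name pair while `interface.interpret(son)` yields `None`: the concepts themselves are unchanged, and only the interface narrows down what they can mean.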

There is an intriguing extension to the proposed model, on which I will just briefly touch now. So far I have treated all graphs (in all layers) as unweighted, meaning that an edge (link) either exists or does not. In a weighted graph, each edge would be assigned a real value (typically within a restricted range). By applying this feature to the RAKIS model, I would hope to open the doors to the possibilities of performing advanced types of inference, and perhaps even machine learning. (The latter would involve “growing” new concepts and increasing the strengths (weights) of links when certain criteria are satisfied.) Although I do not want to suggest too much similarity, there is some analogy to artificial neural networks here. Not to worry if this paragraph makes little sense, however, since I will be elucidating this aspect of the system in a future text.
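Purely as an illustration of what weighted links might look like, the sketch below stores a weight in the range 0 to 1 for each link and strengthens it on demand. The update rule in `reinforce` (moving the weight a fixed fraction of the way towards 1) is an assumption of mine, chosen for simplicity; it is not a rule that RAKIS prescribes.

```python
class Concept:
    """An anonymous concept."""


def reinforce(weights, link, rate=0.5):
    """Strengthen `link`, moving its weight part of the way towards 1.0.

    A link that is not yet present is treated as having weight 0.0,
    so reinforcing it also "grows" it into existence.
    """
    current = weights.get(link, 0.0)
    weights[link] = current + rate * (1.0 - current)
    return weights[link]


child, parent = Concept(), Concept()
link = (child, parent)

# Weighted links: each ordered pair maps to a strength between 0 and 1.
weights = {link: 0.5}
```

With this rule a weight approaches 1 but never exceeds it, however often `reinforce` is applied, which keeps the values within a restricted range.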

By now I’ve probably bombarded you as readers with enough of my theory (for the time being), so let us just summarise the key points I have already introduced:

  • The fundamental features of the RAKIS model are concepts and links, which can be thought of as nodes (vertices) and relationships (edges) respectively.
  • Concepts are unlabelled, in that their meaning is implicit and can only be inferred from context. Some sort of interface, bridging the system and the external world, is typically required to instill some form of meaning.
  • Links represent connections between concepts, such as a parent-child relationship, as given above. They are directed arbitrarily, but must be consistent with each other.
  • Both concepts and links can be extended to higher levels so that they represent “meta-concepts”, “meta-links”, “meta-meta-links”, and so on.
  • Base-level concepts represent fairly concrete ideas, while higher-level ones (“meta-concepts”) represent progressively more abstract ideas.
  • Due to the highly fluid and context-free nature of the model, isomorphisms (parallels of sorts) can be recognised between various subsets of concepts relatively easily. The implication of this is that meaning is not fixed, but is rather context-dependent.

Note: Italic terms present in this article are typically used to highlight elements of the terminology I have chosen for RAKIS.

That concludes the first post of what I intend will be a series on the RAKIS system. Just a teaser for my next post: it will be demonstrated how various logical and mathematical ideas, such as sets and prime numbers, can be expressed formally by RAKIS. Well, I hope that this introduction has at least given readers a decent grasp of the power as well as the design of the core model. If you feel anything needs clarification, or simply have general comments about this post, feel free to leave a comment.

TeX.NET 0.1.0 released

Posted in Maths & Science, Programming, Projects, Software on October 8th, 2009 by Noldorin

Some news to report at last! I’ve just released the first beta version of my TeX.NET project over at Launchpad. Both the source (with unit tests) and binaries are available for what might be called “a parsing and writing library for mathematical expressions written in the TeX format”, programmed, of course, in C# 3.0 (with some rudimentary F# bindings).

This project hasn’t exactly been the focus of my blog posts recently (or at all, for that matter), mainly because I’ve found that working on the thing has taken almost all of my free time over the past month or so. As I have come to learn, putting effort into promotion of one’s ideas and work is not a wise thing to do before they have proper substance!

Since I’m not one to reword any text that I’ve already (quite carefully) written, I shall quote the project summary over at Launchpad:

TeX.NET is a library for the .NET Framework 3.5 that provides lexical analysis and parsing of mathematics written in the TeX format into expression trees, and conversely the writing of expression trees in the same textual format. The purpose of the library is to provide a useful framework for inputing, outputing, and manipulating complex mathematical expressions.

Now, you may of course be curious where I’m taking this project in the grand scheme of things. (Indeed, a library for parsing of mathematical expressions isn’t terribly useful on its own.) Although it’s far from an in-depth explanation of the end goals and my approach, I suggest you check out the Syracuse Project, which is the “container” or “super-project” for TeX.NET, among other related libraries and programs. I can however say that I will be shifting my efforts more towards the Euclid.NET project, now that I have a stable initial version of the requisite parser/writer library. It is bound to be the subject of one of my upcoming posts, anyway.

So, if anyone is inclined to give my TeX.NET library a whirl (or better yet, some further rigorous testing), I would gratefully urge them to do so. Any sort of feedback on this project is welcome, as with all of my endeavours. My aim at the moment is mainly to garner a nucleus of interest and see this library applied to great things!