Software

Markov Chain Generator in .NET

Posted in Programming, Software on February 18th, 2010 by Noldorin – 2 Comments

As part of my current IRC.NET project (an IRC client library for .NET 4.0), I decided to create as a sample project an IRC bot that implements a Markov text generator, one of the many applications of the Markov chain, a concept from probability theory. I am going to assume here that you already know what a Markov chain is and have some idea of its potential applications.

Here is the relevant C# 3.0 source code from my sample project that contains all the functionality relating to Markov chains and Markov generation. A rather naive implementation, I would freely admit, but a simple and effective one, I’d like to think.

MarkovChain class

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace MarkovChainTextBox
{
    // Represents a Markov chain of arbitrary length.
    [DebuggerDisplay("{this.nodes.Count} nodes")]
    public class MarkovChain<T>
    {
        private static readonly IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

        private readonly Random random = new Random();

        private List<MarkovChainNode<T>> nodes;
        private ReadOnlyCollection<MarkovChainNode<T>> nodesReadOnly;

        public MarkovChain()
        {
            this.nodes = new List<MarkovChainNode<T>>();
            this.nodesReadOnly = new ReadOnlyCollection<MarkovChainNode<T>>(this.nodes);
        }

        public ReadOnlyCollection<MarkovChainNode<T>> Nodes
        {
            get { return nodesReadOnly; }
        }

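        // Lazily generates a sequence by a random walk over the chain, starting
        // from the node holding default(T), which marks both start and end.
        public IEnumerable<T> GenerateSequence()
        {
            var curNode = GetNode(default(T));
            while (true)
            {
                // A node with no outgoing links is a dead end.
                if (curNode.Links.Count == 0)
                    break;
                // Repeated links act as weights: frequent successors are picked more often.
                curNode = curNode.Links[random.Next(curNode.Links.Count)];
                // Reaching the marker node again ends the sequence. Comparing against
                // default(T) (rather than null) also works when T is a value type.
                if (comparer.Equals(curNode.Value, default(T)))
                    break;
                yield return curNode.Value;
            }
        }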

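        // Records one observed transition. Training the same pair repeatedly
        // adds duplicate links, which makes that transition more likely.
        public void Train(T fromValue, T toValue)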
        {
            var fromNode = GetNode(fromValue);
            var toNode = GetNode(toValue);
            fromNode.AddLink(toNode);
        }

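        // Returns the node holding the given value, creating it on first use.
        // This is a linear scan, so the cost grows with the number of nodes.
        private MarkovChainNode<T> GetNode(T value)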
        {
            var node = this.nodes.SingleOrDefault(n => comparer.Equals(n.Value, value));
            if (node == null)
            {
                node = new MarkovChainNode<T>(value);
                this.nodes.Add(node);
            }
            return node;
        }
    }
}

MarkovChainNode class

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace MarkovChainTextBox
{
    // Represents a node within a Markov chain.
    [DebuggerDisplay("Value: {this.value == null ? \"(null)\" : this.value.ToString()}, {this.links.Count} links")]
    public class MarkovChainNode<T>
    {
        private T value;
        private List<MarkovChainNode<T>> links;
        private ReadOnlyCollection<MarkovChainNode<T>> linksReadOnly;

        public MarkovChainNode(T value)
            : this()
        {
            this.value = value;
        }

        public MarkovChainNode()
        {
            this.links = new List<MarkovChainNode<T>>();
            this.linksReadOnly = new ReadOnlyCollection<MarkovChainNode<T>>(this.links);
        }

        public T Value
        {
            get { return this.value; }
            set { this.value = value; }
        }

        public ReadOnlyCollection<MarkovChainNode<T>> Links
        {
            get { return linksReadOnly; }
        }

        public void AddLink(MarkovChainNode<T> toNode)
        {
            this.links.Add(toNode);
        }
    }
}
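To show how the two classes might be driven, here is a minimal sketch that trains a MarkovChain<string> on whitespace-separated words. It relies on the convention visible in GenerateSequence that the node holding default(T) (null for strings) marks both the start and the end of a sequence. The MarkovTextHelper class and its method names are hypothetical and are not part of the sample project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace MarkovChainTextBox
{
    // Hypothetical helper (not part of the listing above) showing one way to
    // feed sentences into MarkovChain<string> and read generated text back out.
    public static class MarkovTextHelper
    {
        // Trains the chain on one sentence, using null (default(string)) as the
        // start/end marker that GenerateSequence looks for.
        public static void TrainSentence(MarkovChain<string> chain, string sentence)
        {
            var words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            string previous = null;
            foreach (var word in words)
            {
                chain.Train(previous, word);
                previous = word;
            }
            if (previous != null)
                chain.Train(previous, null); // link the last word to the end marker
        }

        // Generates at most maxWords words; the cap guards against cycles in
        // the chain that could otherwise make a sequence run for a very long time.
        public static string Generate(MarkovChain<string> chain, int maxWords)
        {
            return string.Join(" ", chain.GenerateSequence().Take(maxWords).ToArray());
        }
    }
}
```

With only a single sentence trained, every node has exactly one outgoing link, so the generator can do nothing but reproduce that sentence; the output only becomes interesting once several sentences sharing words have been fed in.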

As usual, I am keen to hear any sort of feedback about what is a useful little piece of code. Undoubtedly, Markov chains have a number of pretty interesting applications in science and the computer world, so it would be cool to hear if other people are using this for different purposes…

Electronic Lecture Notes Revisited

Posted in Maths & Science, Personal, Software on February 3rd, 2010 by Noldorin – 2 Comments

As I promised, I have finally gotten around to evaluating the effectiveness of taking lecture notes using an Eee PC (a first model version, borrowed from a friend). I say evaluate, but I am effectively referring to one thing: the annoyance of typing mathematics in LaTeX on a keyboard less wide than my handspan. Although this plan has turned out not to be the most effective for me as a physics student, I would be inclined to think that many of the problems would be eliminated if you were just taking notes for a history lecture or such (pretty much any humanity for that matter, including law, as I’ve been told). Regardless, reST is at least a pretty nice format in which to write, and using docutils to pdf-ify the notes is still the way forward, I would say. So to summarise, the quite horrible inefficiency (and concentration required) of typing was the main reason that led me to abandon the (initially very hopeful) endeavour by two weeks into term. I can’t be sure how much one of the newer Eee PC models would have helped me – somewhat, but not hugely, I would think.

As it is, I am fortunate enough that all my lecturers have decided to hand out printed (and generally pretty complete) notes for the subjects this term. Being the lazy student I am, I am finding it a good excuse to not bother with any notes for the current term. Saying that, at least it’s letting me focus more on actually understanding the (painfully opaque) material of the current courses, which is surely a good thing!

Well, at the least, I hope I have given some food for thought to anyone else considering a similar plan for lecture notes. It would actually be quite satisfying to hear if there are any humanities students out there that have adopted the reST/PDF approach. Equally so, I would be rather surprised if any science guys have managed to make the thing work with an Eee PC.

WPF on IRC

Posted in Programming, Software on January 29th, 2010 by Noldorin – Be the first to comment

Being a long-time member of the ##wpf channel over on the Freenode IRC network, I thought I would bring it to the attention of any readers who are interested in this wonderful technology. Since the post is really only of interest to developers who already know WPF (or are at least keen to learn it), I shall not say anything about the technology except that it is a superb (and very modern) user interface/presentation framework for Windows applications – I strongly recommend it to anyone who develops complex (or even simple) user interfaces as a replacement for whatever library/toolkit they are already using.

The channel is currently the most popular WPF-related channel on any IRC network (of which I know), and sees regular activity, though not quite as much as we hope, hence this post! While the channel averages around 20-30 users at any time, it has not quite seen the growth it deserves as WPF gains more and more popularity. Hence, I urge anyone interested in the subject to at least pop in and check out the channel, perhaps idle around for a while. We’re a friendly lot, I promise, and will be glad to help out any newcomers!

Designs for a Computer Algebra System

Posted in Maths & Science, Programming, Projects, Software on January 8th, 2010 by Noldorin – Be the first to comment

Creating a modern computer algebra system from the ground up, as has been a plan of mine for some months now, is no trivial goal, as anyone who has even a vague conceptual understanding of computational algebra must surely know. My efforts in this area, grouped under the title of “the Syracuse Project”, have involved mostly research and large amounts of contemplation so far, but I feel that I have finally managed to formulate enough solid ideas that they are worth presenting in a short article. I was also fortunate enough to find someone on IRC (in #math-software on Freenode) with whom I could discuss and refine many of our mutual ideas. The ideas discussed in this post are the culmination of my own thoughts and much of the knowledge I gained from my conversations with Robert Smith (nickname ‘Quadrescence’) online, which has served as the basis for some of my own investigation and explanations here, in a modified form.

What I am going to focus on here falls mainly under the domain of my Euclid.NET project, which is essentially the core of Archimedes: what one might describe as the “CAS kernel”. Think of Euclid.NET as the framework for symbolic mathematics upon which Archimedes is built. Its primary purpose is to handle such tasks as expression evaluation, simplification, differentiation, series expansion, and so on. (Bonus points if you can spot the connection between the names of Archimedes and Syracuse.)

TeX.NET, which is the primary parser (expression tree builder) for the project at this moment, has already seen a stable release, and will soon see the second with any luck. As you might guess from the name, it takes input in TeX syntax (well, actually LaTeX, with a few extensions).

To begin, I thought I would tackle one of the tougher aspects of computer algebra systems, namely, simplification. More trivial features such as expression evaluation and differentiation are essential to a CAS, yet are hardly worth an in-depth discussion, at least not for now.

So what is simplification?

Simplification, to state the banal, is concerned with making an expression more “simple”. Unlike many everyday terms that have been borrowed by mathematics, “simple” or “simplification” truly does not have a more rigorous definition. The problem here largely stems from the fact that the definition of what is simple is somewhat fluid – it depends to a great extent on the context. To illustrate this nature more concretely, here are a few examples.

  1. (x+1)^2 or x^2 + 2x + 1?
  2. y^{-3.5} or 1/y^{3.5} or 1/(y^3 \sqrt{y})?
  3. \tan x \sin x or \dfrac{\sin^2 x}{\cos x}?
  4. e^{A+(2x-y)} or e^A\left[\dfrac{e^{2x}}{e^{y}}\right]?
  5. 4 \sqrt{x^3 + 6x^2 + 9x} or 4(x + 3) \sqrt{x}?

A visualisation of a very simple expression tree. The equivalent infix expression is (2 + 2) + (2 + 2) + (3 + 3). Note that in general such trees are not restricted to be binary.

I think that anyone who has had sufficient experience using maths and manipulating many formulae and expressions will realise that, of the equivalent expressions given in each of 1 to 5, one is desired in a certain scenario, while another is desired in a second situation. Hence, any conceivable simplification algorithm cannot be treated as a rigid mechanical process, but rather must adjust itself depending on the parameters it is given, which hint at the sort of result that is desired.

When humans perform simplification of mathematical expressions, they often use so-called intuition, developed from much prior experience, along with trial and error, to (in most cases) quickly and accurately simplify maths. It is inevitable that even the best mathematicians hit dead ends when trying to simplify complicated expressions. A computer, perhaps unsurprisingly, cannot do any better. Moreover, mathematical simplification is, in my view, one of the few aspects of mathematical methodology that overall better suits a holistic intelligence, rather than the traditional sequential one that is most often associated with maths (theorem proving being a notable exception). In fact, mathematics is not quite so logical and all-encompassing as even mathematicians not long ago thought – thanks to the magnificent Incompleteness Theorem proved by Kurt Gödel in 1931 – this is however a subject for another day.

How can we measure simplicity?

Fortunately, and perhaps surprisingly, the field of evolutionary computation presents a rather handy way of treating the “simplicity” of expressions, that is, a fitness function – in its most abstract sense, something that measures the absolute “fitness” of any given solution for a certain optimisation problem (most commonly genetic algorithms, which lent the term “fitness”). The “fitness” may be thought of qualitatively as the value, worth, or suitability of a particular solution. The solutions, in this case, are of course expression trees.

To begin, it is greatly helpful to reduce the problem by extracting a certain (small) number of parameters from the expression tree, rather than trying to analyse the entire thing holistically. This reduces the parameter space (the set of all possible parameters) dramatically, which is typically highly beneficial in optimisation problems. The fitness function itself is allowed to take any form in general, though we shall see shortly that one or two classes of function in particular are desirable. For now, let us just focus on the set of parameters. After some consideration, I came to the conclusion that any effective parameter space must consist of the following variables:

  • Size of the expression tree (i.e. the number of nodes)
  • Height of the expression tree (i.e. the number of layers)
  • Width of the expression tree (i.e. the number of nodes in the bottom layer).

These are all of course integer values, and thus, provided the function combines them using integer coefficients, its value is restricted to lie within the set of integers. After building up an image that measuring simplicity is a tricky thing, this may seem like a rather straightforward framework; indeed, it is in some ways, though it is worth noting that the apparent problems arise when we decide what function to use and when it should be applied. The reasons for the choice of these parameters should become apparent soon.
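To make the three parameters concrete, here is a minimal sketch of how they might be computed for an n-ary tree. The ExprNode and TreeMetrics types are hypothetical stand-ins rather than the actual Euclid.NET types, and I am assuming that a single leaf counts as one layer and that “bottom layer” means the deepest layer of the tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal expression-tree node; not the actual Euclid.NET type.
public sealed class ExprNode
{
    private readonly string symbol;
    private readonly List<ExprNode> children;

    public ExprNode(string symbol, params ExprNode[] children)
    {
        this.symbol = symbol;
        this.children = new List<ExprNode>(children);
    }

    public string Symbol { get { return this.symbol; } }
    public IList<ExprNode> Children { get { return this.children; } }
}

public static class TreeMetrics
{
    // S: total number of nodes in the tree.
    public static int Size(ExprNode node)
    {
        return 1 + node.Children.Sum(c => Size(c));
    }

    // H: number of layers, counting the root as layer 1 (an assumption).
    public static int Height(ExprNode node)
    {
        return 1 + (node.Children.Count == 0 ? 0 : node.Children.Max(c => Height(c)));
    }

    // W: number of nodes in the bottom (deepest) layer.
    public static int Width(ExprNode node)
    {
        return CountAtDepth(node, Height(node));
    }

    // Counts the nodes that sit exactly 'depth' layers down, the root being layer 1.
    private static int CountAtDepth(ExprNode node, int depth)
    {
        if (depth == 1)
            return 1;
        return node.Children.Sum(c => CountAtDepth(c, depth - 1));
    }
}
```

For the tree in the figure above, (2 + 2) + (2 + 2) + (3 + 3) with a single ternary addition at the root, this gives S = 10, H = 3 and W = 6.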

Let us first consider a basic fitness function that simply weights each parameter individually in a linear combination. In other words, the fitness function, F(S, H, W) may be defined as the following, where S, H, and W correspond to the size, height, and width of the expression tree respectively.

F(S, H, W) = aS + bH + cW

The constants a, b, c may be any integer (positive or negative), and should be passed to the simplifier routine rather than predefined, according to the desired output. It should be quite evident that by choosing different magnitudes as well as signs for these constants, each of the three parameters may be independently rewarded or penalised to greater or lesser extents. The question you might then ask is: why not a more complicated function depending on S, H, and W? My answer is a straightforward one: there is no need. A linear combination of terms gives enough control over the desired function that extending the function with higher-order or even exponential terms would be quite pointless and arbitrary. I have, however, far from closed my mind on this matter – as I design and test the system progressively, certain discoveries may be made that suggest a slightly different approach.
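As a rough sketch of the linear fitness function, the weights can be bundled into a small object that is handed to the simplifier. The LinearFitness class is a hypothetical name of my own choosing; the only thing taken from the text is the formula F(S, H, W) = aS + bH + cW, together with the assumption that a higher value means a more desirable tree.

```csharp
using System;

// Hypothetical container for the weights a, b and c passed to the simplifier.
public sealed class LinearFitness
{
    private readonly int a;
    private readonly int b;
    private readonly int c;

    public LinearFitness(int a, int b, int c)
    {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // F(S, H, W) = aS + bH + cW, where a higher value is assumed to be "fitter".
    public int Evaluate(int size, int height, int width)
    {
        return this.a * size + this.b * height + this.c * width;
    }
}
```

Take the first example from earlier, and suppose (x+1)^2 is stored as a power node over a sum, giving S = 5, H = 3, W = 2, while x^2 + 2x + 1 is stored as a three-way sum, giving S = 8, H = 3, W = 4. With (a, b, c) = (-1, 0, 0) the two forms score -5 and -8, so the factored form wins; flipping the sign of a rewards size instead and the expanded form wins.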

Indeed, my only other real consideration for a fitness function thus far is one of even more basic form. Suppose, for example, that the fitness function only depended on two things: a) which parameter should be prioritised, b) whether this parameter should be promoted or demoted. The other two parameters would simply be minimised in the case that the first shows no preference between two trees. I have not finalised the implementation of this, but hopefully this brief description should give you an idea.
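Since I have not finalised this second scheme, the following is only one possible reading of it, written as a comparer rather than a numeric function. The type names are hypothetical, and the order in which the two remaining parameters break ties (size, then height, then width) is an arbitrary assumption on my part.

```csharp
using System;
using System.Collections.Generic;

public enum TreeParameter { Size, Height, Width }

// Hypothetical value type holding the three measurements of one expression tree.
public struct TreeShape
{
    public readonly int Size;
    public readonly int Height;
    public readonly int Width;

    public TreeShape(int size, int height, int width)
    {
        this.Size = size;
        this.Height = height;
        this.Width = width;
    }

    public int Get(TreeParameter parameter)
    {
        switch (parameter)
        {
            case TreeParameter.Size: return this.Size;
            case TreeParameter.Height: return this.Height;
            case TreeParameter.Width: return this.Width;
            default: throw new ArgumentOutOfRangeException("parameter");
        }
    }
}

// Orders shapes so that the preferred ("fitter") tree compares as greater.
public sealed class PriorityFitnessComparer : IComparer<TreeShape>
{
    // Assumed tie-break order for the non-prioritised parameters.
    private static readonly TreeParameter[] TieBreakOrder =
        { TreeParameter.Size, TreeParameter.Height, TreeParameter.Width };

    private readonly TreeParameter primary;
    private readonly bool promote;

    // promote = true rewards large values of the primary parameter,
    // promote = false (demote) rewards small values.
    public PriorityFitnessComparer(TreeParameter primary, bool promote)
    {
        this.primary = primary;
        this.promote = promote;
    }

    public int Compare(TreeShape x, TreeShape y)
    {
        int result = x.Get(this.primary).CompareTo(y.Get(this.primary));
        if (!this.promote)
            result = -result;
        if (result != 0)
            return result;

        // The primary parameter shows no preference: minimise the other two.
        foreach (TreeParameter p in TieBreakOrder)
        {
            if (p == this.primary)
                continue;
            result = y.Get(p).CompareTo(x.Get(p));
            if (result != 0)
                return result;
        }
        return 0;
    }
}
```

One attraction of this form is that no weights need tuning: the caller states only which parameter matters and in which direction.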

I now only leave, as an exercise to the reader, the five example expressions given in the previous section: consider how any desired result (simplified form) can be achieved through the selection of the appropriate fitness function (using either of the two I have just proposed). Of course, feel free to post a comment regarding any queries or findings you have regarding this matter.

An effective algorithm for simplification

So far, I have discussed how simplicity can be measured in absolute terms, and how in this way the most “simple” of a set of solutions (expression trees) can be chosen as the result of the algorithm. What I have not really mentioned, however, is how an algorithm might actually search out the possible solutions. Although the nature of the algorithm is mainly independent of the fitness function and the evaluation of expression trees, it is helpful to discuss this second so as to give a clearer image of the approach as a whole.

Simplification when done by a computer, as when done by humans, involves at its heart the application of a large number of mathematical rules that transform expressions. To give some examples of a few of the more basic rules:

  • x + 0 \rightarrow x
  • 1 * x \rightarrow x
  • x * x \rightarrow x^2
  • x * (y / x) \rightarrow y
  • x^2 - y^2 \rightarrow (x + y)(x - y)
  • \sin^2 x + \cos^2 x \rightarrow 1

Given a complete set of all simplification rules, we can find a path (or derivation) between any two equivalent expressions. (In theory, this is no issue, since the set is finite, though rather large. The practicality of collecting all the required rules is another issue that I will not go into here.) Note that the rules are bidirectional; they allow you to take a simple expression and sequentially transform it into an arbitrarily complex, yet still equivalent one. (For example, x \rightarrow x + 0 - 0 + 0 - 0 + 0.)
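As an illustration of what such rules might look like in code, the sketch below encodes the first three rules from the list as functions that either rewrite the root of an expression or report that they do not apply. The Expr and BasicRules types are hypothetical, the rules are shown in the simplifying direction only, and a real rule set would also need to match inside sub-expressions and handle commuted operands.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable expression node with structural equality.
public sealed class Expr : IEquatable<Expr>
{
    public readonly string Head;  // operator, variable name or numeric literal
    public readonly Expr[] Args;  // empty for leaves

    public Expr(string head, params Expr[] args)
    {
        this.Head = head;
        this.Args = args;
    }

    public bool Equals(Expr other)
    {
        return other != null && this.Head == other.Head && this.Args.SequenceEqual(other.Args);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Expr);
    }

    // Simple structural hash, so that equal trees land in the same bucket.
    public override int GetHashCode()
    {
        int hash = this.Head.GetHashCode();
        foreach (var arg in this.Args)
            hash = unchecked(hash * 31 + arg.GetHashCode());
        return hash;
    }

    public override string ToString()
    {
        if (this.Args.Length == 0)
            return this.Head;
        var parts = this.Args.Select(a => a.ToString()).ToArray();
        return "(" + string.Join(" " + this.Head + " ", parts) + ")";
    }
}

public static class BasicRules
{
    // Each rule returns the rewritten expression, or null if it does not apply at the root.
    public static readonly Func<Expr, Expr>[] All =
    {
        // x + 0 -> x
        e => e.Head == "+" && e.Args.Length == 2 && IsLiteral(e.Args[1], "0") ? e.Args[0] : null,
        // 1 * x -> x
        e => e.Head == "*" && e.Args.Length == 2 && IsLiteral(e.Args[0], "1") ? e.Args[1] : null,
        // x * x -> x^2
        e => e.Head == "*" && e.Args.Length == 2 && e.Args[0].Equals(e.Args[1])
            ? new Expr("^", e.Args[0], new Expr("2")) : null,
    };

    private static bool IsLiteral(Expr e, string text)
    {
        return e.Args.Length == 0 && e.Head == text;
    }

    // Applies the first matching rule at the root, or returns the input unchanged.
    public static Expr ApplyOnce(Expr e)
    {
        foreach (var rule in All)
        {
            var rewritten = rule(e);
            if (rewritten != null)
                return rewritten;
        }
        return e;
    }
}
```

Structural equality is what lets the third rule recognise that both operands of a product are the same sub-expression, however large that sub-expression happens to be.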

Assuming (quite reasonably) that this assertion that a “finite number of applications of simplification rules can derive any equivalent expression from the original” is valid, we must then consider how the search should proceed. Under the utterly naive brute force approach, the algorithm is clearly non-terminating, but we can do a lot better than this.

The search algorithm is all about compromise in essence. If we search every possible derived expression (up to a certain size), then it could take an unreasonable length of time to simplify even relatively tiny expressions. On the other hand, if we make too many assumptions and cut off many branches in the search tree prematurely, the algorithm may terminate quickly, though not necessarily with the simplest solution (or even anything close). Hence, the idea of using genetic algorithms, albeit initially appealing, is in my view too restrictive for a problem that requires a “perfect” answer in most cases, and should not have any stochastic nature.

My currently planned approach is one that does not differ greatly from a simple brute force evaluation of all the simplification paths. The main improvement is one that falls out rather naturally from using a fitness function. The idea is that each node of the search tree is evaluated by the fitness function upon its creation, and if the fitness is below a certain (specified) threshold, the search from that particular node terminates. For a start, this prevents originally small/simple expression trees being transformed into ones that are absurdly large, while still allowing some limited expansion of expression trees in the hope that they may later be simplified very effectively. Many nonsensical (what we might call counter-intuitive) paths of reduction of the expression would also be eliminated in a similar way. There is also one practical problem of particular note here: many disparate simplification paths along the tree do converge to the same solution during a search (some quite quickly), so it would be quite foolish to branch twice from identical nodes of the search tree. Instead, we really want to cache any nodes (expression trees) already visited during the search process, and not compute their descendants (derived children) more than once. A simple hash table (set, in fact) would seem like the most effective way of accomplishing this. Creating an efficient and relatively collision-free hash function for an arbitrary tree structure is however no trivial task. I did get a number of quite sensible and useful responses when I asked the question on StackOverflow.
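The planned search can be sketched generically as follows. This is only an outline under my own assumptions: SimplifierSearch and FindFittest are hypothetical names, the candidate type T is expected to provide sensible Equals and GetHashCode implementations, and the search is guaranteed to terminate only if finitely many candidates have a fitness at or above the threshold.

```csharp
using System;
using System.Collections.Generic;

public static class SimplifierSearch
{
    // Explores every candidate reachable from 'start' via 'rewrites', abandoning
    // candidates whose fitness falls below 'threshold' and never expanding the
    // same candidate twice. Returns the fittest candidate encountered.
    public static T FindFittest<T>(
        T start,
        Func<T, IEnumerable<T>> rewrites,
        Func<T, int> fitness,
        int threshold)
    {
        var visited = new HashSet<T> { start };  // cache of candidates already seen
        var pending = new Queue<T>();
        pending.Enqueue(start);

        T best = start;
        int bestFitness = fitness(start);

        while (pending.Count > 0)
        {
            T current = pending.Dequeue();
            foreach (T next in rewrites(current))
            {
                int f = fitness(next);
                if (f < threshold)
                    continue;  // prune: too "unfit" to be worth expanding
                if (!visited.Add(next))
                    continue;  // reached before via another path
                if (f > bestFitness)
                {
                    best = next;
                    bestFitness = f;
                }
                pending.Enqueue(next);
            }
        }
        return best;
    }
}
```

As a toy check, take strings as candidates, let the rewrites either append "+0" or strip a trailing "+0", use the negated length as the fitness, and set the threshold to -7. Starting from "x+0+0", the search briefly expands to "x+0+0+0", prunes anything longer, and settles on "x".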

Apologies if this discussion of “search trees” and “expression trees” and their corresponding nodes has led to some confusion regarding what is what. It is most important to recognise that the node of a search tree is itself an expression tree (what I sometimes call a “solution” to the “simplification problem”). Due to the risk of losing reader interest at the cost of an even longer post, I shall stop my ramblings here, and leave further elaboration of the search algorithm for another post.

The future of the project

What has been discussed so far is largely theoretical, yet I have tried to present it in such a way that the method of implementation is for the most part self-evident. Work on this project will likely proceed slowly in the short-term future, though as it advances, the features will surely solidify. I am hoping that at least by summer there should be some tangible results to these efforts. Regardless, I shall try to give status updates along the way.

Leaner CSS

Posted in Programming, Software, Web Design on December 14th, 2009 by Noldorin – Be the first to comment

In my wanderings today, I just happened to stumble across a relatively new project by the name of LESS, which might appeal to anyone interested in web design. LESS has the simple aim of making CSS “leaner” by extending the language with such constructs as variables, mixins, operations, and nested rules.

Now, I haven’t gotten around to trying it out yet, but it immediately struck me as a pretty cool little system – it seems to vastly extend the usability of CSS in a very similar way to what jQuery did with JavaScript (despite plain CSS being somewhat less horrible than plain JavaScript). We’ll have to wait and see whether this project takes off however, though I have an inkling it just may – it has the innovation, for a start.

I will undoubtedly report back whenever I can find some free time to squeeze between my studying and ongoing computational projects. For the moment, I thought I would simply spread the word, as the creators seem most keen to promote.

Electronic Lecture Notes

Posted in Personal, Projects, Software on November 28th, 2009 by Noldorin – Be the first to comment

Lectures at college and the inseparable hours of studying have been taking up much of my time recently, and in this mindset have made me start thinking about how I could somehow improve that wonderfully tedious and hand-cramping task of taking lecture notes from the board. Ok, so taking lecture notes by hand is far from the biggest inconvenience any student experiences on a regular basis; nonetheless, the idea came upon me that if I could create and store all these notes electronically, I’d be saving myself a good deal of pain (presently and during exam season).

The main challenge of this plan was finding some way to make the input/conversion process for turning what’s written on the blackboard ultimately into some prettily formatted pages, quickly and trivially. Although I’m sure it’s been done by many before, using MS Word or the like immediately struck me as impractical – indeed, worse than pen and paper, in my mind. Hence, my initial thoughts were centered on using a lightweight markup of some sort, such as YAML, Markdown, or perhaps simply plain text. (The text could then be rendered nicely by a program at some later stage.) My plans were to write a small editor application specifically for some format, with facilities such as auto-completion and an in-built previewer to make the note-taking even more efficient.

In the end, my mind was decided when my friend David suggested that I simply use the widespread lightweight markup language reST (reStructuredText) as the primary format. This turns out to be almost perfect, since the Python docutils package contains an advanced and stable reST-to-LaTeX converter – and what could be prettier than LaTeX for displaying notes full of complex formulae? He also kindly offered to lend me his Eee PC. I daresay this seems like an ideal combination…

Now, I must confess that I have not in fact yet begun to adopt this wonderful approach of taking notes, despite my being quite eager to start. Given that there are only about two weeks remaining before the current courses finish and Christmas holidays begin, I thought that I’d just set up my environment and at best give the plan a “test run”. Beginning in January, however, I very much intend to be using nothing but the Eee PC for notes. I will of course duly report back how the matter turns out, and with any luck, begin a weekly routine of uploading lovely LaTeX PDFs of (not so lovely) physics material!

RAKIS – A Semantic Information System

Posted in Maths & Science, Software on October 14th, 2009 by Noldorin – Be the first to comment

Over the past few weeks my interest in conceptual information systems has very much been rekindled. This is largely thanks to reading and talking about a certain paper by a developer friend of mine, which discusses the ideas and implementation for his MrMoe framework. Although the project has only recently been announced, and the paper won’t be released publicly for a bit of time yet, many of the ideas are already quite well-formed, and along with my ongoing curiosity relating to the Semantic Web and RDF, have stimulated me to write something of my own thoughts on the topic. (A couple of overly-ambitious projects involving RDF are likely what caused me to put aside the subject for a while.)

Discussing RDF or MrMoe is not the purpose of this post (I’m sure the author, Jan, will however do so further in the near future), so I shall leave these aside for the moment. I will however (somewhat boldly) state that the system I am proposing is significantly more abstract and, I hope, more powerful than either, at least in theory. Saying that, I think all these systems had slightly different applications in mind when designed. Enough preamble though – I don’t wish to create any (more) premature hype. Let me now introduce you to my incipient system.

RAKIS – or the RAKIS Abstract Knowledge and Information System – is my name for the model and associated theory I have invented in the vision of representing information in what I foresee to be a most abstract, context-free, and yet highly structured manner. Do note the use of the word “information” here; this system is not designed simply to represent data, but moreover meaning. It is what one might term a “semantic model”, though this falls short by a long margin from giving any true impression of its potency and overall utility.

My aim in this article is to begin with a very simple model and build it up in small steps, eventually reaching the full RAKIS model. I am hoping that this incremental approach will allow readers to gradually build up understanding of the concepts and the “flavour” of the system, and to appreciate its great flexibility and potential by the end of the article. Although many with a higher-level education in mathematics will observe that much of the model can be described in terms of graph theory, I will try to stay clear of its formalities and jargon (or at least not rely on them).

Let us first construct a set of nodes that represent our basic concepts. These starting concepts might loosely be described as the “nouns” of the model. For want of a good simple example, I will use the Bach family (as you will see later in this and coming posts, this will serve well because of its many associations). Do not let the simplicity of the example fool you – subsequent ones will give you a better understanding of the capability of RAKIS.

The nodes in this diagram represent various members of the Bach family.

Now, there’s nothing very interesting about this diagram (or graph, to use the technical term), though there is one crucial subtlety to point out at this stage. The labels attached to each of the nodes are not truly part of the model, and must only be considered as aids to the viewer. (To throw in some jargon, this graph may be termed an unlabelled graph, despite appearing to be otherwise.) Visualise the same diagram without any text, and you might ask: what would such a model without labels possibly mean? How could it even vaguely represent any form of useful information? To be frank, it does not, as it is. The graph certainly represents data, but has little (if any) inherent meaning. You will see shortly that the meaning is introduced naturally, rather than being directly imposed, as is almost ubiquitous in traditional systems that organise data/information, and even more modern ones such as RDF.

The following diagram incorporates relationships (what I term links) between the various nodes (or concepts) of the model. The way I have chosen these relationships is somewhat arbitrary but also quite deliberate. It should be evident from the diagram that these links simply represent parent-child and husband-wife relationships, in the direction of child to parent. (Note that J. S. Bach was married twice during his life, and had a number of musician children by each wife.)

The nodes represent members of the Bach family, and the arrows the relationships between them. The direction chosen for the arrows is arbitrary, but must be semantically consistent.

Now you might be beginning to see the inklings of meaning in this model, given of course the pre-defined knowledge that it relates to the most famous members of the Bach family. Still, it gives a strong idea as to the structure of this certain family.

Our next step is what truly starts to make this system interesting. At the moment, the model may simply be described as a graph with the edges representing the links between concepts. By treating the set of edges as the nodes for a higher-level graph, we can extend the model to represent “relationships of relationships”. Note that other nodes (or concepts) can be introduced at this second level that do not correspond to links of the first level, such as the “Family Relationship” node in the following diagram.

The nodes represent members of the Bach family, the solid arrows the relationships between them, and the dashed arrows the relationships of relationships, or "meta-relationships" (note that they are still directed).

The real power of this model stems from the recursive nature of links (relationships). One need not stop at meta-links, but rather one can proceed to create higher and higher levels which progressively represent more and more abstract information. Nor must (meta-)links necessarily join concepts in adjacent levels, but can rather link any concepts, provided that the arrow is in the direction of the higher-level concept. Unsurprisingly, the base level simply represents data, or in other words, the most concrete concepts within the system. Although it is not necessarily apparent here, the highest levels should theoretically be capable of describing very abstract ideas, making the system much more effective at relating seemingly disparate pieces of information, and possibly even performing some forms of logical inference and reasoning. I hope this will be demonstrated explicitly in my next post on RAKIS.
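One way to picture the recursion is to let a link be a concept in its own right, so that links can themselves be the endpoints of further links. The sketch below is only an illustration under my own assumptions: the Concept and Link class names are hypothetical, a link is permitted between concepts of the same level (as with the child-to-parent arrows), and a link is placed one level above its target.

```csharp
using System;

// Hypothetical sketch: a concept carries no label, only the level it lives on.
public class Concept
{
    private readonly int level;

    // Level 0 holds the most concrete concepts (plain data).
    public Concept(int level)
    {
        if (level < 0)
            throw new ArgumentOutOfRangeException("level");
        this.level = level;
    }

    public int Level { get { return this.level; } }
}

// A link is itself a concept, so it can be the endpoint of a meta-link.
public sealed class Link : Concept
{
    private readonly Concept from;
    private readonly Concept to;

    public Link(Concept from, Concept to)
        : base(CheckedLevel(from, to))
    {
        this.from = from;
        this.to = to;
    }

    public Concept From { get { return this.from; } }
    public Concept To { get { return this.to; } }

    // Enforces that the arrow never points down a level, and places the
    // link one level above its target (an assumption made for this sketch).
    private static int CheckedLevel(Concept from, Concept to)
    {
        if (from == null) throw new ArgumentNullException("from");
        if (to == null) throw new ArgumentNullException("to");
        if (to.Level < from.Level)
            throw new ArgumentException("A link must point towards the same or a higher level.");
        return to.Level + 1;
    }
}
```

For instance, with two base-level concepts standing for a child and a parent, new Link(child, parent) is a level 1 concept; a free-standing level 1 concept such as “Family Relationship” can then be the target of a meta-link from that link, and the meta-link itself sits on level 2.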

The final point to stress is the implicit manner in which this model represents information. Given the appropriate context or “hints”, one can deduce easily enough what the graph represents (the significance of the concepts and relationships) without need for any labels on the nodes. Clearly, if you were presented with the above diagram, it would be virtually impossible to figure out that it corresponds to the Bach family (and hard enough to realise it corresponds to a family, even). However, if the graph were to be extended (it is not designed to exist in its isolated form) so that it interfaces with something outside of the system, the set of possible interpretations quickly diminishes until it is quite obvious what the model represents.

To give a fairly blatant example of an interface, each concept in the graph shown above could be linked to two nodes, one from a set of forenames, the second from a set of surnames. These sets of names would exist external to the system, and would simply map to what you might call “interface nodes”. The advantage of this design is that the model still retains its implicit meaning (no need of labels) when put into an appropriate context. In addition, the context-free nature of the model allows arbitrary (albeit relevant) isomorphisms, which only serves to increase the power of the model to associate, at a deeper level of abstraction, concepts that appear different on the surface. This is highly important if your goal is to perform any sort of formal reasoning using the system, which is much of the aim of RAKIS.
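The interface idea can be sketched roughly as follows, again in Python and again only as an illustration. The graph holds bare integers and pairs of integers, and all meaning comes from a separate mapping onto external name sets. The three family members chosen, the reading of every link as "parent to child", and the names interface, name_of and interpret are assumptions for the sake of the example.

```python
from typing import Tuple

# External sets of names; they live outside the graph itself.
FORENAMES = ["Johann Sebastian", "Carl Philipp Emanuel", "Wilhelm Friedemann"]
SURNAMES = ["Bach"]

# The unlabelled graph: concepts are bare integers, links are (source, target) pairs.
# Assumption: every link is read as "parent -> child".
links = [(0, 1), (0, 2)]

# The interface: each concept is tied to one forename and one surname by index.
interface = {0: (0, 0), 1: (1, 0), 2: (2, 0)}


def name_of(concept: int) -> str:
    """Resolve a concept to a full name through the interface."""
    forename_idx, surname_idx = interface[concept]
    return f"{FORENAMES[forename_idx]} {SURNAMES[surname_idx]}"


def interpret(link: Tuple[int, int]) -> str:
    """Read a link in the context supplied by the interface."""
    source, target = link
    return f"{name_of(source)} is a parent of {name_of(target)}"


for link in links:
    print(interpret(link))
```

Swapping in a different interface (another family, say) would change every interpretation without touching the links at all, which is the sense in which the graph itself stays context-free.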

There is an intriguing extension to the proposed model, on which I will touch just briefly now. So far I have treated all graphs (in all layers) as unweighted, meaning that an edge (link) either exists or does not. In a weighted graph, each edge would be assigned a real value (typically within a restricted range). By applying this feature to the RAKIS model, I would hope to open the doors to the possibilities of performing advanced types of inference, and perhaps even machine learning. (The latter would involve “growing” new concepts and increasing the strengths (weights) of links when certain criteria are satisfied.) Although I do not want to suggest too much similarity, there is some analogy to artificial neural networks here. Not to worry if this paragraph makes little sense, however, since I will be elucidating this aspect of the system in a future text.
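As a very rough Python sketch of what weighted links might look like, consider the class below. The range [0, 1], the update rule (move the weight a fixed fraction of the way towards 1) and the names WeightedLinks and reinforce are all assumptions chosen for illustration; the text above does not specify how weights would actually be adjusted.

```python
class WeightedLinks:
    """Directed links with a strength in [0, 1]; a missing link has weight 0."""

    def __init__(self) -> None:
        self._weights = {}

    def weight(self, source, target) -> float:
        return self._weights.get((source, target), 0.0)

    def reinforce(self, source, target, rate: float = 0.5) -> float:
        """Strengthen a link, creating it first if it does not exist yet."""
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must lie in (0, 1]")
        current = self.weight(source, target)
        # Assumed rule: close a fixed fraction of the remaining gap to 1.
        updated = current + rate * (1.0 - current)
        self._weights[(source, target)] = updated
        return updated


graph = WeightedLinks()
first = graph.reinforce("a", "b")    # the link is "grown" on first use
second = graph.reinforce("a", "b")   # and strengthened on repeated use
```

With this rule a weight never leaves the restricted range, and the unweighted model is recovered by treating any non-zero weight as "the link exists".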

By now I’ve probably bombarded you as readers with enough of my theory (for the time being), so let us just summarise the key points I have already introduced:

  • The fundamental features of the RAKIS model are concepts and links, which can be thought of as nodes (vertices) and relationships (edges) respectively.
  • Concepts are unlabelled, in that their meaning is implicit and can only be inferred from context. Some sort of interface, bridging the system and the external world, is typically required to instill some form of meaning.
  • Links represent connections between concepts, such as a parent-child relationship, as given above. They are directed arbitrarily, but must be consistent with each other.
  • Both concepts and links can be extended to higher levels so that they represent “meta-concepts”, “meta-links”, “meta-meta-links”, and so on.
  • Base-level concepts represent fairly concrete ideas, while higher-level ones (“meta-concepts”) represent progressively more abstract ideas.
  • Due to the highly fluid and context-free nature of the model, isomorphisms (parallels of sorts) can be recognised between various subsets of concepts relatively easily. The implication of this is that meaning is not fixed, but is rather context-dependent.

Note: Italic terms present in this article are typically used to highlight elements of the terminology I have chosen for RAKIS.

That concludes the first post of what I intend will be a series on the RAKIS system. Just a teaser for my next post: it will be demonstrated how various logical and mathematical ideas, such as sets and prime numbers, can be expressed formally by RAKIS. Well, I hope that this introduction has at least given readers a decent grasp of the power as well as the design of the core model. If you feel anything needs clarification, or simply have general comments about this post, feel free to leave a comment.

TeX.NET 0.1.0 released

Posted in Maths & Science, Programming, Projects, Software on October 8th, 2009 by Noldorin

Some news to report at last! I’ve just released the first beta version of my TeX.NET project over at Launchpad. Both the source (with unit tests) and binaries are available for what might be called “a parsing and writing library for mathematical expressions written in the TeX format”, programmed, of course, in C# 3.0 (with some rudimentary F# bindings).

This project hasn’t exactly been the focus of my blog posts recently (or at all, for that matter), mainly because I’ve found that working on the thing has taken almost all of my free time over the past month or so. As I have come to learn, putting effort into promotion of one’s ideas and work is not a wise thing to do before they have proper substance!

Since I’m not one to reword any text that I’ve already (quite carefully) written, I shall quote the project summary over at Launchpad:

TeX.NET is a library for the .NET Framework 3.5 that provides lexical analysis and parsing of mathematics written in the TeX format into expression trees, and conversely the writing of expression trees in the same textual format. The purpose of the library is to provide a useful framework for inputing, outputing, and manipulating complex mathematical expressions.

Now, you may of course be curious where I’m taking this project in the grand scheme of things. (Indeed, a library for parsing of mathematical expressions isn’t terribly useful on its own.) Although it’s far from an in-depth explanation of the end goals and my approach, I suggest you check out the Syracuse Project, which is the “container” or “super-project” for TeX.NET, among other related libraries and programs. I can however say that I will be shifting my efforts more towards the Euclid.NET project, now that I have a stable initial version of the requisite parser/writer library. It is bound to be the subject of one of my upcoming posts, anyway.

So, if anyone is inclined to give my TeX.NET library a whirl (or better yet, some further rigorous testing), I would gratefully urge them to do so. Any sort of feedback on this project is welcome, as with all of my endeavours. My aim at the moment is mainly to garner a nucleus of interest and see this library applied to great things!

Demo of Euclid.NET Launched

Posted in Programming, Projects, Software on September 20th, 2009 by Noldorin

Just a quick note that I’ve recently put up an interactive demo of my Euclid.NET project (a symbolic mathematical framework for .NET). This project is built on my TeX.NET parser, which is currently all that the demo consists of. I hope to both expand and document the support for the parser soon, as well as to start some serious work on Euclid.NET specifically (most likely, a symbolic differentiator to begin with).

WordPress Permalinks

Posted in Software on August 21st, 2009 by Noldorin

In the process of setting up my new blog, I quickly realised that by default WordPress was displaying posts and pages using query strings in the URLs, quite in contrast with the pretty URLs I was expecting from my experience hosting on WordPress.com.

To give an example, in contrast to

/blog/?p=123

which was the form being shown, I was desiring something more like

/01/10/entry-title-here

Not such a terribly important matter in the scale of things; yet aesthetics is usually a worthwhile pursuit, so why not with URLs too? Moreover, the URLs then become hackable in this form.

Like many other features of the WordPress system, pretty permalinks are not enabled by default, but can be configured with a bit of effort. In this case, it required more than a bit of effort…

The problem here is caused primarily by the fact that WordPress is designed to run best on an Apache server, and not an IIS (Windows) server. In fact, pretty permalinks generally work using a .htaccess file that uses the mod_rewrite engine. Indeed, this is the file WordPress generates by default. That said, it is not much more difficult to configure IIS 7 to do the URL rewriting – the WordPress Codex page on Using Permalinks does in fact detail exactly the code required in web.config. IIS 6 is another story, unfortunately. As far as I know, many shared web servers provide the ISAPI Rewrite tool, as does my current one. Version 3 supports Apache-style .htaccess files; but alas, version 2 is all that I have available, and is probably the more widespread one at present.

After a fair bit of messing around with the httpd.ini file in the root directory of my website, I finally managed to replicate the URL rewriting functionality otherwise available on Apache servers. It’s not quite as elegant as the standard .htaccess method, but it seems to be both short and efficient. A word of warning: it has only been tested with the latest version of WordPress (2.8.4), so I cannot guarantee complete success on other versions.

[ISAPI_Rewrite]

UriMatchPrefix /blog/
RewriteCond URL (?!wp-.*).* [O]
RewriteCond URL (?!license.txt$).* [O]
RewriteCond URL (?!xmlrpc.php$).* [O]
RewriteRule .* /blog/index.php [L]

Simply paste the above text into a file named httpd.ini, copy it into the root directory, and you have your pretty permalinks (provided, of course, that you have set the option in Settings > Permalinks of the admin interface).
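For readers who find the negative lookaheads hard to parse, the following Python sketch models what the rules appear intended to do: leave WordPress's own wp-* files, license.txt and xmlrpc.php alone, and hand every other request under /blog/ to index.php. It is only a reading of the intent and does not reproduce ISAPI Rewrite's actual matching semantics. In particular it assumes that the conditions are tested against the part of the path after the /blog/ prefix, and the function name rewrite is hypothetical.

```python
PREFIX = "/blog/"
PASS_THROUGH_FILES = ("license.txt", "xmlrpc.php")


def rewrite(path: str) -> str:
    """Return the path that would be served for a requested path."""
    if not path.startswith(PREFIX):
        return path  # outside the blog: the rules do not apply
    rest = path[len(PREFIX):]
    # Assumed reading of the three RewriteCond lines: skip wp-* resources
    # and the two named files, so they are served directly.
    if rest.startswith("wp-") or rest in PASS_THROUGH_FILES:
        return path
    # Everything else is handed to WordPress, which resolves the permalink.
    return PREFIX + "index.php"


for requested in ("/blog/01/10/entry-title-here", "/blog/wp-admin/index.php"):
    print(requested, "->", rewrite(requested))
```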