[om-list] Graph Traversal Specifications

Thu Jan 22 09:20:01 EST 2004

Mark

    Sorry for the late response.  Very interesting stuff.  I am still digesting it.

tomp

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Víðar sum quem nihil obstat.
Omnia apud me Mathesis fiunt.
www.Ontolog.Com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  ----- Original Message ----- 
  From: Mark Butler 
  To: One Model List 
  Sent: Wednesday, 03 December, 2003 20:44
  Subject: [om-list] Graph Traversal Specifications

  Lately I have been working on a mini graph traversal language that specifies a set of related objects given a starting object and a traversal specification given in terms of the language.  I need this to do simple things like access direct attributes, but it turns out to be easily extended to more sophisticated relations.

  My current design is based on items and connections between pairs of items.  Items are used to represent classes, objects, and standard relationships between objects and classes. Connections are used to connect object items to relation items, tagging each object as a subject, object, context, etc, much like the parts of a sentence.  

  The system is largely typeless, in that every item / object has a common format.  Attributes are accessed through algorithmic traversals from the attributed item.  More radically, object attributes are not explicitly represented in the system at all.  Rather, class attributes are represented as functions (graph traversal specifications) that return zero or more related objects given a source object.

  A typical attribute function might be children(x), a function that given x returns the set of the children of x.  There is a lot of freedom in how one might represent family relationships in relational form, and the graph traversal specification must match the chosen representation.

  Given a beget(father,mother,child) relation, to find the children of x, I have to enumerate the 'beget' predicate relationships where x is one of the first two arguments, and then enumerate the referents of the third argument.  The current design can easily represent shared / compressed relationships like beget(dad,mom,{child1,child2,....childN}) although that form must be split if each sub-relationship has distinct attributes, such as the birth date for each child.

  In any case, children(x) can be easily represented as a string of graph traversal and node test instructions, for example: 

       start  x       ; start on node x
       move  -arg1    ; move to relationship(s) where x is first argument
       test   beget   ; test to see if we are on a 'beget' relationship
       move   arg3    ; move to referent of argument 3
       return         ; current node is child of x

  The first thing to recognize is that for this to work, any failed test or traversal must be aborted, for example if x never participated in any relationships as argument 1, or none of those were 'beget'-type relationships.  The second thing to recognize is that each move instruction can result in an arbitrary number of graph traversals, each of which must be handled independently.  It is as if each move instruction forks a sub-process to handle each matching graph edge, and aggregates the results.

  Of course I can't create new processes or threads to handle this type of parallelism, so I just save state on some sort of stack instead.  Currently every move instruction gets converted into a related item index range scan on the set of connections to related items.  The indexes are currently by (arg_selector,item_class) so a move,test pair converts into a single range scan. Then for each matching item of the first range scan (a 'beget' relationship item in this case), a second related item index range scan is run, returning items that are connected as argument 3 - the children.

  In relational databases, this type of query execution plan is known as 'nested loops' and is common eay to execute a join of two well indexed tables.  The big difference is that relational databases join using global high cardinality indexes, whereas here every object has its own low cardinality index of connections to related objects.  (Of course relational databases could be implemented this way in principle, but no one ever has as far as I know - current technology is heavily entrenched - what happens in practice is the object oriented database people develop relational (i.e. SQL) interfaces to non-relational data, using 'relational' in the conventional strongly-typed set of rows sense, of course).

  Now for my purposes, I need this query to execute in microseconds, not milliseconds (as from a client server database), so I am loading everything into memory.  The advantage of strongly typed languages (C++,Java) is that single valued attribute access occurs at a fixed offset from from the item header in a matter of nanoseconds.  Multi-valued access is much slower, of course.  By using the same system for single and multivalued attributes, there is a significant performance penalty (~20x) similar to that of systems like CLOS (probably worse).  The advantage is flexibility, which is ideal for my applications and for general purpose knowledge modeling systems.

  One of my problems is that there is an increasing mismatch between C++ and my high level application code.  The low level stuff (query engine and the like) works very well in C++, but the high level stuff practically needs its own language.  I like C# style syntax, but the big difference is the requirement to create classes at runtime and execute code written in terms of those classes, all on top of the indexed item-connection meta framework.  I am hoping to write a interpreter roughly similar to what you might see in Python (first pass conversion to AST or byte code, interpret AST or byte code directly).    This is all high level code, so the 10x interpretation penalty won't be a problem.

   - Mark

------------------------------------------------------------------------------

  _______________________________________________
  om-list mailing list
  om-list at onemodel.org
  http://six.pairlist.net/mailman/listinfo/om-list
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://six.pairlist.net/pipermail/om-list/attachments/20040122/67b443e9/attachment.html