[om-list] Fast subset enumeration in object-relation models

Thu Oct 9 15:05:55 EDT 2003

Fast subset enumeration in object-relation models

A couple of years ago we extensively discussed object-relation models.  
Object relation models have been around for a long time. Since 
Aristotle, the distinction between and object and a relation has 
arguably become the dominant concept in analytical philosophy, a 
leitmotif of Western thought.

Of course, logicians since then have discovered that the world is 
slightly more complex than Aristotle thought it was, but nonetheless, 
with proper adjustments, object-relation models can be very powerful 
tools. Thomas Packer's Mathesis is a fine example, a synthesis of 
Cartesian science and Aristotelian logic - correcting weaknesses in the 
latter by modeling objects as complex geometric entities with complex 
vector relationships - rather than simple points with simple 
relationships as in Greek logic.

Outside of scientific computing, however, the world largely continues to 
model everything roughly the same way Aristotle did, making only 
practical concessions for special purpose applications  - in particular 
fixing the list of allowed attributes and relationships for any given class.

Although there are other aspects of knowledge modelling that may have an 
impact on choice of data structures, fully polymorphic object relation 
models - those that use a class hierarchly with only one or two roots 
(Object and/or Relation, respectively) have a serious performance 
issue.   Traditionally if you want fast access to an attribute, you 
store it inline.  If you move attributes out into separate objects, you 
typically have to traverse a list for every access.  This is a bit of a 
problem, one of the reasons why LISP hasn't taken over the world yet, 
for example.

For many purposes, the list traversal penalty is acceptable as long as 
the list rarely exceeds ~100 items.  When a list gets into the thousands 
of entries, more sophisticated data structures need to be adopted. A 
very common case is where a list is can old items that are members of a 
class or any of its sub-classes, and you want to quickly enumerate only 
the ones belonging to a particular sub-class.

A simple way to represent a set is to use a doubly linked list, one 
entry for each set member.  This is insufficient to allow fast access by 
sub-class, so one alternative might by to store a set as partitioned set 
of sets, one for each sub class.  Then when we want to scan the members 
of the set that are of a certain sub-class, we could scan the top level 
list looking for the subset that contains that sub class, and then scan 
that subset.  Properly speaking, you would need a forest of subsets 
rather than a simple lists, but either way massive (10-1000x) 
performance improvements can be obtained by reducing a list search from 
O(n) to O(k) where n is the list size, and k is the number of elements 
of the target sub class.

A typical example is dealing with an 'object' that represents a 
universal, or a class.  A class typically has only one or two base 
classes, a half dozen attributes, but perhaps millions of instances.  
Any system that needs fast access to base classes and class attributes 
is going to fail miserably if the instance relations have to be scanned  
to find the more fundamental  relationships.  I believe what I have 
proposed will solve the problem nicely and have started an 
implementation in C++ that I am going to use in my family history 
system.  The trick of course is to encapsulate it all nicely, so that 
the ugly details are not apparent in higher level code.

Comments?

  - Mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://six.pairlist.net/pipermail/om-list/attachments/20031009/6ae6579d/attachment.html