[om-list] Fast subset enumeration in object-relation models
Mark Butler
butlerm at middle.net
Thu Oct 9 15:05:55 EDT 2003
Fast subset enumeration in object-relation models
A couple of years ago we extensively discussed object-relation models.
Object relation models have been around for a long time. Since
Aristotle, the distinction between and object and a relation has
arguably become the dominant concept in analytical philosophy, a
leitmotif of Western thought.
Of course, logicians since then have discovered that the world is
slightly more complex than Aristotle thought it was, but nonetheless,
with proper adjustments, object-relation models can be very powerful
tools. Thomas Packer's Mathesis is a fine example, a synthesis of
Cartesian science and Aristotelian logic - correcting weaknesses in the
latter by modeling objects as complex geometric entities with complex
vector relationships - rather than simple points with simple
relationships as in Greek logic.
Outside of scientific computing, however, the world largely continues to
model everything roughly the same way Aristotle did, making only
practical concessions for special purpose applications - in particular
fixing the list of allowed attributes and relationships for any given class.
Although there are other aspects of knowledge modelling that may have an
impact on choice of data structures, fully polymorphic object relation
models - those that use a class hierarchly with only one or two roots
(Object and/or Relation, respectively) have a serious performance
issue. Traditionally if you want fast access to an attribute, you
store it inline. If you move attributes out into separate objects, you
typically have to traverse a list for every access. This is a bit of a
problem, one of the reasons why LISP hasn't taken over the world yet,
for example.
For many purposes, the list traversal penalty is acceptable as long as
the list rarely exceeds ~100 items. When a list gets into the thousands
of entries, more sophisticated data structures need to be adopted. A
very common case is where a list is can old items that are members of a
class or any of its sub-classes, and you want to quickly enumerate only
the ones belonging to a particular sub-class.
A simple way to represent a set is to use a doubly linked list, one
entry for each set member. This is insufficient to allow fast access by
sub-class, so one alternative might by to store a set as partitioned set
of sets, one for each sub class. Then when we want to scan the members
of the set that are of a certain sub-class, we could scan the top level
list looking for the subset that contains that sub class, and then scan
that subset. Properly speaking, you would need a forest of subsets
rather than a simple lists, but either way massive (10-1000x)
performance improvements can be obtained by reducing a list search from
O(n) to O(k) where n is the list size, and k is the number of elements
of the target sub class.
A typical example is dealing with an 'object' that represents a
universal, or a class. A class typically has only one or two base
classes, a half dozen attributes, but perhaps millions of instances.
Any system that needs fast access to base classes and class attributes
is going to fail miserably if the instance relations have to be scanned
to find the more fundamental relationships. I believe what I have
proposed will solve the problem nicely and have started an
implementation in C++ that I am going to use in my family history
system. The trick of course is to encapsulate it all nicely, so that
the ugly details are not apparent in higher level code.
Comments?
- Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://six.pairlist.net/pipermail/om-list/attachments/20031009/6ae6579d/attachment.html
More information about the om-list
mailing list