[om-list] Attribute Access Methods
Mark Butler
butlerm at middle.net
Tue Oct 28 20:15:48 EST 2003
Attribute Access Methods
An ordered list tree can easily reduce the access path to a keyed member
of a 1000-member set stored in memory to about twenty edge traversals.
A modern computer can traverse edges at a rate of about 200 ns / edge,
assuming one L2 cache miss per edge. That leads to a known attribute
access time of about 4 us. That is not bad, but it would be nice to
improve on if you need to process data sets of ~500 million attributes
(5 million records x 100 attributes).
The best way to improve lookup performance is to cache the results. A
direct mapped cache of adequate size can reduce average access times by
an order of magnitude. In this case, checking a direct mapped cache
requires one integer division and two likely L2 cache misses, about 440
ns, or roughly nine times faster. That compares to perhaps 50 ns
average per traditional static attribute access (roughly nine times faster still),
due to the advantage of high cache locality and a considerably smaller
working set.
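A direct-mapped lookup cache of the kind described can be sketched in a few lines of C. The slot count, the key width, and all names here are illustrative assumptions, not the actual om structures:

```c
#include <assert.h>
#include <stddef.h>

/* One integer division (the modulo) picks the slot; the tag check and
 * the cached value each risk one L2 miss, matching the two misses
 * estimated in the text. A slot whose stored key differs is a miss. */
#define CACHE_SLOTS 4096

typedef struct {
    unsigned long key;   /* tag: the full key stored in this slot */
    void *value;         /* cached lookup result (NULL = empty slot) */
} cache_slot;

static cache_slot cache[CACHE_SLOTS];

void *cache_get(unsigned long key) {
    cache_slot *s = &cache[key % CACHE_SLOTS];   /* the one integer division */
    return (s->key == key) ? s->value : NULL;    /* hit, or miss on tag mismatch */
}

void cache_put(unsigned long key, void *value) {
    cache_slot *s = &cache[key % CACHE_SLOTS];   /* silently evicts the old entry */
    s->key = key;
    s->value = value;
}
```

Direct mapping keeps the probe cost fixed: a colliding key simply evicts the previous occupant, so there is no chaining or probing to amplify the miss cost.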
Of course, if attribute records are 64 bytes each, one can only
reasonably fit perhaps 8 million in a typical 512 MB area. If the system
starts to page, you might as well give up unless you can either pay a
1000x performance penalty or redesign all of your data structures for
page-based locality. If the overflow isn't too serious, you can buy
more RAM or move up to a 64-bit machine, as appropriate.
One of the problems, of course, is that all these per-item indexes and
caches have a considerable overhead, typically double the size of the
data being stored. That is a major penalty to pay when every attribute
is an item in and of itself - e.g. 128 bytes/attribute instead of 64.
My solution is to allocate indexes and index caches only for items that
have more than a threshold number of references.
I expect the optimal threshold to be about 8 references per item. Each
meta-relationship to another item consumes one reference. Most
attributes will have one incoming reference (for an evaluation) and two
outgoing ones, one for the attributed object and one for the domain. On
the other hand, some objects (person records, for example) will often
have hundreds of internal and external references - for attributes,
events, family relationships, and so on - where a reference index and a
cache would be a great help.
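The threshold scheme might be sketched like this. REF_THRESHOLD, the field names, and the placeholder index allocation are all illustrative assumptions, not the actual implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* An item carries only a reference count until it crosses the threshold,
 * at which point a per-item index/cache is allocated for it. Lightly
 * referenced attributes (typically ~3 references) never pay the index
 * overhead; heavily referenced objects (hundreds of references) do. */
#define REF_THRESHOLD 8

typedef struct item {
    int ref_count;   /* incoming + outgoing references so far */
    void *index;     /* NULL until the item earns an index */
} item;

void add_reference(item *it) {
    it->ref_count++;
    if (it->ref_count > REF_THRESHOLD && it->index == NULL) {
        /* stand-in for building the real reference index and cache */
        it->index = malloc(64);
    }
}
```

A typical attribute with three references stays in the cheap representation; a person record accumulating hundreds of references gets its index after the ninth.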
Final note: I learned to program on architectures that did not have CPU
caches. The latter have some really interesting side effects due to
cache line size, replacement latency, and so on. In-memory index
traversal is much faster if you can locate multiple keys in the same
cache line (typically 32-64 bytes), because each L2 cache miss fetches a
full cache line. For my purposes, that means I should really rewrite my
current N-way list tree structure into something much more like
a traditional B-tree. The irony, of course, is that CPUs are so fast
relative to memory that the same techniques for optimizing mechanical
disk access work for RAM access as well, just with a much smaller
'sector' size and a time scale reduction of 1000 or so.
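One way to sketch such a cache-line-sized node follows; the 64-byte layout, 32-bit keys, and index-based child links are assumptions chosen for illustration, not the author's actual structure:

```c
#include <assert.h>
#include <stdint.h>

/* A B-tree node packed into exactly one 64-byte cache line, so a single
 * L2 miss fetches all seven keys at once. Children are 32-bit indices
 * into a node array rather than pointers, to keep the node in one line
 * even on a 64-bit machine. */
typedef struct {
    uint32_t nkeys;     /* number of keys in use (0..7) */
    uint32_t key[7];    /* sorted keys, all in the same cache line */
    uint32_t child[8];  /* node-array indices of the children */
} btree_node;           /* 4 + 28 + 32 = 64 bytes, no padding */

/* Return the slot to descend into for key k: the index of the first
 * stored key that is >= k, scanning within the already-fetched line. */
uint32_t find_slot(const btree_node *n, uint32_t k) {
    uint32_t i = 0;
    while (i < n->nkeys && n->key[i] < k)
        i++;
    return i;
}
```

The in-node scan is the point: once the line is in cache, comparing seven keys costs a handful of cycles, so the per-lookup cost is dominated by the number of lines touched, exactly like counting disk sectors in a traditional B-tree.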
- Mark