[om-list] Attribute Access Methods
Mark Butler
butlerm at middle.net
Tue Oct 28 20:15:48 EST 2003
Attribute Access Methods
An ordered list tree can easily reduce the access path to a keyed member
of a 1000-member set stored in memory to about twenty edge traversals.
A modern computer can traverse edges at a rate of about 200 ns / edge,
assuming one L2 cache miss per edge. That leads to a known attribute
access time of about 4 us. That is not bad, but it would be nice to
improve on if you need to process data sets of ~500 million attributes
(5 million records x 100 attributes).
The best way to improve lookup performance is to cache the results. A
direct mapped cache of adequate size can reduce average access times by
an order of magnitude. In this case, checking a direct mapped cache
requires one integer division and two likely L2 cache misses, about 440
ns, or roughly nine times faster. That compares to perhaps 50 ns
average per traditional static attribute access (roughly nine times faster still),
due to the advantage of high cache locality and a considerably smaller
working set.
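A direct-mapped lookup cache of the kind described can be sketched in a few lines of C. The slot count, the key width, and all names here are illustrative assumptions, not the actual om structures:

```c
#include <assert.h>
#include <stddef.h>

/* One integer division (the modulo) picks the slot; the tag check and
 * the cached value each risk one L2 miss, matching the two misses
 * estimated in the text. A slot whose stored key differs is a miss. */
#define CACHE_SLOTS 4096

typedef struct {
    unsigned long key;   /* tag: the full key stored in this slot */
    void *value;         /* cached lookup result (NULL = empty slot) */
} cache_slot;

static cache_slot cache[CACHE_SLOTS];

void *cache_get(unsigned long key) {
    cache_slot *s = &cache[key % CACHE_SLOTS];   /* the one integer division */
    return (s->key == key) ? s->value : NULL;    /* hit, or miss on tag mismatch */
}

void cache_put(unsigned long key, void *value) {
    cache_slot *s = &cache[key % CACHE_SLOTS];   /* silently evicts the old entry */
    s->key = key;
    s->value = value;
}
```

Direct mapping keeps the probe cost fixed: a colliding key simply evicts the previous occupant, so there is no chaining or probing to amplify the miss cost.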
Of course, if attribute records are 64 bytes each, one can only
reasonably fit perhaps 8 million in a typical 512 MB area. If the system
starts to page, you might as well give up unless you can either pay a
1000x performance penalty or redesign all of your data structures for
page-based locality. If the overflow isn't too serious, you can buy
more RAM or move up to a 64-bit machine, as appropriate.
One of the problems, of course, is that all these per-item indexes and
caches have a considerable overhead, typically double the size of the
data being stored. That is a major penalty to pay when every attribute
is an item in and of itself - e.g. 128 bytes/attribute instead of 64.
My solution is to allocate indexes and index caches only for items that
have more than a threshold number of references.
I expect the optimal threshold to be about 8 references per item. Each
meta-relationship to another item consumes one reference. Most
attributes will have one incoming reference (for an evaluation) and two
outgoing ones, one for the attributed object and one for the domain. On
the other hand, some objects (person records, for example) will often
have hundreds of internal and external references - for attributes,
events, family relationships, and so on - where a reference index and a
cache would be a great help.
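The threshold scheme might be sketched like this. REF_THRESHOLD, the field names, and the placeholder index allocation are all illustrative assumptions, not the actual implementation:

```c
#include <assert.h>
#include <stdlib.h>

/* An item carries only a reference count until it crosses the threshold,
 * at which point a per-item index/cache is allocated for it. Lightly
 * referenced attributes (typically ~3 references) never pay the index
 * overhead; heavily referenced objects (hundreds of references) do. */
#define REF_THRESHOLD 8

typedef struct item {
    int ref_count;   /* incoming + outgoing references so far */
    void *index;     /* NULL until the item earns an index */
} item;

void add_reference(item *it) {
    it->ref_count++;
    if (it->ref_count > REF_THRESHOLD && it->index == NULL) {
        /* stand-in for building the real reference index and cache */
        it->index = malloc(64);
    }
}
```

A typical attribute with three references stays in the cheap representation; a person record accumulating hundreds of references gets its index after the ninth.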
Final note: I learned to program on architectures that did not have CPU
caches. The latter have some really interesting side effects due to
cache line size, replacement latency, and so on. In-memory index
traversal is much faster if you can locate multiple keys in the same
cache line (typically 32-64 bytes), because each L2 cache miss fetches a
full cache line. For my purposes, that means I should really rewrite my
current N-way list tree structure into something much more like
a traditional B-tree. The irony, of course, is that CPUs are so fast
relative to memory that the same techniques for optimizing mechanical
disk access work for RAM access as well, just with a much smaller
'sector' size and a time scale reduction of 1000 or so.
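One way to sketch such a cache-line-sized node follows; the 64-byte layout, 32-bit keys, and index-based child links are assumptions chosen for illustration, not the author's actual structure:

```c
#include <assert.h>
#include <stdint.h>

/* A B-tree node packed into exactly one 64-byte cache line, so a single
 * L2 miss fetches all seven keys at once. Children are 32-bit indices
 * into a node array rather than pointers, to keep the node in one line
 * even on a 64-bit machine. */
typedef struct {
    uint32_t nkeys;     /* number of keys in use (0..7) */
    uint32_t key[7];    /* sorted keys, all in the same cache line */
    uint32_t child[8];  /* node-array indices of the children */
} btree_node;           /* 4 + 28 + 32 = 64 bytes, no padding */

/* Return the slot to descend into for key k: the index of the first
 * stored key that is >= k, scanning within the already-fetched line. */
uint32_t find_slot(const btree_node *n, uint32_t k) {
    uint32_t i = 0;
    while (i < n->nkeys && n->key[i] < k)
        i++;
    return i;
}
```

The in-node scan is the point: once the line is in cache, comparing seven keys costs a handful of cycles, so the per-lookup cost is dominated by the number of lines touched, exactly like counting disk sectors in a traditional B-tree.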
- Mark