[om-list] System Design

Luke Call lacall at onemodel.org
Sat Oct 28 11:43:30 EDT 2000


I worked hard on a reply about two weeks ago, and now I can see
it didn't go out and I don't have it in my Sent box or anywhere. I'll
try again. Frustrating. Sorry. 34 and already senile!

More comments below.

Mark Butler wrote:
 > Luke Call wrote:
 > > What
 > > causes us to require a special format other than objects decomposed onto
 > > a database?
 > Same purpose as export formats like XML and GEDCOM.  Different applications
 > may use different kinds of databases depending on their own requirements
 > (speed, capacity, etc.), but text export formats provide a common way for them
 > to exchange information, as well as a common analysis technique.  The only
 > hard core requirement is that there is a 1:1 mapping between the format and
 > the database.

For the file format, do you mean only for export, or as the native
repository? How would we store our whole system (eventually terabytes
and beyond, I hope) in a text file format? Some kind of database seems
the only way, unless I'm missing something.
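To make sure I understand the 1:1-mapping idea, here's a tiny sketch of round-tripping database records through a text export. The record shape and the tab-separated layout are my own invention for illustration, not anything we've agreed on:

```python
# Hypothetical sketch: a text export format with a 1:1 mapping to the
# database. The field names and layout here are made up.

def export_records(records):
    """Serialize each record as one tab-separated line: id, name, value."""
    return "\n".join(f"{r['id']}\t{r['name']}\t{r['value']}" for r in records)

def import_records(text):
    """Parse the text export back into the same record structure."""
    records = []
    for line in text.splitlines():
        rid, name, value = line.split("\t")
        records.append({"id": rid, "name": name, "value": value})
    return records

db = [{"id": "1", "name": "mass", "value": "5.97e24"},
      {"id": "2", "name": "unit", "value": "kg"}]
assert import_records(export_records(db)) == db  # the 1:1 round trip
```

The point of the 1:1 constraint is exactly that round trip: any two implementations that can read and write the text form can exchange their whole databases without loss.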


 > Adding certain capabilities is a matter of compromise - I may need them to
 > satisfy personal requirements for a useful system and only ask that the system
 > be capable of having such a function developed for it.  I certainly do not
 > expect that people who do not care about certain features volunteer their time
 > to develop support for them.

I'd like to understand your personal requirements better, since I'd hoped
they were all in the requirements document. Or maybe the difference is in
your approach vs. mine (until they converge) to those requirements.


 > The only critical questions are those that force significant constraints on
 > the lowest level data structures and how they are implemented.  For example, I
 > doubt that any form of analysis could be effectively performed on a LISPy
 > structure stored in a relational database without caching large portions of it
 > into memory for processing.  Unfortunately, tracing thousands of node graph
 > edges for a good size problem just to get them into memory is likely to take
 > several minutes. If the problem size is limited, loading and parsing a text
 > export file into memory is likely to be at least a hundred times faster.

Again, if the system is terabytes in size, loading a text file would
have to be an optimization for very specialized uses of the system,
don't you think? And I erred in saying RDBMS when I should have said
"some kind of database"--an OO DB may be better, but I don't know for
sure yet.


 > Now if you have a very large problem set with no convenient boundaries you
 > need a database that is optimized for graph traversal, much like most modern
 > object oriented databases.

OK, maybe we're together.


 > In any case, regardless of how many implementations we end up with, we cannot
 > work together without a common lowest level logical data model.  I suggest a
 > LISP-like data model because I know that it is sufficiently flexible to
 > represent virtually anything, but also recognizing that it has severe
 > implications for practical database implementations.

As you've pointed out, there's the central point to discuss. Couldn't an
OO model and a predicate calculus model (which I still naively see as a
bunch of logical statements in a formal language that a computer loads,
parses, and traverses) each represent the same things? Then the question
becomes efficiency in expressing and calculating.

For example, your earlier examples demonstrating classes defined by
queries vs. defined explicitly (the "northern city" example) could also
be expressed in the OO system I envision (I guess I'll start saying
OODB, meaning conceptually, though I'm not ready to commit to an
implementation), by saving the query as a class definition (a list of
properties & required values?) and any explicit class relationship as a
property which defines a relationship. Listing properties & required
values as a query definition, vs. expressing it in a formal language,
seem to both express the same concept; but as the article you referred
to on different ways of representing knowledge says, they each lend
themselves differently to different things.

I want to get it all in a big database and be able to do queries and
simulations on the fly. I guess you want to do deduction and induction?
But can't an OO model express the same things as FOPC (data, values,
relationships), but much more efficiently, and in a way that better
represents the natural world as well? And where language is concerned,
I think decomposing it into fundamental meanings & expressing it works
fine in an OO model as well. But you may see issues here that FOPC or
the like handles and OO doesn't--what are those, specifically?
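To make that "northern city" contrast concrete, here's a toy sketch of a class defined by a query vs. one defined by explicit membership. The cities, latitudes, and the 45-degree cutoff are all made up for illustration:

```python
# Toy data: objects with properties.
cities = [
    {"name": "Oslo", "latitude": 59.9},
    {"name": "Cairo", "latitude": 30.0},
    {"name": "Helsinki", "latitude": 60.2},
]

# Class defined by a query: membership is computed from properties,
# so it stays correct as objects are added or changed.
def is_northern(city, cutoff=45.0):
    return city["latitude"] > cutoff

northern_by_query = [c["name"] for c in cities if is_northern(c)]

# Class defined explicitly: membership is a stored relationship,
# which must be maintained by hand when the data changes.
northern_explicit = {"Oslo", "Helsinki"}

assert set(northern_by_query) == northern_explicit
```

Both express the same class; the query form saves the definition, the explicit form saves the relationship, and an OODB could store either as data.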

One interesting thing from the article you sent is that different ways
of representing knowledge have different strengths & commitments.
However, I fail to see the weaknesses in an OO expression of anything.
Once you help me see them, we may be able to do as the article suggests
and combine representations in ways that enhance the expressiveness and
utility overall.


 > Our number one problem is that we have no consensus on the meta-model level. I
 > want a first class, singly rooted system capable of analyzing logical formulas
 > or sentences, performing natural language translation and so forth - that
 > pretty much requires the user to be able to dynamically create meta model
 > structures for arbitrary abstractions and refer to them in any context, which
 > prohibits a relational database implementation at the meta-model level (i.e.
 > with hard coded concepts for what classifications entities, attributes, names,
 > and so forth need to be in).

Perhaps you could describe this for me in terms of a specific problem
to solve (or several, if there are fundamentally separate issues to
address in them), then we can each say how we envision solving it.


 > Tom is working on developing an optimal meta-model that I do not understand
 > very well, mostly because he hasn't finished it yet.  From what I can tell
 > Luke would be happy with a meta model composed of objects, attributes, names,
 > and relationships.  The problem with fixed meta-models is that they entail
 > strict ontological commitments about not only what is real, but what
 > abstractions are capable of being represented.

There is nothing fixed about what I envision. I'd like to take
everything and dump it in--including formulas, linguistic info, etc.


 > Natural languages force no such ontological commitments - anything can be
 > treated as a first class object, i.e. "noun-ified".  If we want a system that
 > can store natural language in its native form for further analysis, we have to
 > standardize on a meta-meta-model layer that can represent any sentence in any
 > language.  The same goes for any general form of automated reasoning, i.e. one
 > that can perform inference chaining on arbitrary statements.

This is where I began thinking about the whole thing. One very slick
approach to building the repository would be to decompose sentences
(whether in natural or formal language) en masse, in the most automated
way possible (though human assistance may be needed), making nouns into
objects and properties, adjectives into properties, verbs into methods
and so on, on the fly. Do that to all available text in the world,
optimize well, and we have a wonderful thing.
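The decomposition I have in mind could be sketched like this. The tagging here is a hand-made stub (a real system would need a parser and, as I said, human assistance), and the sentence is invented:

```python
# Toy sketch of decomposing a sentence into model elements:
# nouns -> objects, adjectives -> properties, verbs -> methods.
# Hand-tagged tokens for "The red ball bounces" (hypothetical).
tagged = [("red", "ADJ"), ("ball", "NOUN"), ("bounces", "VERB")]

model = {"objects": [], "properties": [], "methods": []}
for word, tag in tagged:
    if tag == "NOUN":
        model["objects"].append(word)
    elif tag == "ADJ":
        model["properties"].append(word)
    elif tag == "VERB":
        model["methods"].append(word)

assert model == {"objects": ["ball"],
                 "properties": ["red"],
                 "methods": ["bounces"]}
```

The hard parts a sketch like this hides are exactly the ones that need the human in the loop: word-sense disambiguation, attaching each property to the right object, and merging new mentions with objects already in the repository.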


 > If we are to live up to the name "One Model", we need to have a lowest level
 > data model capable of representing what other people believe in a form
 > equivalent to its original representation, which is natural language.

Natural language (at least in its current forms) seems like an
inefficient way to communicate thoughts. We learn visually much faster.
And it seems like for me to express myself in natural language, I first
have to translate from my thoughts (pictures, movements, objects and
behaviors I have stored in my memory based on experience or thought)
into language, and send it to the recipient, who must do the reverse
translation to get it into their own head. And this is error-prone,
especially across people who use the same words for similar but
different things, as any married person knows.

I think an OO model more nearly represents the things as they are
stored in my head, AND AS THEY ARE IN THE NATURAL WORLD. So if I can
express myself (or store the knowledge for later perusal) in a more
nearly native form (or *the* native form?), which is an OO model of
what really exists, then it can be expressed to the recipient in
whatever cool way the interface allows, whether that be making the
translation to human language (using the names I've assigned when I put
it into the model) or visually, by exploiting the info in the database
to create a VRML or cartoon representation of these things. But there
need be no ambiguity about what I meant, because the richness and thus
accuracy of the data stored might be bounded only by our time & ability
to record it in detail.


 > If we do otherwise we drastically constrain what our methodology is capable
 > of, reducing our purpose to building a bigger version of what so many people
 > are already doing.  We could also build the world's largest distributed
 > semantic network for human research and navigation but find it impossible to
 > perform the kind of common sense reasoning that databases like Cyc are
 > designed to make possible.

This OO model is unlike what others are already doing in that it is
optimized for total openness--allowing folks to enter everything they
know, rather than creating a new software system for each niche. I don't
see the constraints in it that you do.


 > Cyc is based on a LISP meta-meta-model, which makes it capable of being
 > extended to do all the things I have described above.  If we force a higher
 > level meta-model design, we automatically concede the whole general purpose AI
 > field to Cyc and projects like it.

But it seems so inefficient to load terabytes of FOPC statements, then
go through & figure out where the relationships are, vs. following a few
pointers to find those relationships.
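Here's the efficiency point in miniature: finding an entity's relationships by scanning a flat list of statements grows with the size of the whole database, while following the entity's own pointer list depends only on how many relationships it has. The relations and data are invented for illustration:

```python
# Flat FOPC-style statements: (relation, subject, object) triples.
statements = [
    ("capital_of", "Paris", "France"),
    ("capital_of", "Oslo", "Norway"),
    ("borders", "France", "Spain"),
]

def related_by_scan(subject):
    """Scan every statement - cost grows with total database size."""
    return [(rel, obj) for rel, subj, obj in statements if subj == subject]

# OO-style: each object holds direct pointers to its relationships.
objects = {}
for rel, subj, obj in statements:
    objects.setdefault(subj, []).append((rel, obj))

def related_by_pointer(subject):
    """Follow the object's own pointer list - cost independent of DB size."""
    return objects.get(subject, [])

assert related_by_scan("Paris") == related_by_pointer("Paris")
```

Of course the pointer index has to be built and maintained as statements are added, which is part of what a database optimized for graph traversal does for you.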

Maybe for statements of formulas, expectations, or principles, I have
envisioned using either queries to derive them (like "common sense"
about the world, which I don't think we should try to store, because it
is brittle and context-dependent--an OO model handles that, and common
sense stored in Cyc won't work if your context is Egypt 2500 years ago,
or Pluto), or more formalized statements to express them in methods
(like laws of physics), or "ideal" object representations (like
expectations, plans, or hopes).


 > That is not necessarily a bad thing, it just means we are narrowing our focus
 > to a specific application domain rather than trying to build the Swiss army
 > knife of knowledge representation.

I don't want to narrow to any domain--like in our earlier discussion of
creating namespaces up front vs. sometimes having to meld models--the
whole reason I thought first of building software, rather than building
the representation first, is that the software is the necessary tool to
build the representation. It is a process of creating objects that model
anything you want to express, in any domain, of assigning human-language
names as convenience dictates, and of creating relationships and sharing
that model with others, with the hope of building one huge one at some
point.


 > Again, this article is very good on these kinds of issues:
 > What is a Knowledge Representation? R. Davis, H. Shrobe, and P. Szolovits.
 > AI Magazine, 14(1):17-33, 1993.
 > http://www.medg.lcs.mit.edu/ftp/psz/k-rep.html

It was good. Noteworthy was the part about combining representation
systems in a way that takes advantage of the strengths of each in what
they can represent.


 > I would like to hear from both Tom and Luke on whether you agree it is best to
 > base the system on a first class meta-meta-model, with all its implications....

I hope that this discussion can eventually make explicit enough of how
you & I each see things that we can use the best of both.


 > > When we get together around Thanksgiving (assuming we can?) it would be
 > > cool to have a white- or blackboard, or at least a place to gesture and
 > > talk for a while. Place, anyone?
 >
 > We can have it at my house.  662 N. 100 E. Farmington, UT 84025

More scheduling questions: my sister scheduled a family photo for the
Friday after Thanksgiving, at 2:30. Time to talk might be good. Would it
work for y'all if I show up sometime between 4:30 and 6 pm, or, say,
8:30 am Friday, for a morning or evening of talk?



-------------
Help us put all knowledge in one bucket: www.onemodel.org.

