[om-list] Re: Cyc example

Mon Sep 25 08:43:52 EDT 2000

Mark,

I too find Cyc very interesting and am still trying to find enough time
to read more at their web site. One thing I'm not sure of yet is if
their concept can handle arbitrary knowledge stored in an object model,
and is able to do all the things in our requirements list. They have
obviously done a lot of work. I really like the idea of keeping
everything we do open and free. I recognize that the software to
maintain our model and the core model are separate, but maybe I have to
have the software to start playing with the core model to learn about
it, its limitations, and what it needs to go to the next level. Maybe
that's just my personal limitation, or maybe we don't want to
pre-define any taxonomy--our taxonomy is the real world.

I envision a software system that allows anyone to
easily model whatever they see, think, or observe. So the "One Model"
name is because the software allows one to easily build a model of
whatever interests him or her, to enter data manually or programmatically,
and it naturally goes into a totally coherent single system. Then, once
the software lets anyone who wants to blend their "One Model" with
anyone else's, we can combine ours or restart a joint version, making
that model available to others, so they can also enter the data that
interests them, building on what we have to achieve a network effect in
terms of geometric value growth. These models or data stores can be
shared and blended. And searched etc. (One would want to consider
security or separateness for any personal or confidential information
kept in it.)

I often consider that
the D&C says the earth will someday be like glass, a Urim and Thummim
to its inhabitants, showing all things pertaining to a lower order of
kingdoms etc. etc., which strikes chords in me relative to having one
massive database with all available truth stored in it. (For what it's
worth, I also often think of that one phrase near the very end of the
temple endowment, about all truth. Do you know the one I mean? Probably
I shouldn't quote it outside there.) We're not there yet but we can
start small, by building a model, but using the software to do it--to
enter the data and build things up incrementally. Then when the engine
is there with a good API and a handful or more of initial data, we
develop code to import specific domain knowledge from XML stores,
wrappers to get data from other databases (where desired), grammar
parsers etc that take existing knowledge stored in all these other
formats and libraries and systematically import it, with human help when 
needed. Then since it will get huge, maybe we can find distributed ways 
of storing it (that's a big maybe).

If the goal is to store all
knowledge, then what is knowledge, at an atomic level? I currently
believe it is an object model of the things in my mind or someone else's 
mind. I don't (yet?) understand how extensive lists of logic statements 
(and surely not statistical sentence comparisons) get you there. To
create an object model all you may need (at least to begin with?) are
ways to add objects/classes/attributes/relationships etc., and
manipulate them, in a way that remains coherent. Again, I am probably
missing the picture as you see it, but it seems like that's all we need
to start, then we can study the algorithms to traverse (search) it
effectively, nicer interfaces, multiuser, bulk data import, and so on.
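
As a rough illustration of that starting point, here is a minimal Python sketch of an in-memory model that only knows how to add objects, attach attributes, relate objects, and traverse the result. The class name `Model`, its method names, and the sample objects are all hypothetical, invented for this sketch; it assumes a breadth-first walk is an acceptable first "search", and it is not a design anyone on the list has agreed on.

```python
from collections import deque
from itertools import count


class Model:
    """A tiny in-memory object model: objects, attributes, relationships."""

    def __init__(self):
        self._next_key = count(1)   # contrived unique keys, independent of names
        self.attributes = {}        # key -> {attribute name: value}
        self.relationships = {}     # key -> list of (relation label, other key)

    def add_object(self, **attributes):
        key = next(self._next_key)
        self.attributes[key] = dict(attributes)
        self.relationships[key] = []
        return key

    def relate(self, source, label, target):
        # Refuse links to unknown objects so the model stays coherent.
        if source not in self.attributes or target not in self.attributes:
            raise KeyError("both objects must exist before they are related")
        self.relationships[source].append((label, target))

    def reachable(self, start):
        """Breadth-first traversal: keys reachable from start, in visit order."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            key = queue.popleft()
            order.append(key)
            for _label, other in self.relationships[key]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        return order


# Made-up sample data, for illustration only.
model = Model()
family = model.add_object(kind="family")
luke = model.add_object(kind="person")
photo = model.add_object(kind="picture")
model.relate(family, "has member", luke)
model.relate(luke, "appears in", photo)
print(model.reachable(family))
```

Everything else in the list above (better search algorithms, interfaces, multiuser access, bulk import) would sit on top of operations like these.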

For example, one of my first test uses of the system will be to input
the personal organizational things that I now enter into a Sharp Wizard
and various outline/text editor things I have. Then genealogical
information and family pictures, etc. I don't want it limited,
but very flexible and still very consistent. Perhaps others will do the
same--modeling or recording information from biology, language, or
whatever interests them, without being constrained by a need to really
understand any formal system or taxonomy (though there may be such,
under the hood), but constrained or guided by the software's design and
constraints, which tend to keep everything coherent (at least within
what Cyc calls "microtheories"--nice term) and lead the user in a
self-consistent direction.

I also am hoping that any model our software builds would be independent
of human language. Of course it has human language terminology tied to
it, but any object (and/or class, as we've discussed previously) could
have any number of names, and eventually our system could recognize the
context of the user and thus tie the user's current terminology set to
the objects
being viewed at any given time. Thus, there is no "namespace", just an
extensive set of related objects, recognized primarily by their
relationship to the user, or to some searched-for object in the user's
"personal" namespace (or context), the actual determiner of an object's
identity being not a name but attributes--place in time and space (and
genealogy, for people). (I expect it to have a unique key, but this is
contrived, and independent of any names.) The phrase I think of often is 
"truth is things as they are, as they were, and as they are to come", so 
then our object model seems so big that it is unwieldy to try to have a
namespace to name all data types, except within a given context, where
collections of objects represent "contexts", or the terminology that a
given set of users remembers things by (which names may be used by
another user for a dozen other things, in various other contexts). Some
objects may not have a name at all, but merely a time/place and
relationships to other
objects. There are hierarchies of contexts for different languages,
domain areas, or even for different periods of a person's life. But I
don't envision a namespace. (I say all this so you can see where I'm
coming from, and then how best to correct me.)
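
To make the "names belong to contexts, not to objects" idea a little more concrete, here is a small Python sketch. It assumes contexts form a simple parent/child hierarchy and that a lookup falls back from the user's own context to its ancestors; `NameIndex`, the context labels, and the numeric keys are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


class NameIndex:
    """Maps (context, name) to object keys; objects themselves carry no name."""

    def __init__(self):
        # context -> name -> set of object keys
        self._by_context = defaultdict(lambda: defaultdict(set))
        self._parent = {}  # context -> parent context (None at the top)

    def add_context(self, context, parent=None):
        self._parent[context] = parent

    def add_name(self, context, name, key):
        self._by_context[context][name].add(key)

    def lookup(self, context, name):
        """Search the given context, then its ancestors; first match wins."""
        while context is not None:
            keys = self._by_context.get(context, {}).get(name)
            if keys:
                return set(keys)
            context = self._parent.get(context)
        return set()


# Made-up contexts and keys, for illustration only.
names = NameIndex()
names.add_context("english")
names.add_context("luke-home", parent="english")
names.add_context("biology", parent="english")
names.add_name("english", "mouse", 101)    # the animal
names.add_name("luke-home", "mouse", 202)  # a pointing device
print(names.lookup("luke-home", "mouse"))
print(names.lookup("biology", "mouse"))
```

The same word resolves to different objects depending on who is asking, and an object with no entry in any context is still a perfectly valid object, identified by its key, its place in time and space, and its relationships.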

I once did a crude example for a data modeling class assignment where I
modeled "things", each with a uniquely generated key, where each "thing"
had a parent "thing" (like for a part-of relationship), and any number
of attributes, which themselves were "things", breaking the data into
separate tables for each data type, so an attribute of a thing could
eventually be an integer, string, datetime, etc. Most attributes
eventually got stored in a 2- or 3-column table, mainly consisting of
the key to the parent, and the datum's value. Things could also have
0-n names, which related to the parent "thing" via the unique key, as
did all the attributes. Things also had a "source", such as a "thing"
which could ultimately represent a human who entered data or some text
citation. So I envisioned this software tool which lets you enter
anything (any "thing"), broken down into an object model, and
traverse/search it conveniently, etc., until you've represented anything
that's interesting in the whole world, where there really are no
divisions between subject areas. Sort of like the object model systems
that weather or other software simulations use, but totally generic and
open, and eventually capable of doing simulations on the fly. And that's 
why I have not yet seen a need for us to create our own file format--it 
could decompose down to something that can be stored in a 3rd-party
database (whether OO or rdbms). (Of course we'd have to do import/export
to/from XML, GEDCOM, etc.)
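
For anyone who wants to see roughly what such a layout could look like in an ordinary relational database, here is a Python sketch using the standard `sqlite3` module. It is a reconstruction under assumptions, not the original class assignment: the table and column names, the choice of SQLite, and the restriction to string and integer values are all invented for illustration. Each attribute is itself a "thing" whose parent is the thing it describes, and its value lives in a two-column table for its data type.

```python
import sqlite3

SCHEMA = """
CREATE TABLE thing (
    id        INTEGER PRIMARY KEY,           -- generated unique key
    parent_id INTEGER REFERENCES thing(id),  -- part-of parent, NULL at the root
    source_id INTEGER REFERENCES thing(id)   -- the thing that supplied the data
);
CREATE TABLE thing_name (                    -- 0-n names per thing
    thing_id INTEGER NOT NULL REFERENCES thing(id),
    name     TEXT NOT NULL
);
CREATE TABLE string_value (
    thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
    value    TEXT NOT NULL
);
CREATE TABLE integer_value (
    thing_id INTEGER PRIMARY KEY REFERENCES thing(id),
    value    INTEGER NOT NULL
);
"""

# One value table per supported Python type (an assumption of this sketch).
VALUE_TABLES = {str: "string_value", int: "integer_value"}


def add_thing(db, parent=None, source=None, names=()):
    cur = db.execute(
        "INSERT INTO thing (parent_id, source_id) VALUES (?, ?)", (parent, source)
    )
    key = cur.lastrowid
    db.executemany(
        "INSERT INTO thing_name (thing_id, name) VALUES (?, ?)",
        [(key, name) for name in names],
    )
    return key


def add_attribute(db, parent, name, value, source=None):
    table = VALUE_TABLES[type(value)]  # KeyError for unsupported types
    key = add_thing(db, parent=parent, source=source, names=[name])
    db.execute(f"INSERT INTO {table} (thing_id, value) VALUES (?, ?)", (key, value))
    return key


def attributes_of(db, parent):
    """Collect {name: value} for every attribute thing under parent."""
    result = {}
    for table in VALUE_TABLES.values():
        rows = db.execute(
            f"SELECT n.name, v.value FROM thing t "
            f"JOIN thing_name n ON n.thing_id = t.id "
            f"JOIN {table} v ON v.thing_id = t.id "
            f"WHERE t.parent_id = ?",
            (parent,),
        )
        result.update(rows.fetchall())
    return result


# Made-up sample values, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
luke = add_thing(db, names=["Luke"])
wizard = add_thing(db, source=luke, names=["Sharp Wizard", "organizer"])
add_attribute(db, wizard, "owner", "Luke", source=luke)
add_attribute(db, wizard, "entries", 120, source=luke)
print(attributes_of(db, wizard))
```

Adding a datetime or any other type would mean one more small value table and one more entry in `VALUE_TABLES`, which is part of why a custom file format does not seem necessary at this stage.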

Most unfortunately, since I don't know formal predicate calculus, it's
hard for me to know how something like Cyc fits with the above system,
or where the relative gaps are in the two ways of thinking, or how they
would fit together.  I do have a nagging feeling that systems like Cyc
and the other collection of "mindpixels" are useful tools for us, that
we will want to employ at some point--perhaps as data stores from which 
we further populate (or validate?) our own model. Or something.

Does this make any sense? Maybe with your knowledge you can help me
understand the failings in what I describe here, how it meshes (or
doesn't) with what you describe in your email, and/or compare & contrast
this with Cyc-type approaches better than I am able to.

By the way, what does Cyc mean when they say it isn't "frame-based"?
Does that mean it uses a bunch of declarations and logic algorithms to
traverse them, instead of tying all of them together in a unified model
with relationships running all around?

Luke