[om-list] More Questions, and more TomP software advertisement

Tom and other Packers TomP at Burgoyne.Com
Sat Sep 8 13:53:40 EDT 2001


Salve Alia:

    Mark Butler and I had a good discussion last night, and I think we have
a very nice two-phase plan for making a shell-based "TFStudio".  Thanks for
your help so far, Mark.  My ideas were all vague, intuitive, subjective, and
fuzzy; you've made them concrete enough to implement.

    I think we solved one of the two concerns that Mike mentioned.  Namely,
we strongly believe that the proposed script interpretation process will not
be slower than TF's current dialog-based sequence, even when compared to
TF-Protocol.  (Mike and Curtis, see below for an explanation.)

    Now, how about debugging?  I'm still not sure how this would work.  But
I think it will have to work on two levels: in the high-level script, and in
the lower-level utility source code.

    At the script level, I think we'll just figure out stuff to output
somewhere: to a file, to the screen, maybe even to the TFStudio displayer,
taking advantage of its real-time displaying capabilities.  That might not
be such a bad idea: in debug mode, each utility could display everything it
outputs in TFStudio, as well as piping it to the next utility.
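
    One detail I am assuming here, since we have not discussed it: the
natural channel for this kind of debug chatter seems to be stderr, which
would reach the screen (or a redirected log file) without disturbing the
data going down the stdout pipe to the next utility.  A minimal sketch of
what I mean:

    // My assumption, not a settled design: debug text goes to stderr, so it
    // shows up on the screen or in a log without contaminating the data
    // being piped onward on stdout.
    #include <stdio.h>

    static void debug(const char *msg)
    {
        fprintf(stderr, "[debug] %s\n", msg);
    }

    int main()
    {
        debug("utility starting");
        printf("1.234500\n");      // the real output, still clean on stdout
        debug("utility finished");
        return 0;
    }

    If I have the NT redirection syntax right, something like
"someutility 2> debug.log | nextutility" would then capture the trace to a
file while the piped data passes through untouched.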

    Mike, I will have questions in the near future about how your real-time
feature works.

    But, as for the utility source code level ... I will be using VC++,
making console applications, I believe.

    Here's my main (and admittedly ignorant) question for all three of you:

    How can I run a whole script (or a DOS batch file) which calls two or
more utilities, sometimes simultaneously, and be able to step through the
source code of any or all of the utilities?  All three programs would have
to be running in the debug mode of their respective debug builds.  Is there
an easy way?  Would I have to make a project workspace containing all three
projects (e.g. the shell and the two utilities) in order to place usable
break-points in all of the source code simultaneously?


    That question was the primary reason I'm writing.  No one need read any
further (but it would be charitable of you if you would :-).


    Okay, here's the brief design and explanation of our prospective data
analysis package:

    The functionality of current (and future) TF-Studio utilities will be
duplicated by writing a larger number of smaller utilities.  Each utility
will be compiled into an executable, so it will not be slow.  The package
will be similar to the current TF-Studio in this regard.  The global script
will have little to do but execute and co-ordinate these compiled utilities;
the utilities themselves (much as in our TF Protocol-based study process)
would do the bulk of the work, so the over-all "study" process should not be
noticeably slower than our current protocol study.  (It may even be faster;
please read below.)

    Each utility will be able to run from the command line (DOS prompt or
batch file) for the first phase of development (i.e. the concept validation
phase), and then from an internally-designed-and-developed *shell* for the
second phase of development.  Each utility could therefore receive command
information (parameters) from either the shell or from the command line
without any modification between phases of development (as explained below),
i.e. it could run automatically or manually during both phases using the
same source code.  Batch files would give us automation immediately, but we
would gain a lot of power with a shell and accompanying scripting
language.
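
    To make that concrete, here is a minimal sketch of the kind of utility I
have in mind (the name and parameters are made up for illustration):
everything it needs to know arrives on the command line, so the identical
executable can be driven by a person at the prompt, by a batch file, or
later by the shell.

    // Hypothetical "gennoise" utility, for illustration only: its settings
    // arrive as command-line parameters, so the same .exe serves manual use,
    // batch files, and (later) the shell, with no source changes in between.
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            fprintf(stderr, "usage: gennoise <seed> <count>\n");
            return 1;                   // non-zero exit reports the problem
        }
        srand((unsigned)atoi(argv[1]));
        long count = atol(argv[2]);

        for (long i = 0; i < count; i++)
            printf("%f\n", rand() / (double)RAND_MAX);  // results to stdout
        return 0;
    }

    Whether parameters come as bare values like this or as named flags is
exactly the kind of convention the phase-one experiments should settle.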

    Each utility would receive the bulk of its input (the "input file" so to
speak) from *pipes* attached to stdin; and it would write the bulk of its
output (the "output file") into more pipes attached to stdout, hence the
ability to run from the command line and from the shell, both of which can
create and connect pipes.
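
    As a sketch of that convention (the piped format here is purely my
assumption; nothing is decided): a filter that treats stdin and stdout as
streams of raw 8-byte samples.  One VC++ detail I believe matters is that
console streams default to text mode on Windows, so a utility piping raw
binary data has to switch them to binary first, which I understand _setmode
can do.  If we end up piping plain ASCII instead, those two calls simply go
away.

    // Sketch of a stdin-to-stdout filter, assuming (my assumption) that the
    // pipes carry raw doubles.  "scale" multiplies every sample by a
    // constant and passes it along to the next utility.
    #include <stdio.h>
    #include <stdlib.h>
    #include <io.h>      // _setmode  (VC++ / Windows)
    #include <fcntl.h>   // _O_BINARY

    int main(int argc, char *argv[])
    {
        double factor = (argc > 1) ? atof(argv[1]) : 1.0;

        // Console streams are text mode by default on Windows; switch them
        // to binary so bytes inside a double are not translated in the pipe.
        _setmode(_fileno(stdin),  _O_BINARY);
        _setmode(_fileno(stdout), _O_BINARY);

        double sample;
        while (fread(&sample, sizeof(sample), 1, stdin) == 1) {
            sample *= factor;
            fwrite(&sample, sizeof(sample), 1, stdout);
        }
        return 0;
    }

    Whether the pipes carry text (as in the gennoise sketch) or raw binary
(as here) is one of the first conventions we would need to settle.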

    Pipes are a beautiful thing.  Mark sat down and wrote a two-pipe /
three-utility assemblage last night as I looked on, and ran them all from
command lines.  Two random-number generators and one multiplier, each
looping endlessly into the night.  It was poetry.
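
    I don't have Mark's actual code in front of me, so what follows is only
my reconstruction of the idea.  A generator like the gennoise sketch above
(with the count loop made endless) covers the generator half; the
interesting half is the multiplier with two inputs.  One way it can work
from a plain command line is to take the first stream on stdin and open the
second by name, whether that name is a temporary file or, eventually, a pipe
the shell sets up:

    // Not Mark's code -- just my reconstruction of a two-input "multiply".
    // The first stream of numbers arrives on stdin; the second is opened by
    // name (a file here, eventually perhaps a pipe created by the shell).
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: generator | multiply <second-stream>\n");
            return 1;
        }
        FILE *second = fopen(argv[1], "r");
        if (!second) {
            fprintf(stderr, "multiply: cannot open %s\n", argv[1]);
            return 1;
        }

        double a, b;
        while (fscanf(stdin,  "%lf", &a) == 1 &&
               fscanf(second, "%lf", &b) == 1) {
            printf("%f\n", a * b);   // the product flows on down the pipeline
            fflush(stdout);
        }
        fclose(second);
        return 0;
    }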

    In phase 2, a utility would hand high-level control information back to
the shell in its return value, to enable the power alluded to above.
A utility could have more than one input pipe, and more than one output
pipe.  One of these output pipes could be used to duplicate the data being
sent to the next utility, giving the copy to TFStudio for display, or to one
of the file-writing utilities as a record.
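
    The duplicating pipe could itself be one of these small utilities --
something like the following hypothetical "splitpipe" (the name and details
are my own invention).  It copies whatever arrives on stdin both to stdout,
for the next utility in the chain, and to a second destination, with a plain
file standing in here for the TFStudio display or a file-writing utility;
its exit code is the sort of high-level signal the shell could branch on.

    // Hypothetical "splitpipe": pass the data through to stdout and write a
    // duplicate copy to a second destination at the same time.  (The
    // _setmode calls from the earlier sketch are omitted here for brevity.)
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: splitpipe <copyfile>\n");
            return 2;                       // the shell can branch on this
        }
        FILE *copy = fopen(argv[1], "wb");
        if (!copy)
            return 2;

        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof(buf), stdin)) > 0) {
            fwrite(buf, 1, n, stdout);      // onward down the pipeline
            fwrite(buf, 1, n, copy);        // the duplicate, for display/record
        }
        fclose(copy);
        return 0;                           // 0 = everything passed through
    }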

    There would be a generic ASCII file-writing utility which simply
converts its piped message into ASCII numerals and then writes them to a
text file.  There would also be a generic TSF file-writing utility which
would make little TSF file fragments (perhaps in the form of single epochs
or single channels at a time, or even single frames, depending on the
function of that particular utility).
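
    A sketch of the generic ASCII writer, again assuming (my assumption)
that the pipe carries raw doubles; the TSF writer would have the same shape,
but emitting TSF fragments instead of text lines:

    // Hypothetical "ascwrite": read raw doubles from the pipe and write
    // them out as ASCII numerals, one per line, to a text file.
    #include <stdio.h>
    #include <io.h>      // _setmode
    #include <fcntl.h>   // _O_BINARY

    int main(int argc, char *argv[])
    {
        if (argc != 2) {
            fprintf(stderr, "usage: ascwrite <textfile>\n");
            return 1;
        }
        _setmode(_fileno(stdin), _O_BINARY);  // raw samples come in as binary
        FILE *out = fopen(argv[1], "w");      // ASCII numerals go out as text
        if (!out)
            return 1;

        double sample;
        while (fread(&sample, sizeof(sample), 1, stdin) == 1)
            fprintf(out, "%g\n", sample);

        fclose(out);
        return 0;
    }

    With names like these, the tail end of a study fragment could then read
something like "... | scale 2.0 | ascwrite results.txt" at the prompt, with
splitpipe dropped in wherever we want to watch the data go by.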

    Etc., etc.  The possibilities are practically without limit.

    Why this will benefit TF:

    Three features of the new design could make data analysis studies faster
than TF-Protocol, in run-time:

    (1)  From what Mark says, if we used the Windows NT OS on
multi-processor hardware (true parallel processing support), we would be able
to take advantage of it: each utility in a pipeline is its own process, so
the OS could schedule the stages on separate processors, as implied above.
(I'm guessing that our current TFStudio would not easily gain the same
increase in speed using this kind of
hardware.  Dan, any news regarding that reconfigurable hardware company ...
what was their name, Stargate?  Starbridge?  I bet we could really be able
to take advantage of that kind of hardware some day.)

    (2)  Also as explained above, there will be much less reading from and
writing to the hard drive in the proposed design, because intermediate
results move between utilities through pipes in memory instead of being
written to disk and read back.  This will save hard drive space (which we
are *always* running out of at TF), and also time -- potentially a lot of
time -- regardless of which OS and hardware we choose to
run it on.  This prospective time-saving benefit is given in comparison to
the *future* TF-Protocol, the one for which all utilities will actually obey
the protocol flags and therefore run automatically, without user input such
as OK-button pressing (something we have never quite achieved in practice at
Thoughtform).

    (3)  The flexibility of finer-grained utilities will give the data
analyst the ability to find and create a shorter path from the first state
to the last state of the study process.  For example, if we had the
functionality of our current Classify and Average in the proposed format, we
could average multiple classification values, which we frequently need to do,
without first processing the output of Classify using TranslateAscii
and CombineChannels, as we must now.  This principle, applied to bigger
studies,
will make them shorter, and therefore faster.  (And with the shell script,
we could easily modify such studies to carry out similar, but unique,
alternate studies on the same or similar data, as we do now using protocol
scripts.)

    Two features could make the new design faster than TF-Protocol, in
subsequent development time (and therefore in ultimate run-time):

    (1)  The fine-granularity of algorithm compartmentalisation will give
the data analyst the ability to virtually write new coarse-grained
utilities for himself.  That is, he could produce the functionality of some
of our current utilities by simply re-combining the fine-grained utilities
being used to duplicate the functionality of others of our current utilities.
He would not have to wait for programmers to create a whole new utility as
frequently as he does now.

    For example, if we had enough of these proposed low-level utilities to
duplicate the GroupHistogram and Average functionality we have in two of our
current utilities, we would not have needed to ask Curtis to make the
GroupAverage utility -- a utility, I might add, that came too late to help
us in the project for which it was developed.  We could have had its
functionality from the beginning, *virtually*, by simply reusing the pieces
we had already made for the other utilities.  This principle could be
re-applied endlessly, in virtually making many of the utilities we have been
wanting but have not had programmer time to produce, plus many virtual
utilities we haven't even thought of yet.

    (2)  The simplicity of each utility will make development time quite
speedy compared to current projects.  Admittedly, there will be more
utilities to be written this way, but in my (little) experience, about half
of development time is spent on GUI nonsense.  Some of the time saved will
be used up in alternative forms of feedback to the user, but I don't think
all of it will be lost.

    The bugs can be worked out faster when the utility is smaller and
simpler, and used more frequently to create a wider variety of "virtual
utilities".

    Additionally, the utility collection will be smaller, because it will
contain much less redundant code (file reading and writing, etc.) and less
GUI overhead per individual utility.

    I think the choice is obvious.  If Dan still doesn't agree, I don't
mind.  I'll work on this on my own.

    You guys probably think I'm silly, but I am excited about this.  I may
even disrupt my senior thesis committee's expectations by changing my
proposed project from machine learning to writing a high-level time-series
data processing shell scripting language and accompanying utilities.

ciao,
tomp




