[om-list] MTShell and pipes

Tom and other Packers TomP at Burgoyne.Com
Sat Nov 10 22:14:42 EST 2001


Mark,

    I'm starting to design a few things, and I have more questions.  Can you
tell me more about pipes and shared memory, regarding the following issues?
The two issues are (1) simple branches in pipes, and (2) "pipes" that save
the entire message stream, to be read by many utilities, multiple times,
one after another.

    (1)  I guess I have told you that I would like to be able to pipe the
output of one pipe-writing utility into multiple pipe-reading utilities.
Does the pipe function already allow for multiple readers of the same pipe,
or will I have to construct some more elaborate mechanism?  I assume that
the pipe does not store all the data it receives.
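
    For concreteness, here is my current understanding of plain POSIX pipes
(please correct me if MTShell does something different): a pipe with several
readers does not give each reader its own copy of the stream -- each byte is
consumed by whichever reader happens to read() it first, so real branching
needs a tee-like splitter in front.  A minimal sketch of the single-pipe
case:

    /* Two children reading the SAME pipe: records are divided between
       them, not duplicated.  Plain POSIX; nothing MTShell-specific. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) < 0) { perror("pipe"); return 1; }

        for (int i = 0; i < 2; i++) {
            if (fork() == 0) {                /* reader child */
                close(fd[1]);
                char buf[32];
                ssize_t n;
                while ((n = read(fd[0], buf, sizeof buf)) > 0)
                    printf("reader %d got %zd bytes\n", i, n);
                _exit(0);
            }
        }

        close(fd[0]);                         /* parent is the writer */
        for (int j = 0; j < 10; j++) {
            char line[64];
            int len = snprintf(line, sizeof line, "record %d\n", j);
            write(fd[1], line, len);          /* each record reaches ONE reader */
        }
        close(fd[1]);
        while (wait(NULL) > 0) ;
        return 0;
    }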

    In other words, how does the pipe mechanism keep the writing-utility
from writing more data than the pipe can store before the reading-utility
can read the next segment?  I'm envisioning cases where the reader has a
slower process to perform than the writer.  And can this slow-down mechanism
cause the writing-utility to wait on more than one reading-utility?
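
    My guess at the answer for ordinary pipes (again, please correct me) is
that write() simply blocks once the kernel's pipe buffer is full, so a slow
reader throttles the writer automatically.  For the multi-reader case I
imagine a splitter stage like the sketch below, where a blocking write to any
one output stalls the loop, i.e. the slowest reader sets the pace.  The
output descriptors 3, 4, 5 are just placeholders for pipes the shell would
have set up:

    /* Sketch of a fan-out ("splitter") stage.  Each write() blocks once the
       pipe buffer for that output is full, so the writer is automatically
       slowed to the pace of the SLOWEST reader. */
    #include <stdio.h>
    #include <unistd.h>

    #define NOUT 3

    int main(void)
    {
        int out[NOUT] = { 3, 4, 5 };   /* placeholder fds for three pipes */
        char buf[4096];
        ssize_t n;

        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
            for (int i = 0; i < NOUT; i++) {
                ssize_t off = 0;
                while (off < n) {             /* blocking write = backpressure */
                    ssize_t w = write(out[i], buf + off, n - off);
                    if (w < 0) { perror("write"); return 1; }
                    off += w;
                }
            }
        }
        return 0;
    }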

    (2)  Because I assume that pipes cannot store all the data they
receive, I'm thinking I may need to create an alternative for some
situations: a "memory pipe".  This pipe would actually be two pipes, plus
shared memory, plus memory-reading and memory-writing utilities.  The stream
would be written to shared memory through the first pipe, and then read from
shared memory and piped on to various utilities (which could be many) at
arbitrary times, through the second pipe.  This would be used in situations
when the same piped message would need to be read several times in serial.
I don't want to write the message out to a file, but I also don't think that
a normal pipe can handle saving a large message/stream for many processes to
read, even if the pipe mechanism can handle giving small bits of a message
to many utilities simultaneously (as discussed above in my first issue).
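
    Something like the following is what I have in mind for the writer half
of the "memory pipe", using POSIX shared memory.  The segment name and the
fixed 16 MiB cap are placeholders, and maybe MTShell already provides
something better; readers would later shm_open() the same name read-only,
mmap it, and stream the bytes onward whenever they are ready:

    /* Writer half of the "memory pipe" idea: copy the whole input stream
       into a POSIX shared-memory segment once, so any number of readers
       can map it later and read it at their own pace.  Name and size are
       placeholders.  (May need -lrt on older systems.) */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_NAME "/mtshell_dataset"
    #define SHM_SIZE (16 * 1024 * 1024)

    int main(void)
    {
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, SHM_SIZE) < 0) { perror("ftruncate"); return 1; }

        char *mem = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); return 1; }

        size_t used = 0;
        ssize_t n;
        while (used < SHM_SIZE &&
               (n = read(STDIN_FILENO, mem + used, SHM_SIZE - used)) > 0)
            used += (size_t)n;

        fprintf(stderr, "stored %zu bytes in %s\n", used, SHM_NAME);
        return 0;
    }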

    I guess this may be an issue of slowing down the other processes
attached to the pipe -- I don't want to do that in this hypothetical
situation.  For example, I want to write a shell script that will test
several learning algorithms on the same dataset.  I want to allow one
learning algorithm at a time to read the given data, and I also don't want
to slow down the other algorithms reading from the same pipe.

    I guess I could run all the learning algorithms simultaneously, and use
a pipe splitter to give each of them the data, but some of the algorithms
might take a lot longer to run than others, and/or I might want to see the
results of one utility as soon as possible (before the other algorithms even
have a chance at running).

    Actually, I envision a system where I construct a learning algorithm
so simple that it is actually only a classifying algorithm, and the
controlling script has the task of giving the classifier the same dataset
many times, each time changing some of the parameters according to the
previous classification accuracies.  In this way, I think I can construct a
very generic learning algorithm, one that can train anything from a neural
net to a radial-basis function pdf.
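
    Roughly, the controlling loop I have in mind would look like the sketch
below.  The classifier program name, its flag, its output format, and the
dataset path are all made up here; the point is only the shape of the loop,
with a parameter nudged according to the previous accuracy:

    /* Sketch of the controlling loop: run the same classifier over the
       cached dataset repeatedly, adjusting one parameter according to the
       accuracy it reports.  Everything quoted below is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        double rate = 0.1, best = 0.0;

        for (int pass = 0; pass < 10; pass++) {
            char cmd[256];
            snprintf(cmd, sizeof cmd,
                     "classify --rate=%g < /dev/shm/mtshell_dataset", rate);

            FILE *p = popen(cmd, "r");        /* classifier prints accuracy */
            if (!p) { perror("popen"); return 1; }
            double acc = 0.0;
            if (fscanf(p, "%lf", &acc) != 1) acc = 0.0;
            pclose(p);

            if (acc > best) best = acc;       /* improvement: keep this setting */
            else rate /= 2;                   /* otherwise back the parameter off */
            printf("pass %d: accuracy %.3f, rate now %g\n", pass, acc, rate);
        }
        return 0;
    }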

    So, you see that even if I run one learning process, I will want the
dataset saved in memory, outside of the learning process.  I guess the data
could be saved in the shell that is training the classifier -- until I add
the idea that I want to test more than one classifier on the same data.
Maybe that simply adds another layer to the classifier-executing shell
script.

    Any thoughts?

    I know you might tell me to write the generic training algorithm in
C/C++ for speed.  I'm not sure whether I like that idea or not.  I'd almost
rather write a compiler for my MTL.  I don't know.  I'll have to think about
that.  But I don't think that would change the two concerns I'm asking about
here.
So ... any thoughts?

tomp





