[LEAPSECS] Schedule for success
M. Warner Losh
imp at bsdimp.com
Sat Dec 20 13:33:57 EST 2008
In message: <D754EF5C-767A-4FF0-AC64-6E9543AAA62A at noao.edu>
Rob Seaman <seaman at noao.edu> writes:
: Poul-Henning Kamp wrote:
: > Steve Allen writes:
: >> Please identify the operations which need one second predictability
: >> over a time span of six months.
: > Wrong question.
: > Try: Please identify computer communications where it is not
: > guaranteed that all involved computers will have their software
: > updated every six months.
: Meant as a bon mot, I guess? Seems to emphasize Steve's point in any
: However, you've actually identified a potential mechanism for
: distributing scheduling data of all sorts, including for leap
: seconds. Instead of building computer hardware, operating systems and
: applications that pretend the relentless update cycle doesn't exist,
: build such systems to expect scheduled updates to software and key
: data structures. Leap seconds are just one from a large class of non-
: static information that needs to be widely shared in common for
: infrastructure to work.
Sadly this is well divorced from reality. People can and do build
systems that have a long shelf life. It is routine in certain sectors
to buy 10 of something and put 8 into the field. The other 2 are
spares and sit on the shelf for a long period of time. The software
is rarely updated on systems like this (why should it be, they are
simple and bug-free enough to run for years). These systems are
expected to run for 10 years with < 0.001% downtime (so-called five
nines). Upgrades make that nearly impossible to meet.
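A minimal sketch of the arithmetic behind that figure. The 365.25-day year, the function name, and the assumption that each six-monthly upgrade must fit inside the same budget are illustrative choices, not anything a particular customer contract specifies.

```python
# Downtime budget implied by "five nines" (99.999% availability).
# Assumption: a 365.25-day year; the function name is illustrative only.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_budget_seconds(availability: float, years: float = 1.0) -> float:
    """Return the allowed downtime in seconds for the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR * years


per_year = downtime_budget_seconds(0.99999)
# Assumption: two upgrades a year, each sharing the yearly budget equally.
per_upgrade = per_year / 2

print(f"budget per year:    {per_year / 60:.2f} minutes")
print(f"budget per upgrade: {per_upgrade / 60:.2f} minutes")
```

That works out to a little over five minutes of downtime per year, which leaves very little room for a planned upgrade, let alone an unplanned failure in the same year.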
When one fails, another one gets swapped in. Otherwise the system is
up all the time. To force an upgrade every 6 months would force a
down time, which is unacceptable. It would also, in many cases, require
someone to physically go to the location where the systems are running
to do the upgrade, since many of these systems aren't on public
networks (and the private ones are oversubscribed with their current
data loads, leaving no room for extra software updates).
The non-regularity of leap seconds makes this very hard to do. Even
with a GPS receiver in hand, it can be hard to start cold, and there's
no way to start up reliably if you've been off as little as one year.
These systems routinely exchange data with timestamps, some of which
is historical. Without leap second knowledge, you get degraded
performance. Systems that are off for a year have no clue when the
last leap second(s) were, unless there weren't any in that time. This
can and does lead to degraded performance in some cases.
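As a rough sketch of that failure mode, consider a spare whose leap second table stopped being updated while it sat on the shelf. The three table rows match the published TAI-UTC history; the shelving date, the function name, and the lookup itself are hypothetical simplifications.

```python
from datetime import date

# (effective date, TAI-UTC in seconds). These rows match the published
# leap second history; a real table has many more entries.
FULL_TABLE = [
    (date(1999, 1, 1), 32),
    (date(2006, 1, 1), 33),
    (date(2009, 1, 1), 34),
]


def tai_minus_utc(table, day):
    """Return the last offset in `table` effective on or before `day`."""
    offset = None
    for effective, seconds in table:
        if effective <= day:
            offset = seconds
    return offset


# Hypothetical spare: shelved in mid-2007 and never updated since.
stale_table = [row for row in FULL_TABLE if row[0] <= date(2007, 6, 1)]

today = date(2009, 3, 1)
error = tai_minus_utc(FULL_TABLE, today) - tai_minus_utc(stale_table, today)
print(f"stale unit is off by {error} second(s)")
```

The stale unit has no way to detect the discrepancy on its own: its table is internally consistent, just out of date.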
: Perhaps my frequent observations about the wisdom of following system
: engineering best practices simply need to be redirected to a broader
: class of problem to find more acceptance :-)
Except, of course, in this case good system engineering is that these
systems will run, unattended (and unnetworked), for years doing the
job they need to do. To force them all to upgrade just because of
leap seconds is silly.
You've constantly pooh-poohed the notion that people who have actually
written and deployed dozens of these systems know what they are
talking about. You all but call such people morons for not following
good system engineering practices. Yet, you show a surprising
ignorance of how things actually work and of system engineering
practices demanded by customers.
The root cause of all of this is the irregularity of the scheduling of
leap seconds. If they were on a schedule, known years in advance,
then these systems could be built.
Imagine if you had to find out from the pope every year whether this
year was going to be a leap year or not. There are all kinds of problems
*THAT* would cause, and nobody would debate it....
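For contrast, the leap year rule is a fixed function of the year number, so a device can evaluate it decades after it was built with no outside input. A minimal sketch of the Gregorian rule (the function name is illustrative):

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Leap years over a 20-year span, computed without consulting anyone.
upcoming = [y for y in range(2008, 2029) if is_leap_year(y)]
print(upcoming)
```

No equivalent function exists for leap seconds, because each one is announced only months ahead.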