[LEAPSECS] Coding this week, and a trick for timeouts over leap seconds.

Paul Sheer p at 2038bug.com
Sat Oct 1 05:16:52 EDT 2011



I am busy implementing some heartbeat monitoring code between two
machines. The spec calls for a 1-second recovery time.

Basically if I get no heartbeats for 1 full second then I should
consider the peer system to have failed.

To cope with the leap-second scenario, one solution is to use a
timeout 1 second longer than usual whenever the current time is
close to the turnover of the UTC day, which is where a leap second
would be inserted. You can detect this easily by checking

time(NULL) % 86400

which gives the number of seconds since midnight UTC. If the result
is within a couple of seconds of 0 or of 86400, use a 2-second
timeout instead of a 1-second timeout.
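
As a rough sketch of that check on its own (the names SECS_PER_DAY,
WINDOW_SECS, seconds_into_day and heartbeat_timeout_ms are made up
for illustration, and the 2-second window is an assumption):

```c
#define SECS_PER_DAY 86400LL
#define WINDOW_SECS  2LL   /* assumed guard window on each side of midnight */

/* Seconds elapsed since the most recent 00:00:00 UTC (for unix_secs >= 0). */
long long seconds_into_day(long long unix_secs)
{
    return unix_secs % SECS_PER_DAY;
}

/* Hypothetical helper: 2000 ms near the day boundary, 1000 ms otherwise. */
int heartbeat_timeout_ms(long long unix_secs)
{
    long long s = seconds_into_day(unix_secs);
    int near = (s <= WINDOW_SECS) || (s >= SECS_PER_DAY - WINDOW_SECS);
    return near ? 2000 : 1000;
}
```

A caller would pass in the current time, for example
heartbeat_timeout_ms((long long)time(NULL)) with <time.h> included.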

Now this seems like a nice and easy way of fixing old code. Here
is an example:


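/* Timestamp (ms since the epoch) of the last heartbeat received. */
static long long last_recv_time;

void process_event(Event e)
{
    long long now = gettimeofday_in_millisecs();

    /* No heartbeat for more than 1 full second: the peer has failed. */
    if (now > last_recv_time + 1000) {
        peer_has_failed();
    } else if (e == EVENT_HEARTBEAT) {
        last_recv_time = now;
    }
}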


Becomes:


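#define FUDGE 2   /* seconds of slack on either side of midnight UTC */

/* Return 1 if t (seconds since the epoch) is within FUDGE seconds
   of the turnover of the day, else 0. */
int near_turnover_of_day(long long t)
{
    return (t + FUDGE) % 86400 <= FUDGE * 2;
}

void process_event(Event e)
{
    long long now = gettimeofday_in_millisecs();
    /* Allow one extra second near the turnover of the day. */
    long long timeout = 1000 + 1000 * near_turnover_of_day(now / 1000);

    if (now > last_recv_time + timeout) {
        peer_has_failed();
    } else if (e == EVENT_HEARTBEAT) {
        last_recv_time = now;
    }
}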


Comments?

(This example is off the top of my head so please excuse any errors.)

-paul








