The recent leap second triggered problems because of a bug in how Linux handles leap seconds. It affected servers whose administrators had chosen the option of having a 61 second minute. For a summary, see the H report, or the detailed bug analysis in these two email threads. Short form: under heavy load, as on a database server, there is a low probability that the kernel time structures, futexes, etc. are not all updated correctly. This leads to deadlocks, load spikes, and similar failures.
From a user and administrator perspective, leap seconds force a choice between two problematic alternatives:
- You can deal with the problems of 61 second minutes. This is a problem even bigger than the Y2K problem. Everyone knows that a minute has 60 seconds, so tracking down all the places where this assumption was made, and correcting each one to deal with 59, 61, and 62 second minutes, is a massive undertaking. Most code doesn't really care, as long as there are only 2 digits for the seconds in a minute. But computations of time intervals are affected, as is error checking logic. There are several versions of a popular DBMS that do not permit date fields to have a 61 second minute.
- You can accept the problems of differences between system time and UTC. This introduces synchronization problems. Again, most applications do not care. The world managed to operate without leap seconds up until 1972. Even afterwards, most uses of time do not care about a mismatch of a few seconds between systems.
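The seconds-field assumption is easy to see in practice. As a small illustration (Python is used here only for convenience; the helper name `parse_utc` and its fold-forward policy are assumptions made for this sketch, not what any particular DBMS does), the standard `datetime` type rejects a seconds value of 60, so code that must accept leap second timestamps has to decide what to do with them:

```python
from datetime import datetime, timedelta


def parse_utc(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SS', tolerating a leap second (SS == 60).

    Hypothetical policy: a leap second is folded onto the first second of
    the next minute, so the result is always a valid 60 second minute time.
    """
    date_part, time_part = stamp.split("T")
    year, month, day = (int(field) for field in date_part.split("-"))
    hour, minute, second = (int(field) for field in time_part.split(":"))
    is_leap = second == 60
    if is_leap:
        second = 59  # datetime only accepts seconds in 0..59
    parsed = datetime(year, month, day, hour, minute, second)
    return parsed + timedelta(seconds=1) if is_leap else parsed


def datetime_accepts_second_60() -> bool:
    """Check whether the plain constructor tolerates a 61st second."""
    try:
        datetime(2012, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True
```

Folding the leap second forward makes 23:59:60 indistinguishable from 00:00:00, so an interval computed across it is off by one second. That is the price of keeping downstream code on 60 second minutes.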
The easy real-world alternatives are:
- The default Windows mode is the second alternative, inaccurate time. Actually it's rather brutally inaccurate. The clock just keeps running from second 60 into the next minute, so it's a second or more fast. Then, at the next time check, Windows just slams the time back by one second, running time backwards for a moment. Your software had better be prepared for this on Windows. Getting higher accuracy NTP time on Windows requires a lot of special effort.
- The Linux, Mac OS, and Unix worlds offer two configuration choices:
  - 61 second minutes. This is the default. During the 61st second time is kind of weird.
  - 60 second minutes, and a controlled inaccuracy. NTP can be configured to ensure that time always runs forward, and it smears the leap second out over a long time period. For example, you can set it to start running the clock slowly 1000 seconds before the leap second is due. Then, when the UTC leap second begins, the computer time is 59.5 seconds. It's 500 milliseconds slow. At computer time 60 seconds, it switches to the next minute. Suddenly it's 500 milliseconds fast compared with UTC. The clock keeps running slowly, and another 1000 seconds later, when the two agree again, NTP speeds the clock back up to normal speed.
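The arithmetic of that smear example can be sketched in a few lines. This is only a model under stated assumptions: the slowdown is taken to be perfectly linear, the switch from "slow" to "fast" is placed at the end of the UTC leap second, and the names `SMEAR_HALF_WINDOW`, `SLOWDOWN`, and `smear_offset` are made up for this illustration. It is not how any particular NTP implementation is configured or coded.

```python
SMEAR_HALF_WINDOW = 1000.0  # real seconds of slow running before the leap second
SLOWDOWN = 0.5 / SMEAR_HALF_WINDOW  # fraction of each second withheld (500 ppm)


def smear_offset(elapsed: float) -> float:
    """Return (computer time - UTC) in seconds.

    `elapsed` is real seconds since the slowdown started. A negative result
    means the computer clock is slow. The UTC leap second is assumed to
    occupy elapsed times in [1000, 1001).
    """
    total = 1.0 / SLOWDOWN  # real seconds needed to lose one full second
    if elapsed <= 0.0 or elapsed >= total:
        return 0.0  # outside the smear window the clocks agree
    lost = SLOWDOWN * elapsed  # how far the computer clock has fallen behind
    if elapsed < SMEAR_HALF_WINDOW + 1.0:
        return -lost  # UTC has not yet finished its extra second
    return 1.0 - lost  # UTC spent a second on 23:59:60; the computer is now ahead
```

Under these assumptions the offset is -0.5 seconds when the leap second begins, about +0.4995 seconds just after it ends, and back to zero 2000 seconds after the slowdown started. The clock rate never goes negative, so computer time always runs forward.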
From the perspective of medical systems, I think the clear choice is the 60 second minute. There are very few medical applications where the relationship to UTC genuinely needs to be accurate. Up until late in the 20th century people were using wall clocks as sufficiently accurate for medical records. These could easily be 100 seconds different from UTC.
Medical records are somewhat more concerned about time being locally consistent, and even more concerned that time only run forwards. But even here, most record requirements can tolerate errors of 5-10 seconds. This makes sense when you consider that the speed of blood flow through the human body or signals over the nervous system means that it takes a significant fraction of a second for actions in one part of the body to affect other parts of the body.
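The "time only runs forwards" requirement can also be enforced defensively in application code. The following is a minimal sketch, not a recommendation for any specific product: the class name `ForwardOnlyClock` is hypothetical, and holding the last value when the underlying clock steps backwards is just one possible policy.

```python
from typing import Callable, Optional


class ForwardOnlyClock:
    """Wrap a clock reader so that issued timestamps never decrease."""

    def __init__(self, read_clock: Callable[[], float]) -> None:
        self._read_clock = read_clock
        self._last: Optional[float] = None

    def now(self) -> float:
        reading = self._read_clock()
        # If the system clock was stepped backwards, repeat the last value
        # instead of handing out a timestamp earlier than one already issued.
        if self._last is not None and reading < self._last:
            reading = self._last
        self._last = reading
        return reading
```

In real use the reader would be something like `time.time`; passing it in as a parameter keeps the wrapper easy to exercise with canned readings. The cost is a short run of repeated timestamps after a backwards step, which fits within the 5-10 second tolerance mentioned above.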
There are exceptions where time needs to be much more tightly synchronized. Cardiac cycle synchronization of equipment needs time coordination in the few millisecond range. Video/sound need to be synchronized to within tens of milliseconds. These kinds of systems have special needs and should not depend upon the default behavior of NTP attempting to synchronize with UTC. These deserve special engineering consideration.
The data that results can be labelled in bulk with the usual sloppy UTC accuracy of other medical records, but internally it needs to be much better coordinated. DICOM deals with this by having several data elements:
- The data blocks are time labeled for things like "start of acquisition" with both a UTC time and an identification of the NTP server used as the UTC reference. Closely coordinated data should all use the same NTP server. This server should use 60 second minutes. The error relative to UTC is not important. The internal consistency is important.
- Internal timings are measured in seconds relative to start of acquisition. This means that there is never a discontinuity within the measurement period. Having this dual time structure makes it easy to maintain coordination. The time durations of medical measurements are not long enough for internal clock drift to matter.
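The dual time structure can be sketched as a small data type. This is an illustration of the idea only: the class and field names (`Acquisition`, `start_utc`, `ntp_server`, `t0_monotonic`) are hypothetical and are not DICOM attribute names, and the host name in the usage note is a placeholder.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, List, Optional, Tuple


@dataclass
class Acquisition:
    start_utc: datetime  # wall-clock label; only roughly accurate against UTC
    ntp_server: str      # which server supplied the UTC reference
    # Monotonic reading taken at start of acquisition; never steps or smears.
    t0_monotonic: float = field(default_factory=time.monotonic)
    samples: List[Tuple[float, Any]] = field(default_factory=list)

    def record(self, value: Any, now: Optional[float] = None) -> float:
        """Store `value` with its offset in seconds from start of acquisition."""
        reading = time.monotonic() if now is None else now
        offset = reading - self.t0_monotonic
        self.samples.append((offset, value))
        return offset
```

A caller would create one `Acquisition` per measurement, for example with `ntp_server="ntp.example.org"`, and call `record()` for each sample. Because the offsets come from a monotonic clock, a leap second or an NTP adjustment during the measurement changes nothing inside the data block; only the bulk `start_utc` label carries the UTC inaccuracy.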
The long term fix is to eliminate the leap second. At the most recent standards meeting a majority of the members voted for the US proposal to eliminate it. But it was not the required supermajority. China and a few other countries voted to keep it. The issue is a tradeoff between all the Y2K-like problems of the leap second, and one more piece of complexity for the astronomers, navigators, and others. My opinion is that those users already deal with the variations of the Earth's wobbles, etc., which demand much more accuracy. The leap second does not help them much.