Tick-Tock

Working with Clocks

Originally posted on LinkedIn.

The concept of time in computing is as important as the role it plays in other aspects of life. For any non-trivial program, understanding the order of events (e.g., reading and writing data) is key to verifying its correctness (e.g., ACID for DBMSs); such order is enforced using a logical clock that keeps track of logical time. The more prominent uses of time in computing, however, have to do with capturing and representing physical time as we humans use it (e.g., capturing the time when a payment was made); physical clocks provide methods to obtain and maintain physical time in computer systems. A typical computer has a real-time clock (RTC), an integrated circuit that keeps track of physical time even when the computer is powered off (a battery installed on the motherboard keeps it running). When the operating system boots, it initializes a system clock using the value of the RTC, and then maintains physical time using a system timer.
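As a minimal illustration, here is a short C++ sketch that reads the OS-maintained system clock through the standard library (the output depends on your platform and current time):

#include <chrono>
#include <ctime>
#include <iostream>

int main() {
    // The OS initializes the system clock from the RTC at boot and then
    // advances it with a system timer; here we simply read its current value.
    auto now = std::chrono::system_clock::now();
    std::time_t t = std::chrono::system_clock::to_time_t(now);
    std::cout << "System clock reads: " << std::ctime(&t);
}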

Non-trivial programs require not only a precise system clock, but also an accurate one; moreover, they interact with other systems, which have their own independent clocks. To ensure that interconnected systems have accurate clocks, they talk to a time server to get Coordinated Universal Time (UTC). Nowadays, the vast majority of computers synchronize their clocks over the Internet using the Network Time Protocol (NTP). For example, the Windows operating system uses time synchronization services to update both the RTC and the system clock. The National Institute of Standards and Technology (NIST) uses atomic clocks to provide a time synchronization service that serves, at the time of writing this post, about 16 billion requests per day; NIST is the source of truth for UTC in the U.S., in addition to serving UTC to the entire Internet.
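Once synchronized, UTC is what the system clock effectively tracks; local time is derived from it. A small C++ sketch showing both views of the same instant (note that std::chrono::system_clock is only guaranteed to track UTC as of C++20, though mainstream platforms behaved that way earlier):

#include <chrono>
#include <ctime>
#include <iostream>

int main() {
    std::time_t t = std::chrono::system_clock::to_time_t(
        std::chrono::system_clock::now());
    char buf[64];
    // Broken-down UTC time...
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", std::gmtime(&t));
    std::cout << buf << "\n";
    // ...versus the local-time view of the same instant.
    std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S %Z", std::localtime(&t));
    std::cout << buf << "\n";
}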

Atomic clocks are extremely accurate; in February 2016, scientists from the Physikalisch-Technische Bundesanstalt (PTB) in Germany built an atomic clock with a measurement uncertainty of 3 × 10⁻¹⁸. Before this engineering feat, such accuracy had been only a theoretical prediction. The accuracy of system clocks in typical computers is lower than that of the time servers with which they synchronize, due to unreliable network routes. A time server that receives simultaneous requests from various clients will reply with identical NTP timestamps, but the time taken for these responses to travel over the network, via unreliable routes whose latencies cannot be accurately predicted, causes the clocks on those clients to diverge and become less accurate; such changes are known as clock drift.
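NTP tries to compensate for that latency using four timestamps per exchange. Below is a C++ sketch of the standard NTP offset and round-trip-delay computation, with made-up millisecond values; note that the offset estimate assumes the outbound and return paths have equal latency, which is exactly what unreliable, asymmetric routes break:

#include <chrono>
#include <iostream>

int main() {
    using ms = std::chrono::milliseconds;
    // The four NTP timestamps: client send (t1), server receive (t2),
    // server send (t3), client receive (t4). Values here are hypothetical.
    ms t1{0}, t2{48}, t3{49}, t4{110};
    // Standard NTP estimates, valid only if both path latencies match.
    ms offset = ((t2 - t1) + (t3 - t4)) / 2;  // estimated client clock offset
    ms delay  = (t4 - t1) - (t3 - t2);        // round-trip network delay
    std::cout << "offset = " << offset.count() << " ms, delay = "
              << delay.count() << " ms\n";
}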

Clock drift is a serious problem in distributed systems, one that cannot be solved but only mitigated. Software developers must be cognizant of its perils, and understand its scope and symptoms. Differences between clocks, even on the same device, can cause hard-to-find bugs in software that doesn’t account for them. For example, the CPU’s Time Stamp Counter (TSC) stores the number of clock ticks since the system was reset, giving software developers a cheap way (a single assembly instruction, RDTSC) to create a high-precision timer; however, that approach was only valid when the clock frequency was fixed.
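A bare-bones x86 sketch of that timer pattern, using the __rdtsc() compiler intrinsic rather than inline assembly (on MSVC the header is <intrin.h>):

#include <cstdint>
#include <iostream>
#include <x86intrin.h>  // __rdtsc(); use <intrin.h> on MSVC

int main() {
    // Read the TSC before and after some work; the delta is elapsed ticks.
    // This measures time correctly only if the tick rate is fixed and both
    // reads execute on the same core; modern frequency scaling and core
    // migration violate exactly those assumptions.
    uint64_t start = __rdtsc();
    volatile long sink = 0;
    for (long i = 0; i < 1000000; ++i) sink += i;  // work being timed
    uint64_t end = __rdtsc();
    std::cout << "elapsed ticks: " << (end - start) << "\n";
}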

Modern CPU clocks adjust their frequencies to cool down and save power when desirable; a multi-core processor can adjust the clock frequency of each core independently, causing each core’s TSC to advance independently. Legacy software that uses RDTSC in timer implementations on a modern multi-core processor may witness time moving backward, as a subsequent read may return a smaller TSC value (because it executed on a slower core); such bugs can be really nasty because they manifest as sporadic erroneous behavior. I once had to debug one in a legacy code library, and it was a tough nut to crack. Luckily, such bugs can be easily fixed by using a monotonically nondecreasing clock such as the std::chrono::steady_clock class in C++11.
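Here is the same timer sketched on top of std::chrono::steady_clock, which guarantees that successive reads never go backward:

#include <chrono>
#include <iostream>

int main() {
    // steady_clock is monotonic: later calls to now() never return earlier
    // time points, regardless of frequency scaling, core migration, or the
    // system clock being adjusted by time synchronization.
    auto start = std::chrono::steady_clock::now();
    volatile long sink = 0;
    for (long i = 0; i < 1000000; ++i) sink += i;  // work being timed
    auto end = std::chrono::steady_clock::now();
    auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "elapsed: " << elapsed.count() << " us\n";
}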

A more noticeable example of such issues is clock drift in distributed systems used for financial services, where it can cost millions of dollars in losses. For example, imagine two clients connected to two different servers to sell a huge number of stocks at the exact moment the trading window opens; how much of a difference would the drift between the two clocks make? Let’s work it out: I used an atomic clock time server (http://www.time.is) to measure, three times, a typical clock drift on a relatively fast Internet connection (mine was 105 Mbps, and both machines are located in California); the average clock drift was (77 + 81 + 140) / 3 ≈ 99.3 ms. The New York Stock Exchange (NYSE) can process over 320,000 orders per second, so in that slim time window of 99.3 ms, the NYSE can process over 31,776 orders! That’s why the NYSE offers co-location services that let other companies host their trading systems in its data centers to minimize network latency.
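The back-of-the-envelope arithmetic, spelled out (the 31,776 figure in the post uses the rounded 99.3 ms average; the unrounded average gives roughly 31,787):

#include <iostream>

int main() {
    // Measured drift samples (ms) and the NYSE throughput figure from the post.
    double avg_drift_ms = (77.0 + 81.0 + 140.0) / 3.0;  // ~99.3 ms
    double orders_per_sec = 320000.0;
    double orders_in_window = orders_per_sec * avg_drift_ms / 1000.0;
    std::cout << avg_drift_ms << " ms of drift covers ~"
              << orders_in_window << " orders\n";  // ~31,787
}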

The moral of the story is: Get to know your clocks; every nanosecond counts! Read more about clocks in Section 2.4 of Computing with Data.