Unix
-
Day 14: The Bug That Didn't End the World, and the One That Still Might
On December 31, 1999, a measurable percentage of the developed world stockpiled bottled water, withdrew cash from ATMs, and stayed up to see if the lights would go out at midnight.
They didn’t. Planes did not fall from the sky. Power grids did not collapse. Bank balances did not reset. The new millennium arrived, the champagne was opened, and by January 3rd everyone agreed it had been a hoax.
It was not a hoax. It was a save.
Y2K was a global $300+ billion engineering effort spread across roughly five years and almost every government and Fortune 500 IT department on Earth. The reason nothing happened on January 1, 2000 is that for half a decade, an enormous number of people worked very hard so that nothing would happen. The bug was real. The fix worked. Most people forgot it was ever a problem.
Twelve years from now, a structurally identical bug is due to detonate. But first, let’s understand what happened in ’99.
Y2K: the bug
The Y2K bug is almost embarrassingly simple. From the 1960s through the 1980s, computer storage was expensive enough that programmers had a habit of representing years with two digits: 99 instead of 1999, 73 instead of 1973. It saved two bytes per date. Across a payroll system tracking millions of employees, that mattered.
The assumption baked into that decision was: we’ll have rewritten this system long before the century rolls over.
This is the most consistently wrong assumption in software engineering. Code outlives its authors’ confidence. By the late 1990s, vast amounts of critical infrastructure (bank ledgers, airline reservation systems, hospital records, utility billing, social security disbursement, military logistics, nuclear plant monitoring) were running on COBOL programs from the 60s and 70s that had been patched but never rewritten. The language is unfamiliar to many, but the fix-it-later approach is relatable to everyone. The developers at the time quietly assumed that a later year would always be a larger number. In two digits, 00 is smaller than 99.
When the rollover hit, 99-12-31 + 1 day = 00-01-01 looked, mathematically, like jumping back to 1900. Age and interest calculations would come out negative. Pensioners would suddenly be billed for a century of debt. Reservation systems would mark every upcoming flight as having departed in the past. Insurance policies would expire en masse.
The reason planes did not fall is that, starting roughly in 1995, every major airline, aircraft manufacturer, and air traffic control authority, including the FAA, began an exhaustive audit-and-fix campaign.
The reason the power grid did not collapse is that every utility company in North America and Europe ran the same campaign on their SCADA systems.
The reason your bank balance was still correct on January 1, 2000 is that someone, somewhere, spent late nights in 1997 reading printouts of code written before they were born.
The estimated total global cost: $300 to $600 billion. The amount of measurable damage on January 1, 2000: small enough that people argued for the next decade about whether the spend had been justified.
It was. The bug was real. The fix worked. The result of a successful preventive engineering campaign is that it looks, in retrospect, like the problem was never there.
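The arithmetic from earlier in this section fits in a few lines. Below is a minimal Python sketch. It assumes a hypothetical system that stores years as two-digit integers, and the function names are illustrative rather than taken from any real codebase. The sketch also shows windowing, a common Y2K remediation that reads two-digit years below a pivot as 20xx. The pivot of 50 is an arbitrary choice made for this example.

```python
def years_between_naive(start_yy: int, end_yy: int) -> int:
    """Elapsed years the way two-digit-year code computed it: plain subtraction."""
    return end_yy - start_yy


def expand_year(yy: int, pivot: int = 50) -> int:
    """Windowing: 00-49 become 2000-2049, 50-99 become 1950-1999 (pivot is an assumption)."""
    return 2000 + yy if yy < pivot else 1900 + yy


def years_between_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Elapsed years after expanding both two-digit years to four digits."""
    return expand_year(end_yy, pivot) - expand_year(start_yy, pivot)


# An account opened in 1973 ("73"), checked on January 1, 2000 ("00").
print(years_between_naive(73, 0))     # -73: the "negative age" failure
print(years_between_windowed(73, 0))  # 27
```

Windowing only postpones the problem. With a pivot of 50, the year 2050 reads back as 1950.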
Y2038: the same bug, different number
Twelve years from now, at 03:14:07 UTC on January 19, 2038, a structurally identical bug fires for a different reason.
Unix time is stored, on a huge amount of legacy infrastructure, as a signed 32-bit integer. That gives you about 2.1 billion seconds of positive range from the 1970 epoch. 2.1 billion seconds is 68 years. 1970 + 68 = 2038.
At 03:14:07 UTC on that date, the counter hits its maximum value, 2,147,483,647. The next tick overflows. In two’s-complement signed integer arithmetic, the value wraps around to its most negative possible value: -2,147,483,648. Interpreted as a Unix timestamp, that’s December 13, 1901.
Every unpatched system that stores Unix time in a signed 32-bit integer will, in the span of one tick, conclude that it is now the early 20th century. The effects are the same family of effects as Y2K, but applied to a much wider deployment surface.
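Here is a minimal Python sketch of that rollover. Python’s own integers never overflow, so the sketch emulates a signed 32-bit time_t by keeping only the low 32 bits. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(value: int) -> int:
    """Keep the low 32 bits of value and read them as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def to_utc(seconds: int) -> datetime:
    """Convert a Unix timestamp to a UTC datetime (timedelta sidesteps platform time_t limits)."""
    return EPOCH + timedelta(seconds=seconds)


last_good = INT32_MAX
after_tick = wrap_int32(INT32_MAX + 1)

print(last_good, to_utc(last_good))    # 2147483647 2038-01-19 03:14:07+00:00
print(after_tick, to_utc(after_tick))  # -2147483648 1901-12-13 20:45:52+00:00
```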
File modification times become nonsensical. SSL certificates appear expired, or worse, not-yet-valid. NTP synchronization fails. Filesystems with 32-bit inode timestamps lose ordering. Embedded device firmware that schedules tasks based on wall-clock time begins executing at random intervals. Industrial control systems that latch state machines on “time since last event” calculations latch on negative durations and either freeze or behave unpredictably.
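The “time since last event” failure is easy to emulate. In this sketch, a hypothetical controller keeps its timestamps in a signed 32-bit field.

Taking the difference modulo 2^32 is a standard wrap-safe way to measure intervals; it is the same serial-number idea used for wrapping tick counters. It repairs durations only, as long as the true gap is shorter than 2^32 seconds. It does nothing for the absolute date.

```python
INT32_MAX = 2**31 - 1

# Hypothetical controller state: timestamps live in a signed 32-bit field.
last_event = INT32_MAX - 5  # logged 5 s before the rollover
now = -(2**31) + 4          # 10 s later: 2**31 + 4 doesn't fit, so the field wrapped


def elapsed_naive(t_now: int, t_last: int) -> int:
    """'Time since last event' as plain subtraction of the stored values."""
    return t_now - t_last


def elapsed_modular(t_now: int, t_last: int) -> int:
    """Difference taken modulo 2**32: survives one wrap if the true gap is under 2**32 s."""
    return (t_now - t_last) & 0xFFFFFFFF


print(elapsed_naive(now, last_event))    # -4294967286
print(elapsed_modular(now, last_event))  # 10
```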
Modern desktop and server operating systems are mostly fine. On Linux, 64-bit architectures have always used a 64-bit time_t. Kernel 5.6 (2020) added 64-bit time support for 32-bit architectures, and glibc 2.34 lets 32-bit programs opt into a 64-bit time_t. macOS and Windows have been 64-bit-clean for over a decade. AWS, GCP, and Azure all run 64-bit kernels.
The problem is not where you are reading this. The problem is in the physical world that keeps everything running.
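If you want to check where your own interpreter stands, here is a rough probe. It assumes CPython, whose time.gmtime converts its argument to the platform C library’s time_t and raises an error when the value doesn’t fit. It tells you about one Python build on one machine, not about the embedded devices this post is worried about.

```python
import time


def survives_2038() -> bool:
    """Return True if this platform's C time functions accept the first post-overflow second."""
    first_second_after_overflow = 2**31  # 2038-01-19 03:14:08 UTC
    try:
        return time.gmtime(first_second_after_overflow).tm_year == 2038
    except (OverflowError, OSError, ValueError):
        # CPython raises one of these when the value does not fit the platform's time_t.
        return False


print("64-bit time_t" if survives_2038() else "32-bit time_t (or a restricted C library)")
```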
The long tail is enormous
Estimates of the number of currently deployed 32-bit embedded devices that interact with time_t in some way range from a few hundred million to several billion.
Industrial controllers, automotive ECUs, network routers, smart-meter firmware, point-of-sale terminals, medical imaging devices, GPS units, cable boxes, elevator controllers, traffic light systems, ATM internals, payment terminals, building HVAC, water-treatment SCADA, satellite firmware, oil rig control systems, and the embedded computer in your refrigerator.
Each one, depending on vintage and vendor, may or may not have been patched.
Many of these devices are not internet-connected and cannot be patched remotely. Many are running firmware whose source code has been lost. Many are running firmware whose vendor no longer exists. Many are in places where physical access is hard: a deep-sea oil platform, a satellite in geostationary orbit, a controller welded inside an industrial machine.
The Y2K fix worked because the affected systems were largely centralized: mainframes in data centers, software at named companies, code with active maintainers. You could audit it. You could rewrite it. You could ship a patch.
Y2038 is decentralized. The affected systems are everywhere.
The Buff Must Flow
On January 1, 2022, on-premises Microsoft Exchange servers around the world stopped delivering email. The cause was a signed 32-bit integer in Exchange’s anti-malware scanner (FIP-FS), which stored its date-based signature version number in the form yymmddHHMM. The first update of 2022, numbered 2201010001, tipped over the 2,147,483,647 limit, and the scanner refused to load. Mail queues backed up everywhere. Microsoft shipped an emergency script the next day. The bug was quickly nicknamed Y2K22.
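A date can overflow a 32-bit integer because, once it is packed as decimal digits, the year lands in the billions place. The sketch below assumes the yymmddHHMM version format that was widely reported at the time. version_number is a hypothetical helper, not Exchange code.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def version_number(year: int, month: int, day: int, hour: int, minute: int) -> int:
    """Pack a date into a yymmddHHMM decimal number (assumed format)."""
    return int(f"{year % 100:02d}{month:02d}{day:02d}{hour:02d}{minute:02d}")


last_2021 = version_number(2021, 12, 31, 23, 59)
first_2022 = version_number(2022, 1, 1, 0, 1)

print(last_2021, last_2021 <= INT32_MAX)    # 2112312359 True
print(first_2022, first_2022 <= INT32_MAX)  # 2201010001 False
```

Every version number from 2022 onward starts with 22, so every one of them is above the limit, not just the first.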
On April 6, 2019, the GPS week number counter rolled over. The failure mode was familiar. The legacy GPS signal carries the week number in a 10-bit field, which seemed big enough when it was designed but wraps every 1,024 weeks, about 19.6 years. NYC’s municipal wireless network went down. KLM grounded a flight. Older car and marine GPS units showed dates in 1999.
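Here is a sketch of how that plays out. It assumes a receiver that counts weeks within the 1,024-week cycle of the legacy week field and hard-codes which cycle (“era”) it believes it is in. That is a simplification: real firmware varies. The sketch also ignores the leap-second offset between GPS time and UTC. The function names are illustrative.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)  # GPS week 0 began at midnight, January 5-6, 1980
WEEKS_PER_ERA = 1024          # the legacy navigation message carries a 10-bit week number


def broadcast_week(day: date) -> int:
    """The truncated week number a satellite sends for a given date."""
    return ((day - GPS_EPOCH).days // 7) % WEEKS_PER_ERA


def decode(week: int, assumed_era: int) -> date:
    """What a receiver reports if it assumes the broadcast week belongs to a fixed era."""
    return GPS_EPOCH + timedelta(weeks=assumed_era * WEEKS_PER_ERA + week)


rollover = date(2019, 4, 7)  # first full GPS day after the April 2019 rollover
print(broadcast_week(rollover))                         # 0
print(decode(broadcast_week(rollover), assumed_era=1))  # 1999-08-22
print(decode(broadcast_week(rollover), assumed_era=2))  # 2019-04-07
```

A unit built during the 1999–2019 era maps week 0 back to August 1999, which matches the dates those older receivers displayed.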
Two examples of overflows hitting production and breaking real things. Y2038 will be every one of those at once, in places nobody is thinking about.
Y2038 is foreseeable. We know about it. We know what needs to be done. We have twelve years. We should get started sooner rather than later. A lot of important systems need to be replaced, and the fewer that fall through the cracks, the better.
There’s no checklist for the devices we’ve already forgotten about, but maybe there should be.
Tomorrow, The Smear: how Google, Amazon, and Meta quietly decided to stop telling the truth about leap seconds, and why everyone else followed.
Sources
- Year 2000 problem — Wikipedia
- Year 2038 problem — Wikipedia
- Microsoft Exchange year 2022 bug in FIP-FS breaks email delivery — BleepingComputer
- Microsoft Exchange Fixes Disruptive ‘Y2K22’ Bug — BankInfoSecurity
- GPS week number rollover — Wikipedia
- GPS Week Number Rollover — GPS.gov
- The impact and resolution of the GPS week number rollover of April 2019 — Geoscientific Instrumentation (Copernicus)
- Linux kernel 5.6 — 64-bit time_t support for 32-bit architectures (KernelNewbies)
- The Open Group Base Specifications: time.h
I’d appreciate a follow. You can subscribe with your email below. The emails go out once a week, or you can find me on Mastodon at @[email protected].
-
Day 13: Unix Time, 1,780,620,532
That’s roughly what time it is, right now, as I type this.
Not 8:48 PM. Not “Thursday.” Not “June 4th, 2026.” None of those are what your computer thinks “now” is. To your laptop, your phone, your car’s infotainment system, the streaming server pushing this page to your browser, and the ATM in the corner store, now is a number. A 10-digit integer. Counting up, one tick per second, since a fixed moment in 1970.
That number runs the world. It’s the closest thing the global computing infrastructure has to a heartbeat. And it has some weird properties, almost none of which are explained by the name it goes by.
Unix time.
The clock under every clock
Open a terminal. Type date +%s. You’ll see something like 1780620532 come back. That’s Unix time: seconds since the Unix epoch, 1970-01-01T00:00:00 UTC.
Every Unix-like operating system tracks time this way internally, even if it dresses up the output for you. Windows counts 100-nanosecond ticks from 1601 instead, but the idea is the same. The pretty “8:48 PM” on your menu bar is a calculation: take the current Unix timestamp, apply your timezone offset, run it through the calendar rules, format it for display. The underlying number is just 1,780,620,532-and-change, counting up.
JavaScript’s Date.now()? Unix time in milliseconds. Java’s System.currentTimeMillis()? Unix time in milliseconds. Python’s time.time()? Unix time as a float. Go’s time.Now().Unix()? Unix time. PostgreSQL’s EXTRACT(epoch FROM ...)? Unix time. SQLite’s strftime('%s', 'now')? Unix time.
It’s the lingua franca of computing. Two systems written in different languages, on different continents, with different calendars in their UIs, agree about what “now” means because they both agree about this one number.
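The display calculation described above can be sketched in Python with only the standard library. The fixed UTC-4 offset is an assumption: it reproduces the 8:48 PM from the opening. Production code would look up real timezone rules, for example with zoneinfo, rather than hard-code an offset.

```python
from datetime import datetime, timedelta, timezone

now = 1_780_620_532  # the timestamp from the title

utc = datetime.fromtimestamp(now, tz=timezone.utc)
# Assumption: US Eastern daylight time as a fixed UTC-4 offset.
eastern = utc.astimezone(timezone(timedelta(hours=-4), "EDT"))

print(utc.isoformat())                             # 2026-06-05T00:48:52+00:00
print(eastern.strftime("%A, %B %d, %Y %I:%M %p"))  # Thursday, June 04, 2026 08:48 PM
```

The same integer yields a different calendar date depending on the offset applied. That is why the stored value stays in UTC seconds and only the display layer knows about timezones.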
Why 1970?
The honest answer is: convenience.
In the early 1970s, Ken Thompson and Dennis Ritchie were building Unix at Bell Labs. They needed a way to represent time, and they settled on a 32-bit integer. Their first attempt counted sixtieths of a second, which would overflow in about two and a quarter years. So they switched to one tick per second, which gave them roughly 136 years of total range in a signed 32-bit integer: about 68 years on either side of zero.
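Here is the arithmetic behind those ranges as a quick check. It uses all 32 bits for the 60 Hz counter and assumes a 365.25-day year (3.156 × 10^7 s).

$$
\frac{2^{32}\ \text{ticks}}{60\ \text{ticks/s}} \approx 7.16 \times 10^{7}\ \text{s} \approx 2.27\ \text{years}
$$

$$
\frac{2^{31}\ \text{s}}{3.156 \times 10^{7}\ \text{s/year}} \approx 68.05\ \text{years}, \qquad 2 \times 68.05 \approx 136\ \text{years}
$$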
Then they needed a zero. They picked 1970-01-01 because:
- It was recent enough that the historical calendar mess (Julian vs. Gregorian, the dropped days in 1582, the year that started in March) was someone else’s problem.
- It was round.
- It predated every Unix system anyone cared to represent.
- It was conveniently close to UTC’s formalization a couple of years later.
That’s it. There’s no cosmological significance to 1970-01-01. It’s not aligned with any astronomical event. It’s the timestamp equivalent of git init: we’ll start counting from here, and we’ll figure the rest out later.
The “later” turned out to mean everywhere.
The thing that isn’t there: leap seconds
The computer’s time problem mostly comes from UTC.
Unix time is defined as the number of seconds since the Unix epoch. You might reasonably assume that if I have two timestamps, the difference between them is the actual number of physical seconds that elapsed between those two moments.
It is not.
Unix time does not count leap seconds. Since 1972, the IERS has inserted 27 leap seconds into UTC, extra seconds added to keep civil time aligned with Earth’s slowing rotation. Unix time pretends they never happened. The Unix clock has, over its 56-year lifetime, “lost” almost half a minute relative to reality.
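Here is a small sketch of that gap. It relies on a hand-written and deliberately incomplete leap-second table that holds only the 2012, 2015, and 2016 insertions; a real program would load the full IERS list. elapsed_si_seconds is a hypothetical helper. Across the last leap second, Unix time reports one second where two physically elapsed.

```python
import calendar

# Partial table (assumption: only the three most recent leap seconds), given as the
# Unix timestamp of the midnight that immediately followed each inserted 23:59:60.
LEAP_SECONDS_ENDING_AT = [
    calendar.timegm((2012, 7, 1, 0, 0, 0)),
    calendar.timegm((2015, 7, 1, 0, 0, 0)),
    calendar.timegm((2017, 1, 1, 0, 0, 0)),
]


def elapsed_si_seconds(t0: int, t1: int) -> int:
    """Physical seconds between two Unix timestamps (t0 <= t1), adding leap seconds from the table."""
    leaps = sum(1 for leap_end in LEAP_SECONDS_ENDING_AT if t0 < leap_end <= t1)
    return (t1 - t0) + leaps


before = calendar.timegm((2016, 12, 31, 23, 59, 59))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))
print(after - before)                     # 1: what Unix time says
print(elapsed_si_seconds(before, after))  # 2: 23:59:59 -> 23:59:60 -> 00:00:00
```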
Even weirder: during the actual leap second, when UTC ticks 23:59:59 → 23:59:60 → 00:00:00, Unix time has to do something. POSIX doesn’t specify what. So implementations have invented three different answers:
- Repeat the second. The clock shows 23:59:59 for two real seconds and then jumps to 00:00:00. Two distinct physical moments share the same timestamp. File mtimes can collide, and log entries can appear out of order.
- Insert the second. The clock briefly shows 23:59:60, which is a valid UTC string but breaks every parser that assumes seconds run 00–59. Linux kernels do this. Hilarity ensues at midnight.
- Smear it. Don’t insert the second at all. Slow every clock down by a tiny fraction over a 24-hour window so it absorbs the extra second smoothly. Google does this. Amazon does it. Facebook does it.
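Here is a minimal sketch of a linear smear. It assumes a 24-hour window centered on the leap second, running noon to noon UTC, which is how Google describes its public smear; other providers have used different windows. smeared_clock and its constants are illustrative names, not any provider’s code.

```python
import calendar

LEAP_END = calendar.timegm((2017, 1, 1, 0, 0, 0))  # midnight right after the 2016 leap second
WINDOW = 86_400                                    # 24-hour smear window
WINDOW_START = LEAP_END - WINDOW // 2              # assumption: noon to noon, centered on midnight


def smeared_clock(real_elapsed: float) -> float:
    """Smeared Unix time, given real SI seconds elapsed since WINDOW_START."""
    if real_elapsed <= 0:
        return WINDOW_START + real_elapsed
    if real_elapsed >= WINDOW + 1:
        return WINDOW_START + real_elapsed - 1  # leap second fully absorbed
    # Inside the window, 86,401 real seconds are stretched over 86,400 clock seconds.
    return WINDOW_START + real_elapsed * WINDOW / (WINDOW + 1)


# Halfway through the window, half of the extra second has been absorbed.
print(round(smeared_clock(WINDOW // 2) - LEAP_END, 3))  # -0.5
# Each real second advances the clock by 86400/86401 s, about 11.6 ppm slow.
print(round((1 - WINDOW / (WINDOW + 1)) * 1e6, 1))      # 11.6
```

The clock never shows 23:59:60 and never repeats a value. The cost is that, during the window, every reading is off by up to a second compared with unsmeared time.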
So “Unix time” in 2026 means three subtly different things depending on whether your server is running stock Linux, smeared Google time, or one of the dozens of variants in between. Two timestamps from two providers may disagree by up to a second, and both are correct under their own definitions.
That’s what the spec authors call “implementation-defined behavior” and what the rest of us call “why distributed-system logs don’t line up.”
The number is also a string
Integers are easy for computers, but humans expect a string. Unix time is the easiest timestamp format to compare, sort, and store because it’s an integer, but as soon as we convert it to a human-readable format, all that changes.
To find out which one is earlier, subtract. To sort a million events, sort the integers. To store one efficiently, write 8 bytes. To send one over the network, send 8 bytes.
Compare this to a full ISO 8601 timestamp like 2026-06-04T16:47:23.512847+00:00. That’s a 32-character string that needs to be parsed, validated, normalized for timezone, and converted to a comparable representation before you can do anything with it. Every comparison is a parsing pass. Every stored value takes 4× the bytes. Every sort needs calendar and timezone rules.
Unix time is fast. It’s so fast that even ID schemes built on top of it keep a Unix-style count at their core. Twitter’s Snowflake IDs start with a millisecond count from a custom epoch and append machine and sequence bits. Segment’s KSUIDs start with a seconds count and append random bytes.
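The comparison problem is easy to demonstrate. Below, two ISO 8601 strings name nearby moments with different UTC offsets. Compared as plain strings, they sort the wrong way. Parsed to Unix integers, they compare correctly. Normalizing every string to UTC first also fixes the problem, but that normalization is exactly the parsing pass described above.

```python
import struct
from datetime import datetime

a = "2026-06-04T20:48:52-04:00"  # 00:48:52 UTC on June 5
b = "2026-06-05T00:00:00+00:00"  # 00:00:00 UTC on June 5

# Compared as strings, a looks earlier because "04" < "05" in the date field.
print(a < b)  # True (wrong)

# Parsed to Unix time, the integers compare correctly.
ta = int(datetime.fromisoformat(a).timestamp())
tb = int(datetime.fromisoformat(b).timestamp())
print(ta < tb, ta - tb)  # False 2932

# As a signed 64-bit integer, a timestamp is 8 bytes on disk or on the wire.
print(len(struct.pack(">q", ta)), len("2026-06-04T16:47:23.512847+00:00"))  # 8 32
```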
The ubiquity isn’t an accident. It’s the natural result of picking the representation that’s cheapest at every step.
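To make the Snowflake example concrete, here is a sketch of the bit layout Twitter originally published. A 64-bit ID holds an unused sign bit, 41 bits of milliseconds since a custom epoch, a 10-bit machine ID, and a 12-bit per-machine sequence. Only the layout and the epoch constant follow Twitter’s design. The function names are hypothetical.

```python
TWITTER_EPOCH_MS = 1288834974657  # custom epoch from Twitter's original Snowflake (Nov 4, 2010)


def make_snowflake(unix_ms: int, machine_id: int, sequence: int) -> int:
    """Pack sign bit 0 | 41-bit ms since TWITTER_EPOCH_MS | 10-bit machine ID | 12-bit sequence."""
    offset = unix_ms - TWITTER_EPOCH_MS
    if not (0 <= offset < 2**41 and 0 <= machine_id < 2**10 and 0 <= sequence < 2**12):
        raise ValueError("field out of range for the Snowflake layout")
    return (offset << 22) | (machine_id << 12) | sequence


def unix_ms_of(snowflake: int) -> int:
    """Recover the embedded Unix time, in milliseconds, from the top bits."""
    return (snowflake >> 22) + TWITTER_EPOCH_MS


t = 1_780_620_532_000  # the title's timestamp, in milliseconds
ids = [make_snowflake(t + 1, 7, 0), make_snowflake(t, 3, 5), make_snowflake(t, 3, 4)]

print(unix_ms_of(ids[0]) - t)                   # 1
print(sorted(ids) == [ids[2], ids[1], ids[0]])  # True: sorting IDs sorts by time first
```

Because the timestamp occupies the high bits, a plain integer sort of the IDs is also a chronological sort. That property is inherited directly from Unix time.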
The Untimes
Unix time is a convention that has eaten the world.
It’s anchored to UTC, which means it inherits UTC’s quirks. It’s in the embedded controllers of cars, industrial equipment, network gear, satellite firmware, and gas pumps, which is to say pretty much every piece of modern infrastructure.
1,780,620,532 is just a number, a timestamp. Your bank uses it for transactions and your file system uses it for files, but it’s also a hack. It’s a 56-year-old dart in the board of time that ignores leap seconds, depends on UTC, and has three different definitions during the same physical second. And we built the entire internet on top of it.
Tomorrow: what happens when the bill comes due. Y2K and Y2038, the bug that didn’t end the world, and the bug that still might.
Sources
- Unix time — Wikipedia
- Leap second — Wikipedia
- Coordinated Universal Time — Wikipedia
- International Earth Rotation and Reference Systems Service — Wikipedia
- Leap Smear — Google Developers
- Look Before You Leap — The Coming Leap Second and AWS
- It’s time to leave the leap second in the past — Engineering at Meta
- How Precision Time Protocol handles leap seconds — Engineering at Meta
- Leap second bug cripples Linux servers at airlines, Reddit, LinkedIn — The Register
- Resolve Leap Second Issues in Red Hat Enterprise Linux
- History of Unix — Wikipedia
- Snowflake ID — Wikipedia
- ksuid — segmentio (GitHub)