What we can learn from the Y2K bug

The Guardian's Chips with Everything podcast recently published an episode about the Y2K bug, which I recommend you listen to if you write software.

The Y2K bug came about due to the small amount of storage space that computers traditionally had, which meant that, for optimisation, years were represented as two digits (80 for 1980, or 99 for 1999). The problem came when that ticked over to 2000 – the year would be represented as 00, and sorting algorithms would look at that date and say "Ah, that's 1900, which is in the past".
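To make that concrete, here's a minimal sketch in Python (the parse_two_digit_year helper is hypothetical, not code from any actual legacy system) of how a naive two-digit parser orders the year 2000 before 1980:

```python
def parse_two_digit_year(yy: str) -> int:
    # Legacy systems stored only the last two digits of the year
    # and assumed the century was always 19xx.
    return 1900 + int(yy)

# "99" becomes 1999, but "00" rolls back to 1900...
years = [parse_two_digit_year(yy) for yy in ["80", "99", "00"]]
print(sorted(years))  # [1900, 1980, 1999] – the year 2000 sorts first
```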

The podcast reveals that the first effects of Y2K were felt years before the turn of the century. For example, in 1998, supermarket chain Marks & Spencer were sent a batch of corned beef with a barcode setting the expiry date as 2002. Marks & Spencer's system read the year as 02, which it interpreted as 1902 – a date 96 years in the past. The system therefore said that this batch of corned beef was out of date, despite it not expiring for another four years. The supermarket sent back the corned beef and ordered another batch, which was rejected again. This repeated four times before the bug in the system was found to be the culprit.
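The same assumption is easy to reconstruct as an expiry check – this is a rough sketch of the logic, not Marks & Spencer's actual system:

```python
def is_expired(expiry_yy: str, current_year: int = 1998) -> bool:
    # The 19xx assumption applied to an expiry date: a barcode
    # year of "02" (meaning 2002) is read as 1902.
    expiry_year = 1900 + int(expiry_yy)
    return expiry_year < current_year

print(is_expired("02"))  # True – the stock looks 96 years out of date
```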

I really enjoyed this episode because it revealed that, although the Y2K bug turned out to be a non-event, that wasn't because it was some sort of myth. Software developers put in an incredible amount of work in the years leading up to 2000 to prevent computer systems from failing. One of them was Professor Martyn Thomas CBE, who was involved in fixing the bugs in computer programs that would have caused problems when 99 ticked over to 00. He was interviewed in the podcast.

What's particularly interesting is when he talks about how many similar problems exist in computer programs today, particularly around cybersecurity:

I'm much more worried about cybersecurity, I think that is a threat that is not yet being addressed strategically. We have to fix it at the root which is by making the software far less vulnerable to cyber attack. And that certainly can be done, but we don't appear to be setting about any kind of strategy nationally or internationally that would bring that about.

Martyn Thomas on Chips with Everything by The Guardian
Cascade failure: an inside look at the Y2K bug (14:48-15:22)

This was really encouraging to hear, because it's what Snyk, the company I work for, is doing: making developers aware of vulnerabilities in the open source code they rely on, and giving them an easy way to fix them. I think cybersecurity is going to be an increasingly important topic over the coming years, and that we'll see more vulnerabilities being exploited in the systems we rely on. It's something I've become more aware of, too – the code we use and reuse has so many vulnerabilities, and open source software, by its nature, duplicates those vulnerabilities across all the systems that rely on it. But if we know about them and can fix them quickly, we're making it harder for hackers.

This relates to another thing that Martyn talks about:

But we still build software on the assumption that you can test it to show that it's fit for purpose… software is vastly complex, and a typical programmer makes a mistake in, if they're good, every 30 lines of program. If they're very, very good, they may make a mistake in every 100 lines of program. If they're typical, it's in about 10 lines of code.

And you don't find all of those by testing… when a modern car has got a hundred million lines of software, you know, even an error in one in a hundred lines is still a million software errors in your car. And a lot of those are vulnerabilities that can be exploited as cybersecurity problems, for example.

Martyn Thomas on Chips with Everything by The Guardian
Cascade failure: an inside look at the Y2K bug (18:20-19:28)

Our tests can only tell us about the scenarios we are testing for. Code is buggy, and the more code we write, the more bugs we introduce. I think things like pair programming and writing good tests can help improve quality vastly, but we also need to make it quick and easy to implement and roll out fixes when we find errors. In particular, we need to take security more seriously when writing software, especially for programs that are relied on so widely.
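Taking Martyn's figures at face value, the arithmetic is sobering. A quick back-of-the-envelope sketch (the numbers come straight from the quote above):

```python
# Estimated latent errors in a 100-million-line codebase (a modern car,
# per the quote), at one mistake per 10 / 30 / 100 lines written.
lines_of_code = 100_000_000

for label, lines_per_bug in [("typical", 10), ("good", 30), ("very good", 100)]:
    print(f"{label}: ~{lines_of_code // lines_per_bug:,} errors")
```

Even at the "very good" rate, that's a million errors – and testing alone won't find them.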

There are still potential things that could go wrong that could cause very widespread devastation. The number of services and computer systems that are dependent on GPS, for example, is unbelievably wide… and yet, you're dealing with a signal that you can buy a jammer for on the internet for $100, and where, if you bought a reasonably powerful jammer for $1000 and stuck it in a balloon and floated it over London, you'd take out GPS over the whole of the South of England. The emergency services use GPS… ships use it for navigation… and the dependence across the whole of society on that easily jammed and, actually, increasingly easily spoofed signal is astounding.

Martyn Thomas on Chips with Everything by The Guardian
Cascade failure: an inside look at the Y2K bug (19:42-21:11)

Martyn recommends this situation be improved by introducing strict liability policies so that companies that write bad software can be sued for damages, to encourage them to improve standards and take security more seriously.

Anyone who has watched Mr Robot will be aware of how big an impact one cyberattack can have.