Google failure: Google doesn’t stage changes?
Like many people, we were surprised early Saturday morning when Google decided to display a warning for EVERY search result. My wife called immediately, thinking that our kitchen computer had been infected by some crazy virus.
According to Google’s own blog, the problem was with a file that lists all the “dangerous” web sites on the Internet. Apparently one of the entries in the file was simply “/”, which was interpreted as “every site” – so Google flagged every single site on the Internet (including its own sites like youtube.com) as containing malware.
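To see how a single “/” could do that much damage, here is a minimal sketch. We don’t know how Google actually matches URLs against its list, so this assumes the simplest possible approach: a URL is flagged if any entry appears anywhere inside it. The function name `is_flagged`, the sample URLs and the entry `badsite.example/malware` are all made up for illustration.

```python
def is_flagged(url: str, patterns: list[str]) -> bool:
    """Return True if any blocklist entry occurs anywhere in the URL.

    Assumption: naive substring matching, chosen only to illustrate the
    failure mode. This is not Google's real matching logic.
    """
    return any(pattern in url for pattern in patterns)


# Hypothetical blocklist: one sensible entry and the stray "/" entry.
blocklist = ["badsite.example/malware", "/"]

# Hypothetical URLs that should NOT be flagged.
urls = [
    "http://www.youtube.com/",
    "http://www.google.com/search?q=cats",
    "http://example.org/index.html",
]

# Every URL contains a "/", so the stray entry matches all of them.
flagged = [url for url in urls if is_flagged(url, blocklist)]
print(f"{len(flagged)} of {len(urls)} URLs flagged")
```

Take the “/” entry out of the list and none of these URLs are flagged, which is the behaviour everyone expected.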
Now, what’s crazy about this is the fact that it happened at all. Most web developers we know follow a pretty rigorous process before updating a site, and certainly every site InGenius manages for customers goes through one. We maintain:
- Development Systems,
- Staging Systems,
- The Live site.
Developers work on the development systems – making changes, and testing their work. Once they’re happy with it, the changes are uploaded to a Staging server, and the whole site is tested, paying particular attention to the new features. Only then, when everything is perfect, are the changes migrated over to the Live Site. After that, everything undergoes an immediate smoke test, followed by a complete test as quickly as possible.
So, it appears that Google doesn’t follow this sort of simple process – that they don’t stage changes, but instead simply roll them out to their main server! Unbelievable!
I’m sure some changes will be made to their processes:
- An automated check of the new malware file to look for entries that can cause a repeat of the problems they experienced,
- A staged process, where the changes are tested before they are rolled out to the live system,
- A search for other “single points of failure” in Google’s systems. Where else could similar issues bring down all of Google?
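The first item on that list doesn’t need to be complicated. Here is a rough sketch of the kind of automated check I have in mind: before a new malware file goes live, run every entry against a handful of sites that are known to be clean, and reject any entry that matches too many of them. Everything here is an assumption for illustration (the function name `find_overbroad_entries`, the substring matching, the 50% threshold, the sample URLs); it is not how Google’s system works.

```python
def find_overbroad_entries(entries, known_good_urls, max_match_fraction=0.5):
    """Return entries that match too many known-good URLs.

    Assumption: an entry "matches" a URL if it appears anywhere in it.
    An entry matching more than max_match_fraction of the known-good
    URLs is treated as suspicious and should block the rollout.
    """
    if not known_good_urls:
        raise ValueError("need at least one known-good URL to test against")

    suspicious = []
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank lines in the file
        matches = sum(1 for url in known_good_urls if entry in url)
        if matches / len(known_good_urls) > max_match_fraction:
            suspicious.append(entry)
    return suspicious


# Hypothetical sample of sites that must never be flagged.
KNOWN_GOOD = [
    "http://www.youtube.com/",
    "http://www.google.com/",
    "http://example.org/index.html",
]

# Hypothetical new malware file, including the stray "/" entry.
new_entries = ["badsite.example/malware", "/"]

bad = find_overbroad_entries(new_entries, KNOWN_GOOD)
if bad:
    print("Refusing to publish, over-broad entries:", bad)
else:
    print("Malware file looks sane, OK to move to staging")
```

A check like this would run on the staging system, so a bad file never gets anywhere near the live one.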
It sure would be fun to sit in on some of the meetings I’m sure will occur at Google on Monday morning. Hopefully whoever is responsible is not fired – you can be sure they won’t be making a mistake like this again…