Eliminating Single Point of Failure Problems
At Webapper, we’re somewhat obsessed with resiliency. Especially these days, we’re looking for ways to help our clients and our own business become antifragile. It’s one of the reasons we became a cloud services provider many years ago. The cloud makes eliminating single point of failure problems easier. And that idea is a strong focus in building resiliency at almost any level of any system or organization.
What Is A Single Point of Failure (SPOF)?
A single point of failure (SPOF) is a risk posed by a flaw in the design or implementation of a system, where the failure of one component can stop the entire system. When designing a reliable system, you eliminate SPOFs by adding redundant components and replicating the critical parts, so no single failure can take everything down.
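To put rough numbers on why redundancy helps, here is a minimal sketch in Python. It rests on a simplifying assumption that is ours, not a property of any real system: components fail independently of one another. The function names and the 99% figure are purely illustrative.

```python
def serial_availability(availabilities):
    """Availability of a chain where every component is required.

    Each component is a single point of failure, so the chain is only
    up when all of them are up: multiply the individual availabilities.
    """
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def redundant_availability(availability, replicas):
    """Availability of `replicas` identical copies where one is enough.

    Assumes failures are independent (an idealization). The group is
    down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - availability) ** replicas


# Illustrative figures only: a component that is up 99% of the time.
single = 0.99
two_in_series = serial_availability([single, single])  # both required
two_redundant = redundant_availability(single, 2)      # either will do
```

Under these assumptions, two required components in a row are less available than either one alone, while two redundant copies are more available than either one alone. Real failures are often correlated (same power feed, same data center, same bad deploy), so treat the result as an upper bound rather than a promise.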
Single Point of Failure – Examples & Remedies
Don’t you hate it when the power goes out at home and your digital clocks all flash 12:00? Even worse, don’t you hate it when your power goes out and your computer turns off in the middle of your work? Electricity in your home is a SPOF…unless you have an auxiliary power source like a generator. For your computer, you should have a UPS, of course, to keep it running smoothly when the lights go out.
Hard disk failure
Nothing stops a day quite as abruptly as a hard disk failure. You wake up, pour some coffee, and turn on the MacBook… And there it is, a flashing folder with a question mark (or a blue screen on a PC). Not good, not fun, not a pleasant start. Disk repair utilities? Backups? These questions need swift (and RIGHT) answers. Time Machine or Acronis to the rescue, and perhaps a quick order from Amazon…
Something we see all too often is the lone wolf developer. That one guy who built the software is a SPOF. He has all the knowledge. He has all the answers. He gets mad, he leaves, and everything stops. Passwords? Bug fixes? Design schemes? No one else knows. If that developer documented his work, a rarity, you may be able to move along to another developer (be prepared to hear “he was an idiot!”). If that developer had an apprentice, you may be able to “promote” that person into the senior role, but most likely there will be a skill level difference. Your best chance for continuity, however, is a development team (perhaps even an outsourced provider). Shared application experience, a common framework, and a known project plan certainly help absorb the shock.
Note: We have been brought into projects where a solo developer runs the show. It’s challenging to integrate other resources when the one person who can help us get involved is also busy keeping the systems running. And more often than not, we find more dangers than we expected (special undocumented build processes, out-of-sync production and testing environments, “I forgot”…).
Like a hard disk crash, a failing server makes for an unpleasant start to the day if it’s a SPOF. Load-balanced servers with failover are a far better strategy: when one server goes down, the others keep handling traffic, so no single machine can take the service offline. Mission critical? Redundancy is your friend!
We build scalable, self-healing systems, which eliminates the 2:00am freak-out when a server goes down.
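The failover idea can be sketched in a few lines of Python. This is a simplified illustration, not how any particular load balancer works: `call_with_failover`, the server names, and the `fake_request` stand-in are all hypothetical, and a real setup would add health checks, timeouts, and retries.

```python
from typing import Callable, Sequence


def call_with_failover(servers: Sequence[str],
                       request: Callable[[str], str]) -> str:
    """Try each server in order and return the first successful response.

    `request` is any function that takes a server name and either
    returns a response or raises ConnectionError when the server is down.
    """
    failures = []
    for server in servers:
        try:
            return request(server)
        except ConnectionError as exc:
            # This server is down; remember why and move on to the next.
            failures.append(f"{server}: {exc}")
    # Only reached when every server failed.
    raise RuntimeError("all servers failed: " + "; ".join(failures))


def fake_request(server: str) -> str:
    """Hypothetical stand-in for a network call: 'primary' is down."""
    if server == "primary":
        raise ConnectionError("connection refused")
    return f"ok from {server}"


# The primary is down, so the call quietly lands on the backup.
response = call_with_failover(["primary", "backup"], fake_request)
```

With a single server in the list, one outage means an error for the user. With two, the same outage is absorbed and the caller never notices, which is the whole point of removing the SPOF.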
Perhaps the biggest lesson of COVID-19 is that systems at large can fail too. No website for your restaurant? SPOF. An in-person-only business model? SPOF. A single revenue stream? SPOF. Larger systems (like your business) need resiliency too, such as a good contingency plan for your staff, your revenue, and your operating practices. Keep an eye on the horizon. See if there are ways to become antifragile, so you thrive instead of dive in times of adversity.