Robust Systems: The Key to Resilient Software

We recently wrote about why spending the time to write resilient code may actually be an optimisation. But when you’re building resilient software, it’s about more than just the code you write. Your entire development system needs to be built for resilience. So moving beyond the code, let’s take a look at what a system built for resilience looks like.

Knowing When Things Go Wrong

When you start shipping software, users are often quick to find bugs. Add external integrations that change over time, and a ‘set-and-forget’ approach to software rarely works.

The first thing users will do is use it, but the second thing they’ll do is break it.

This is where observability comes in. You need a reliable way to tell when things go wrong. The most common way of implementing visibility is through the MELT framework: metrics, events, logs and traces. Without those, you’ll be left relying solely on users to report issues, which is often neither quick nor reliable. You might go days or weeks without knowing you have a broken application.
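As a rough illustration of the logs and metrics parts of MELT, here is a minimal Python sketch. The `observed` decorator, the in-memory `error_counts` store and the `load_profile` function are all hypothetical names invented for this example; a real system would more likely send this data to a dedicated monitoring tool.

```python
import functools
import logging
import time
from collections import Counter

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

# Hypothetical in-memory metric store: failures counted per operation.
error_counts = Counter()


def observed(operation_name):
    """Wrap a function so every call is timed, logged and counted on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                # Metric: count the failure. Log: record the stack trace.
                error_counts[operation_name] += 1
                logger.exception("operation failed: %s", operation_name)
                raise
            finally:
                # Runs on success and on failure, so slow calls are visible too.
                elapsed_ms = (time.perf_counter() - start) * 1000
                logger.info("operation=%s duration_ms=%.1f", operation_name, elapsed_ms)
        return wrapper
    return decorator


@observed("load_profile")
def load_profile(user_id):
    """Hypothetical application function used only to demonstrate the wrapper."""
    if user_id < 0:
        raise ValueError("invalid user id")
    return {"id": user_id}
```

With something like this in place, a spike in `error_counts` can raise an alert long before a user gets around to reporting the problem.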

Effective Triage

Figuring out what to fix first isn’t always easy. That’s where a prioritisation system comes in. (Image: Andrii Yalanskyi / Shutterstock.com)

Now that you know when issues occur, the next step is to work towards a solution. But you shouldn’t jump ahead to the fix just yet: building a resilient application is about focusing on the right things at the right time. This usually takes the form of a classification (or triage) system. The key point: even if an issue has a quick fix, it’s not always worth spending time on it right away.

Prioritisation is one of the keys to resilience: a UI glitch isn’t great, but that doesn’t matter if nobody can actually load your site.

Depending on the size of your application, prioritisation might look a bit different. Smaller projects might categorise issues as critical or non-critical, letting you move without overhead, but larger teams might expand to four or five tiers, or even introduce multiple dimensions (like estimated work required, brand priority, etc.) to classify more effectively.
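To make the idea concrete, here is a small Python sketch of a four-tier system with one extra dimension (estimated effort). The tier names, the `Issue` fields and the tie-breaking rule are assumptions chosen for illustration, not a recommended standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical four-tier scale; lower numbers are handled first."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


@dataclass
class Issue:
    title: str
    severity: Severity
    estimated_hours: float


def triage(issues):
    """Order issues by severity, then by the smallest estimated effort."""
    return sorted(issues, key=lambda issue: (issue.severity, issue.estimated_hours))


backlog = [
    Issue("Button misaligned on mobile", Severity.LOW, 0.5),
    Issue("Site fails to load", Severity.CRITICAL, 4),
    Issue("Export times out", Severity.HIGH, 2),
]

ordered = triage(backlog)
for issue in ordered:
    print(issue.severity.name, issue.title)
```

Note that the half-hour UI fix still ends up last: a quick fix does not outrank an outage.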

Allocation and Swift Resolution

Once you’ve found and prioritised your issue, it’s time to fix it. It’s critical to have the right tools for the job, otherwise resolution can be daunting. So make sure you have the correct hardware and software to diagnose, resolve and verify resolution of the issue.

Next, you’ll need to allocate the task. If you’re a one-person band, that’s easy (it’s you), but teams need a way to distribute work so resolution is quick. While allocation is ultimately up to you, there are a few things to consider:

  • Right type of developer: this sounds obvious, but for smaller teams where roles are blurred, allocating a task to the right developer might not be so straightforward.
  • Familiarity with the code: developers who have recently worked near the problem likely have a better understanding of it, can find issues quickly and resolve them more effectively.
  • Existing workload: if a developer is already working on high-severity issues, they might not be the best person for this one. Alternatively, if their backlog of lower-severity bugs stagnates, they might need some breathing room.
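The three considerations above could be combined into a simple scoring rule, sketched below in Python. Everything here is hypothetical: the `Developer` fields, the team members and especially the weights are arbitrary values picked for the example, and in practice a human judgment call will often beat any formula.

```python
from dataclasses import dataclass, field


@dataclass
class Developer:
    name: str
    skills: set                                       # e.g. {"backend"}
    recent_areas: set = field(default_factory=set)    # code they touched lately
    open_high_severity: int = 0                       # current urgent workload


def pick_assignee(developers, required_skill, code_area):
    """Return the best-scoring developer with the required skill, or None."""
    # Right type of developer: filter out anyone without the skill.
    candidates = [dev for dev in developers if required_skill in dev.skills]
    if not candidates:
        return None

    def score(dev):
        # Familiarity with the code earns a bonus (the weight of 2 is arbitrary).
        familiarity_bonus = 2 if code_area in dev.recent_areas else 0
        # Existing workload: each open high-severity issue costs one point.
        return familiarity_bonus - dev.open_high_severity

    return max(candidates, key=score)


team = [
    Developer("Ana", {"backend"}, {"billing"}, open_high_severity=1),
    Developer("Ben", {"backend"}, set(), open_high_severity=0),
    Developer("Cy", {"frontend"}, {"billing"}, open_high_severity=0),
]

assignee = pick_assignee(team, "backend", "billing")
print(assignee.name)
```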

Incident Response

It’s almost inevitable that, at some point, something will go wrong. It’s how you handle it that matters most. (Image: DC Studio / Shutterstock.com)

Sometimes, bugs are more than just inconveniences. When a problem escalates to tangible or wide-reaching damage, it’s time for your incident response protocols.

Ask yourself, do you have:

  • Database backups?
  • A previous, known-good version of your application to deploy in the meantime?
  • A reserve cloud provider for when yours experiences an outage?
  • Graceful failures, or can one failure bring everything offline?
  • Someone on-hand to perform manual actions to stop an attack?

Responding quickly, decisively and correctly is key. Users will generally prefer an older or more limited version of your software to none at all.

Whether it’s your fault or not, it’s your responsibility to find a solution.

Proactive, Continual Improvement

Being reactive is a necessary ability, but it doesn’t replace proactivity. Systems become resilient from the effort poured into them. For that, you need a keen awareness of risk and a proactive approach to solutions.

You shouldn’t be waiting until things go wrong to fix them. Instead, spending regular time on testing and maintenance will help you find and fix problems before they impact users. That is why at FONSEKA, we include maintenance and support as a core part of our service. Your first priority should be effective error handling and providing fallbacks for failures — a system that says it’s broken is better than one that just breaks.
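Here is one way a fallback might look in Python. The `fetch_recommendations` service and the shape of the returned dictionary are assumptions made up for this sketch; the point is only that the caller gets an explicit "degraded" answer instead of an unhandled crash.

```python
import logging

logger = logging.getLogger("app")


def fetch_recommendations(user_id):
    """Hypothetical call to an external service; here it always fails."""
    raise ConnectionError("recommendation service unreachable")


def recommendations_with_fallback(user_id, fetch=fetch_recommendations):
    """Return recommendations, or an explicit degraded response on failure."""
    try:
        return {"status": "ok", "items": fetch(user_id)}
    except (ConnectionError, TimeoutError) as exc:
        # Say that something is broken (to us and to the user) rather than
        # letting the whole page fail.
        logger.warning("recommendations degraded for user %s: %s", user_id, exc)
        return {
            "status": "degraded",
            "items": [],
            "message": "Recommendations are temporarily unavailable.",
        }


result = recommendations_with_fallback(42)
print(result["status"], result["message"])
```

Only the errors we expect from the network are caught here; anything else still surfaces, so genuine bugs are not silently hidden behind the fallback.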

A crucial aspect of resilient software comes into play when fixing issues, because you don’t want a bug to reappear. Spend a bit more time setting up tests that, had they been present earlier, would have caught the issue. Regressions have been estimated to make up as much as 45% of software defects, so limiting them could make a big difference.
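As a sketch of what such a test might look like, suppose a hypothetical `parse_quantity` function once crashed when a form field contained only spaces. Using Python’s built-in `unittest` module, the fix could ship with a test that pins the corrected behaviour in place (both the function and the bug are invented for this example):

```python
import unittest


def parse_quantity(text):
    """Parse a quantity field; blank input counts as zero.

    Hypothetical history: an earlier version called int(text) directly
    and crashed on whitespace-only input.
    """
    text = text.strip()
    if not text:
        return 0
    return int(text)


class ParseQuantityRegressionTest(unittest.TestCase):
    def test_whitespace_only_input_returns_zero(self):
        # The exact input that triggered the original bug report.
        self.assertEqual(parse_quantity("   "), 0)

    def test_normal_input_still_parses(self):
        # Guard against the fix breaking the common case.
        self.assertEqual(parse_quantity(" 12 "), 12)
```

Saved in a test file, this can be run with `python -m unittest` as part of your regular checks, so the bug is caught immediately if it ever resurfaces.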

Conclusion: Resilient Systems Take Work

Building resilient software is all about strategy, and that takes work to set up and time to maintain. In modern application development, you must retain trust with customers, be able to respond quickly when things go wrong and ultimately provide them with a great service.

Standards like ISO 22301 and other business continuity tools can help you find gaps in your workflow and in the workflow of those who use your software, because at the end of the day, you’re the service your customers rely on.

At FONSEKA, we understand the importance of keeping your platform operational, secure and working as expected. That’s why we partner with businesses for the long term, so what we build serves you for the long term. If you’re looking for a resilient, sustainable long-term software solution, get in touch with us today!

Post Details

Author: Lachlan Rehder

Updated: 03 Apr 2026
