The real cost of recovery

There are few things more frustrating than being let down by the very technology in which we’ve invested to make our lives and our jobs easier.

Financial services, telecommunications, manufacturing and energy lead the list of industries with a high ‘cost of recovery’ and ‘reputational damage’ for every minute of IT downtime. This ‘cost of recovery’ includes lost business with customers (both short term and long term), employee time diverted from other tasks to get the IT systems running again, employee overtime expenses (if applicable), the value of any lost data, emergency maintenance fees (particularly if the outage occurs during off hours) and additional repair costs that may continue even after service has been restored.

I’ve been working with a number of clients to attempt to create a reliable calculation model for the real cost of recovery during outages and downtime. However, even the closest estimate will never be 100% accurate. There are too many ‘ripples’ that follow an outage and affect the entire business in the long run.

The historical measure

The efficacy of outage response protocols in financial institutions historically has been measured by the following KPIs:

  • Outage period
  • Outage frequency
  • Functions and departments affected
  • Repair labor costs
  • Overtime costs
  • Sales revenue lost

The simplest way to estimate potential revenue losses during an outage is with the equation:

  Lost revenue = (GR / TH) × I × H

where:

  • GR = gross annual revenue
  • TH = total annual business hours
  • I = impact percentage
  • H = number of hours of outage
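
As a rough illustration, the calculation can be sketched in a few lines of Python. This assumes the equation takes its common form, lost revenue = (GR / TH) × I × H, with the impact percentage expressed as a fraction between 0 and 1. The function name and the figures in the example are hypothetical and chosen only to show the arithmetic.

```python
def estimated_lost_revenue(gross_annual_revenue, total_annual_hours,
                           impact_fraction, outage_hours):
    """Estimate revenue lost to an outage as (GR / TH) * I * H.

    impact_fraction is the impact percentage written as a fraction,
    e.g. 0.4 for 40% (an assumption of this sketch).
    """
    if total_annual_hours <= 0:
        raise ValueError("total_annual_hours must be positive")
    if not 0 <= impact_fraction <= 1:
        raise ValueError("impact_fraction must be between 0 and 1")
    if outage_hours < 0:
        raise ValueError("outage_hours cannot be negative")

    # Average revenue earned per business hour (GR / TH)
    hourly_revenue = gross_annual_revenue / total_annual_hours
    return hourly_revenue * impact_fraction * outage_hours


# Hypothetical figures: 50 million in annual revenue, 2,000 business
# hours a year, a 3-hour outage affecting 40% of the business.
loss = estimated_lost_revenue(50_000_000, 2_000, 0.4, 3)
print(f"Estimated lost revenue: {loss:,.0f}")
```

Because the result scales linearly with the impact fraction, any uncertainty in that one input carries straight through to the estimate.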

Of the variables above, ‘impact percentage’ is the most nebulous. It has to be measured across the whole business, and it should also reflect the lifetime value of customers who permanently defect to a competitor. That makes it notoriously difficult to capture.

Bigger organizations also need to consider damages that stem from the inability to deliver timely service (e.g. lawsuits over collateral damage, or lost fees on large corporate transactions). Highly publicized outages can also hurt reputation and shareholder value.

Building a better benchmark

Better visibility across the business helps to focus efforts following an outage. With that visibility, you can have just six people working collaboratively on a solution, not 24 people running around and possibly duplicating one another’s efforts.

Benchmarking outage response KPIs through a consolidated system will bring you closer to an accurate representation of your true cost of recovery. More importantly, it will give you a means to measure the efficacy of your response protocols and pick up outage trends.

The new benchmark arose from work with the Royal Bank of Scotland, which was looking for help managing regulatory compliance and reducing the risk of service failures. My team worked hand-in-hand with the bank to implement a solution – even spending six months in a sandbox environment to make sure we found something that suited all of its needs.

I’d love to discuss this further. Send me a message on LinkedIn or email.

Brian Retzlaff, Executive Consultant, Financial Services at ServiceNow