Business continuity: Why do you need it?
Impact of business disruption
Every business is at risk of disruption from a variety of threats, such as power loss, fire, flood, or loss of staff.
This could lead to:
- Failure of your business.
- Loss of reputation or customers.
- Financial, legal, and regulatory penalties.
- Human resource issues.
- An impact on insurance premiums.
Did you know?
80% of businesses without a business continuity plan that are hit by a major incident either never re-open or close within 18 months.
The diagram below shows the critical timeline for a disruptive incident and illustrates the relationship between three important concepts:
- Maximum Tolerable Period of Disruption
- Recovery Time Objective
- Recovery Point Objective
Let’s look at each of these in more detail.
Figure 1: Disaster recovery timeline
Maximum Tolerable Period of Disruption
The Maximum Tolerable Period of Disruption, shortened to MTPD or MTPoD, is also known as the Maximum Tolerable Downtime, or MTD, and is a fundamental measurement in business continuity planning. It defines the maximum amount of time that an organisation can afford to be affected by an incident.
This downtime could be very short for organisations that provide a real-time service, such as banks that provide 24-hour online services. In April 2019, Which? found that six of the UK’s biggest banks had at least one failure every two weeks.
Other services can be unavailable for much longer periods of time without major impact, but there will come a point where the disruption can’t be tolerated.
In October 2021, a ‘configuration error’ brought down Facebook, Instagram and WhatsApp for nearly six hours. This caused disruption to other sites like Twitter, due to the surge of new visits to their apps. Forbes estimates that Facebook lost $66m (£48.5m) during this six-hour outage.
Recovery Time Objective
Many organisations can survive some disruption to services, providing they reach a certain level of performance or availability. Once this initial level is reached, the organisation can gradually increase their service offering to how it was before the incident.
Recovery Time Objective or RTO is defined by ISO 22301 as ‘the period of time following an incident within which a product or service or an activity must be resumed, or resources must be recovered’. In other words, how long your business can survive before operations must be restored to normal.
To give an example: say one of your servers goes down at noon. If you have a five-hour RTO, the server and its application need to be back up and running by 5pm. This includes replacing or repairing the server and restoring its data. However, organisations and their customers will often accept a lower level of performance if they can achieve a reasonable level of service. An example of this might be after a train incident where some rail traffic is recovered within a few hours, but it’s accepted that it might take a few days to have the line back to normal levels of service.
The ISO standard states that for products, services and activities, the Recovery Time Objective must be less than the time it would take for the impacts that would arise (because of not providing a product/service or performing an activity) to become unacceptable. In simple terms, this means that whatever the Recovery Time Objective, it must always be less than the Maximum Tolerable Period of Disruption.
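The server example and the rule above can be sketched in a few lines of Python. This is only an illustration: the function names, the calendar date, and the eight-hour MTPD are assumptions made for the example, not values taken from ISO 22301.

```python
from datetime import datetime, timedelta


def recovery_deadline(incident_start: datetime, rto: timedelta) -> datetime:
    """Return the latest time by which the service must be resumed."""
    return incident_start + rto


def rto_is_valid(rto: timedelta, mtpd: timedelta) -> bool:
    """The RTO must be strictly less than the MTPD."""
    return rto < mtpd


# Server goes down at noon (the date itself is hypothetical).
incident_start = datetime(2024, 1, 15, 12, 0)
rto = timedelta(hours=5)
mtpd = timedelta(hours=8)  # assumed value, for illustration only

deadline = recovery_deadline(incident_start, rto)
print(deadline.strftime("%H:%M"))  # 17:00, i.e. 5pm
print(rto_is_valid(rto, mtpd))     # True, because 5 hours < 8 hours
```

If the check returns False, the recovery target would leave the organisation exposed beyond the point it can tolerate, so either the RTO or the recovery arrangements behind it need revisiting.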
Recovery Point Objective
The RPO is a time-based measurement of the maximum amount of data loss that is tolerable to your organisation.
This is defined by ISO 22301 as ‘the point to which information used by an activity must be restored to enable the activity to operate on resumption’.
To go back to our server outage at noon, if you have a one-hour RPO, this means you have accepted that you can lose the data between 11 am and noon; one hour’s worth of data. If you have a four-hour RPO, you’ve said you can lose four hours’ worth of data.
The RPO usually refers to the point at which the last backup activity was carried out, before the disaster occurred. The BCI’s Good Practice Guide indicates that the Recovery Point Objective should be less than the Maximum Tolerable Data Loss.
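As a rough sketch of the noon outage example, the data at risk is simply the gap between the last backup and the incident. The function names and the dates below are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident_start: datetime) -> timedelta:
    """Time span of data created after the last backup and so lost."""
    return incident_start - last_backup


def meets_rpo(last_backup: datetime, incident_start: datetime, rpo: timedelta) -> bool:
    """True if the data lost does not exceed the agreed RPO."""
    return data_loss_window(last_backup, incident_start) <= rpo


incident_start = datetime(2024, 1, 15, 12, 0)  # outage at noon (hypothetical date)
one_hour_rpo = timedelta(hours=1)

# Last backup at 11 am: one hour of data lost, which a one-hour RPO accepts.
print(meets_rpo(datetime(2024, 1, 15, 11, 0), incident_start, one_hour_rpo))  # True

# Last backup at 9 am: three hours of data lost, which breaches a one-hour RPO.
print(meets_rpo(datetime(2024, 1, 15, 9, 0), incident_start, one_hour_rpo))   # False
```

In practice, this suggests that the interval between backups should be no longer than the RPO.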
NIST Cybersecurity Framework
Before you move on, it’s worth noting that there are other approaches that will help you assess the impact of disruptions. The Five Functions by NIST is one approach available that aids organisations in expressing their management of cybersecurity risks at a high level.
Figure 2: The NIST cybersecurity framework
1. The Identify function assists in developing an organisational understanding to manage cybersecurity risk to systems, people, assets, data, and capabilities. Understanding the business context, the resources that support critical functions, and the related cybersecurity risks enables an organisation to focus and prioritise its efforts, consistent with its risk management strategy and business needs. Examples of this function include:
- Identifying physical and software assets within the organisation.
- Identifying the Business Environment the organisation supports, including the organisation’s role in the supply chain and its place in the critical national infrastructure sector.
- Identifying cybersecurity policies established within the organisation to define the Governance program as well as identifying legal and regulatory requirements regarding the cybersecurity capabilities of the organisation.
- Identifying asset vulnerabilities, threats to internal and external organisational resources, and risk response activities as a basis for the organisation’s Risk Assessment.
- Identifying a Risk Management Strategy for the organisation including establishing risk tolerances.
- Identifying a Supply Chain Risk Management strategy including priorities, constraints, risk tolerances, and assumptions used to support risk decisions associated with managing supply chain risks.
2. The Protect function outlines appropriate safeguards to ensure delivery of critical infrastructure services. The Protect Function supports the ability to limit or contain the impact of a potential cybersecurity event. Examples of this function include:
- Protections for identity management and access control within the organisation including physical and remote access.
- Empowering staff within the organisation through awareness and training, including role-based and privileged user training.
- Establishing data security protection consistent with the organisation’s risk strategy to protect the confidentiality, integrity, and availability of information.
- Implementing information protection processes and procedures to maintain and manage the protections of information systems and assets.
- Protecting organisational resources through maintenance, including remote maintenance activities.
- Managing protective technology to ensure the security and resilience of systems and assets are consistent with organisational policies, procedures, and agreements.
3. The Detect function defines the appropriate activities to identify the occurrence of a cybersecurity event. The Detect function enables timely discovery of cybersecurity events. Examples of this function include:
- Ensuring anomalies and events are detected, and their potential impact is understood.
- Implementing continuous security monitoring capabilities to monitor cybersecurity events and verify the effectiveness of protective measures including network and physical activities.
- Maintaining detection processes to provide awareness of anomalous events.
4. The Respond function includes appropriate activities to act regarding a detected cybersecurity incident. The Respond function supports the ability to contain the impact of a potential cybersecurity incident. Examples of this function include:
- Ensuring response planning processes are executed during and after an incident.
- Managing communications during and after an event with internal stakeholders, law enforcement, and external stakeholders, as appropriate.
- Conducting analysis to ensure effective response and support recovery activities, including forensic analysis and determining the impact of incidents.
- Performing mitigation activities to prevent expansion of an event and to resolve the incident.
- Implementing improvements by incorporating lessons learned from current and previous detection/response activities.
5. The Recover function identifies appropriate activities to maintain plans for resilience and to restore any capabilities or services that were impaired due to a cybersecurity incident. The Recover function supports timely recovery to normal operations to reduce the impact from a cybersecurity incident. Examples of this function include:
- Ensuring the organisation implements recovery planning processes and procedures to restore systems and/or assets affected by cybersecurity incidents.
- Implementing improvements based on lessons learned and reviews of existing strategies.
- Coordinating internal and external communications during and following the recovery from a cybersecurity incident.
Relationship with risk management
As you saw earlier, business continuity management and risk management are closely interlinked.
Thinking ‘it’ll never happen to us’ may not come cheap for your organisation, so you’ll want to start by assessing the threats or risks that could lead to a significant interruption to your operations.
Figure 3: Business continuity and risk management
As illustrated above, the risk management process begins with the identification of the business-critical assets, which might be people, premises, systems, products, services, activities, or information.
The next step reintroduces the business continuity measures explored earlier, starting with business impact analysis to consider the loss or destruction of those assets. This will provide:
- The Recovery Time Objective
- The Maximum Tolerable Period of Disruption
Remember: The Recovery Time Objective must always be less than the Maximum Tolerable Period of Disruption.
After this, the Continuity Requirements Analysis is conducted for each asset, which yields:
- The Recovery Point Objective
- The Maximum Tolerable Data Loss
Remember: The Recovery Point Objective must always be less than the Maximum Tolerable Data Loss.
Once these steps are complete, it should be possible to conduct a standard risk assessment to plot the increasing impact against the increasing likelihood of an incident occurring. For example, you may consider loss of IT systems as high impact, low likelihood.
This will deliver a prioritised list of risks that can be examined for the most appropriate options, and risk treatment recommendations can be made, i.e., to avoid, accept, tolerate, reduce, or transfer the risk.
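One way to picture this prioritisation is the minimal sketch below. It assumes a 1 to 5 scale for both impact and likelihood and a score of impact multiplied by likelihood. That is a common convention rather than something this course prescribes, and the risks and ratings shown are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    impact: int      # assumed scale: 1 (low) to 5 (high)
    likelihood: int  # assumed scale: 1 (low) to 5 (high)

    @property
    def score(self) -> int:
        # Assumed scoring rule: impact multiplied by likelihood.
        return self.impact * self.likelihood


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


# Hypothetical ratings, for illustration only.
risks = [
    Risk("Loss of IT systems", impact=5, likelihood=1),  # high impact, low likelihood
    Risk("Power loss", impact=3, likelihood=3),
    Risk("Loss of key staff", impact=4, likelihood=2),
]

for risk in prioritise(risks):
    print(f"{risk.name}: {risk.score}")
```

With these ratings, power loss (9) ranks above loss of key staff (8) and loss of IT systems (5). A simple product can understate rare but severe risks, so the ordering is a starting point for discussing treatment options, not a decision in itself.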
Note: The most appropriate standard for information security risk management is ISO/IEC 27005, Information Security Risk Management.
Your BCM policy should feed into all areas of the risk management lifecycle, helping you determine which of the organisation’s assets should be considered when conducting the business impact analysis, which then determines the optimum BCM strategy.
In turn, this feeds into the development and implementation of the BCM response and results in the final stage of exercising, maintaining, and reviewing the business continuity structure.
To be successful, BCM has to become part of the culture of your organisation. This will challenge staff to think more deeply about their role in the organisation and how they can help make it more resilient. This can be achieved through a combination of awareness raising and training.
Some organisations treat BCM as a distinct project because many of the steps follow the same principles as standard project management. This is what you’ll see in the next Course as you’re taken through the implementation process.
This Course will begin by helping you understand the difference between business continuity and disaster recovery. From here, there will be more focus on the value that business continuity brings, before you are introduced to the critical steps for implementing your own plan. Finally, you’ll learn how to assess the impact of business disruption, including how long it can be tolerated for.