Business Continuity Management
Module 8 - Business Continuity and Disaster Recovery
Business continuity management and disaster recovery are about an organization being prepared for business disruption and taking the necessary actions to get the business operational as soon as possible after an incident occurs. This course provides a strong foundation in each area by looking at what business continuity management is, why it’s important and how it can be implemented within the overall risk management process, before reviewing the disaster recovery process.
The objectives of this course are to provide you with an understanding of:
- The value of business continuity management to an organization
- The business continuity management process
- The impact of business disruption on an organization and how long disruption should be tolerated
- The business continuity implementation process and implementation planning
- Disaster recovery strategy and the importance of disaster recovery planning
- Different standby systems and how these relate to recovery time
- The importance of robust documentation and testing of the plan
This course is ideal for members of information security management teams, IT managers, security and systems managers, information asset owners and employees with legal compliance responsibilities. It acts as a foundation for more advanced managerial or technical qualifications.
There are no specific prerequisites for this course. However, a basic knowledge of IT, an understanding of the general principles of information technology security, and awareness of the issues involved with security control activity would be advantageous.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org if you are unsure about where to start or if you would like help getting started.
Welcome to this video on business continuity management.
We’ll start by looking at the value of business continuity management to an organization before looking at the business continuity management process. Then we’ll investigate the impact of business disruption on an organization, including how long disruption should be tolerated, before looking in detail at the implementation process and implementation planning.
We’ll conclude by reviewing the plan-do-check-act cycle to ensure plans are robust and continually reviewed.
Business Continuity Management includes:
· Business continuity planning which relates to people, premises, processes and procedures, and
· Creating a business continuity plan which focuses on the recovery of business and workgroup functions, like working groups and customer-facing areas
Business continuity management is valuable to an organization because:
· It reduces exposure to business risk. This is where risk management and business continuity management are part of the same business practice. However, risks are not all equal; some critically affect the business operation and others result in operational inconvenience.
· It increases the resilience of the organization, by identifying the areas where it can improve and be better placed to resist the effects of a major incident.
· It identifies and mitigates the risks related to failures in the supply chain. Many manufacturing organizations are reliant on suppliers and problems can soon filter through the chain. For example, when a memory chip plant in Japan was damaged by the Kobe earthquake, PC manufacturers were facing a shortage of components within weeks.
· It helps manage the risks of failures by key service providers. Having a robust business continuity management programme allows organizations to gain competitive advantage over less proactive rivals.
Although BCM is often used in planning and preparation for high-impact, low-probability events like earthquakes and pandemics, it can also be used for managing day-to-day incidents which, on their own, aren’t significant, but can become serious if they’re allowed to accumulate.
There are four critical steps to business continuity management planning:
· The first step is to identify what needs to be protected. The simplest way to do this is to conduct a Business Impact Analysis for each functional area of the organization. This will collect detailed information about their business requirements, both during normal business operations and following a disaster. This is part of the risk assessment process.
· The second step is to determine how the elements should be protected. Developing recovery strategies helps to plan procedures by identifying a range of disasters and the corresponding responses. Internal and external communication strategies should then be created for each scenario.
· The third step is to validate and test. Alternate site and remote access locations should be tested to ensure business operations can resume quickly and efficiently.
· The final step is to educate employees. Employee information sessions and table-top exercises can help cascade key messages and ensure everybody understands the policies and procedures. Documentation should also be created so that BCM and disaster recovery information can be quickly distributed, including emergency contact information for internal staff, critical service suppliers and recovery locations.
We’re now going to look at some of the basic concepts used to describe the impact of a business disruption.
The diagram here shows the critical timeline for a disruptive incident and illustrates the relationship between three important concepts:
· Maximum Tolerable Period of Disruption
· Recovery Time Objective, and
· Recovery Point Objective
Let’s look at each of these in more detail. The Maximum Tolerable Period of Disruption, shortened to MTPD or MTPoD, is also known as the Maximum Tolerable Downtime, or MTD, and is a fundamental measurement. It’s defined by ISO 22301 as:
“the time it would take for adverse impacts, which might arise as a result of not providing a product/service or performing an activity, to become unacceptable.”
It could be very short for organizations that provide a real-time service, such as banks that provide 24-hour online services. The case of NatWest’s payment systems failure in June 2012 highlighted the dependency on ‘always on’ services. Other services can be unavailable for much longer periods of time without major impact, but there will come a point where the disruption can’t be tolerated.
Many organizations can survive some disruption to services providing they reach a certain level of performance or availability, and then gradually increase this. This point in time is referred to as the Recovery Time Objective or RTO and is defined by ISO 22301 as:
“the period of time following an incident within which a product or service or an activity must be resumed, or resources must be recovered.”
Actual recovery must of course commence much sooner than this, but organizations and their customers will often accept a lower level of performance if they can achieve a reasonable level of service. An example of this might be after a train incident where some rail traffic is recovered within a few hours, but it is accepted that it might take a few days to have the line back to normal levels of service.
The ISO standard goes on to state that for products, services and activities, the Recovery Time Objective must be less than the time it would take for the adverse impacts that would arise as a result of not providing a product/service or performing an activity to become unacceptable. In simple terms this means that whatever the Recovery Time Objective, it must always be less than the Maximum Tolerable Period of Disruption.
In planning terms, the target time for the worst-case data loss is known as its Recovery Point Objective, or RPO. This is defined by ISO 22301 as:
“the point to which information used by an activity must be restored to enable the activity to operate on resumption.”
It usually refers to the point at which the last backup activity was carried out. The BCI’s Good Practice Guide indicates that the Recovery Point Objective should be less than the Maximum Tolerable Data Loss.
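As a rough illustration of how a backup schedule relates to the Recovery Point Objective, the sketch below assumes backups run at a fixed interval, so the worst-case data loss is about one full interval. The function names and the figures are hypothetical and are not taken from ISO 22301 or the Good Practice Guide.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: an incident just before the next backup loses one full interval of data."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case loss stays within the Recovery Point Objective."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical figures: nightly and hourly backups checked against a four-hour RPO.
nightly_ok = meets_rpo(timedelta(hours=24), timedelta(hours=4))
hourly_ok = meets_rpo(timedelta(hours=1), timedelta(hours=4))
print(nightly_ok, hourly_ok)
```

Under these assumed figures, a nightly backup could lose up to a day of data and so would not satisfy a four-hour objective, whereas an hourly backup would.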
As we saw earlier, business continuity management and risk management are closely interlinked. The risk management process begins with the identification of the business-critical assets, which might be people, premises, systems, products, services, activities or information.
The next step is to conduct a business impact analysis to consider the loss or destruction of those assets. This will provide:
· The Recovery Time Objective, and
· The Maximum Tolerable Period of Disruption
The Recovery Time Objective must always be less than the Maximum Tolerable Period of Disruption.
After this, the Continuity Requirements Analysis is conducted for each asset, which yields:
· The Recovery Point Objective, and
· The Maximum Tolerable Data Loss
Remember, the Recovery Point Objective must always be less than the Maximum Tolerable Data Loss.
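The two rules above can be checked mechanically once the figures for an asset have been agreed. The following is a minimal sketch only: the `ContinuityTargets` class, the `check_targets` function and the example values are all hypothetical and simply encode "RTO less than MTPD" and "RPO less than Maximum Tolerable Data Loss".

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class ContinuityTargets:
    """Hypothetical container for the four figures discussed above."""
    name: str
    rto: timedelta   # Recovery Time Objective
    mtpd: timedelta  # Maximum Tolerable Period of Disruption
    rpo: timedelta   # Recovery Point Objective
    mtdl: timedelta  # Maximum Tolerable Data Loss


def check_targets(t: ContinuityTargets) -> list[str]:
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    if not t.rto < t.mtpd:
        problems.append(f"{t.name}: RTO must be less than MTPD")
    if not t.rpo < t.mtdl:
        problems.append(f"{t.name}: RPO must be less than MTDL")
    return problems


# Hypothetical assets with made-up figures.
payments = ContinuityTargets("payments", timedelta(hours=2), timedelta(hours=4),
                             timedelta(minutes=15), timedelta(hours=1))
payroll = ContinuityTargets("payroll", timedelta(days=3), timedelta(days=2),
                            timedelta(hours=12), timedelta(hours=24))
print(check_targets(payments))
print(check_targets(payroll))
```

In this made-up example the payments targets pass both checks, while the payroll targets are flagged because a three-day recovery objective exceeds a two-day tolerable disruption.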
Once these steps are complete, it should be possible to conduct a standard risk assessment to plot the impact against the likelihood of an incident occurring. This will deliver a prioritised list of risks, which can be examined to identify the most appropriate options. Risk treatment recommendations can then be made, i.e. to avoid, accept, tolerate, reduce or transfer the risk.
The most appropriate Standard for information security risk management is ISO/IEC 27005: Information Security Risk Management. The BCM Policy feeds into all areas of the risk management lifecycle, helping to determine which of the organization’s assets should be considered when conducting the business impact analysis, which then determines the optimum BCM strategy.
In turn, this feeds into the development and implementation of the BCM response and results in the final stage of exercising, maintaining and reviewing the business continuity structure.
Surrounding this is the need to embed the concept of BCM into the organization’s culture, which in many cases will challenge staff to think more deeply about their role in the organization and how they can help make it more resilient.
Some organizations treat BCM as a distinct project because many of the steps follow the same principles as standard project management. One of the most important elements of managing any project or programme is the assignment of responsibilities. This means the assignee must fully understand what they should achieve, the quality expectations, the method of delivery and the timeframe within which the project should be completed.
Within this, there should be a senior management representative, preferably at board level, who owns the overall BCM programme and answers to the rest of the board.
Assigning overall BCM ownership in this way ensures a top-down approach, so staff can see commitment from the highest level in the organization.
Once responsibilities have been agreed and assigned, the programme can be implemented. This includes data collection, planning, implementing mitigation measures, implementing the plans and raising awareness.
The overall implementation scope consists of:
· BCM programme management
· Understanding the organization
· Determining the BCM strategies
· Developing the BCM response, and
· Embedding the BCM culture throughout the organization
The management of the BCM programme is tackled in the same way as any project, beginning by defining what’s in scope, setting the objectives, agreeing the tasks and timescales, defining the actual deliverables, allocating responsibilities and defining the delivery milestones.
BCM tests and exercises need to be created and training needs to be provided to the staff involved in the BCM programme.
Business Continuity Management is an ongoing process. Once the cycle has been completed for the first time it should begin again. There are many reasons for this, including:
· The organization is unlikely to get it ‘right first time’ and a certain amount of practice will be required
· The organization itself won’t stand still and new products, locations and business strategies will affect the way BCM is approached, and
· Even though an organization might reach the point of BCM maturity, it may decide to take the next step of gaining certification against a standard such as ISO 22301
The BCM programme must deliver quality documentation, which will guide staff through the early stages of risk assessment and help them manage a serious disruptive event.
There are strong links between business continuity, disaster recovery and incident management: both business continuity and disaster recovery incidents start with the incident management process.
Risk management protects the organization against failures and disruptive incidents. But other important elements are:
· Detection to observe that something disruptive has occurred and that remedial action needs to be taken
· The response by the organization’s staff and possibly third parties to the incident, and
· The speed at which the organization recovers to a normal or near-normal state following the incident
To fully understand the organization, the ‘mission-critical’ areas of the business need to be defined. These could be any area of the business, for example operations, finance, sales, marketing and HR.
Senior management will direct the programme manager to the appropriate members of staff who can provide more detailed information. This will help in carrying out the business impact analysis which will examine the impact or consequence of various threats against the organization’s assets.
The consequences to the organization might be direct, in the form of financial loss; indirect, in the form of damage to brand and reputation; or of other types, such as legal and regulatory consequences.
The next step will be to perform a continuity requirements analysis or CRA, identifying the resources required to achieve recovery. This will subsequently enable the organization to determine the most appropriate recovery strategies.
The final stage in understanding the organization is risk assessment. Having understood the possible consequences or impact of a threat materializing, the organization can take a view of the likelihood of it happening. In some cases, this will be based on historical data, which is a quantitative risk assessment approach, or on expert opinion, which is a qualitative approach.
The actual risk is then calculated as the consequence multiplied by the likelihood, and the results are plotted on a risk matrix. This will allow the organization to decide which risks to address first.
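To make the calculation concrete, here is a small sketch that scores a few risks and orders them for attention. The 1 to 5 scales, the risk names and the scores are invented for illustration; an organization would substitute its own scales and data.

```python
# Hypothetical risks scored on assumed 1-5 scales: (name, consequence, likelihood)
risks = [
    ("Data centre power failure", 5, 2),
    ("Key supplier insolvency", 4, 3),
    ("Office printer outage", 1, 4),
]


def risk_score(consequence: int, likelihood: int) -> int:
    """Risk is calculated as consequence multiplied by likelihood."""
    return consequence * likelihood


# Highest score first, so the most pressing risks are addressed first.
prioritised = sorted(risks, key=lambda r: risk_score(r[1], r[2]), reverse=True)
for name, consequence, likelihood in prioritised:
    print(f"{risk_score(consequence, likelihood):>2}  {name}")
```

With these invented scores, the supplier risk (4 x 3 = 12) ranks above the power failure (5 x 2 = 10), even though the power failure has the higher consequence, which is why both dimensions are plotted on the matrix.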
Then, the options for dealing with the identified risks can be evaluated. None of these activities can be successful unless the organization produces appropriate plans, processes and procedures.
Some incident management plans are called ‘specific’ plans, because they will be targeted at one or more highly specific issues, such as the loss of a system or service.
Other types of plan are referred to as ‘generic’ because they deal with multiple issues, such as network outages and telecoms failures, which by their nature follow a similar path.
Disaster recovery plans tend to be specific, as they usually reflect the failure and recovery of particular systems.
Business continuity plans can also be specific or generic, depending on the nature of the incident. They come into effect once the initial response to the incident has identified the cause of the problem and the approach to the solution has been agreed.
Finally, business resumption plans are designed to take the organization beyond the continuity stage, to the point where it has recovered to a normal or near-normal state.
The verification and testing of the plans will be undertaken by the individuals who are directly involved in either the BCM or the disaster recovery processes. By the time plans are put into operation, these individuals should be fully familiar with them and be able to deal with the outcome of a disruptive incident.
For other staff though, the whole area of BCM and disaster recovery may be new and, even if they’re not involved in response or recovery, they should understand why the actions are required. However, they may become indirectly involved if an incident occurs and they need to evacuate a building or move to an alternative location.
Suppliers, customers and other stakeholders will also have an interest and may need to be briefed before or shortly after an incident.
The Plan-Do-Check-Act model can be applied to almost any management system and is used in a variety of national and international standards, including ISO 22301: Business Continuity Management Systems Requirements.
· The ‘Plan’ phase is designed to establish business continuity policy, objectives, targets, controls, processes and procedures relevant to the business continuity programme, and deliver results aligned with the organization’s overall policies and objectives.
· The ‘Do’ phase involves the implementation of the business continuity policy, objectives, targets, controls, processes and procedures.
· The ‘Check’ phase includes monitoring and reviewing performance against business continuity policies and objectives, reporting the results to management and authorising actions for remediation and improvement.
· The ‘Act’ phase covers maintenance and improvement of the business continuity management system by taking corrective action based on the results of the management review, and re-appraising the scope of the system, business continuity policies and objectives.
That’s the end of this video on business continuity management.
Fred is a trainer and consultant specializing in cyber security. His educational background is in physics, having a BSc and a couple of master’s degrees, one in astrophysics and the other in nuclear and particle physics. However, most of his professional life has been spent in IT, covering a broad range of activities including system management, programming (originally in C but more recently Python, Ruby et al), database design and management as well as networking. From networking it was a natural progression to IT security and cyber security more generally. As well as having many professional credentials reflecting the breadth of his experience (including CASP, CISM and CCISO), he is a Certified Ethical Hacker and a GCHQ Certified Trainer for a number of cybersecurity courses, including CISMP, CISSP and GDPR Practitioner.