Implementing anything new is extremely difficult, especially when everyone is too busy to focus on the bigger picture.
At this stage of the business continuity management lifecycle, plans, processes, and procedures are developed to meet organisational requirements.
Some incident management plans are called ‘specific’ plans because they will be targeted at one or more highly specific issues, such as the loss of a system (availability) or service.
Other types of plans are referred to as ‘generic’ because they deal with multiple issues, such as network outages and telecoms failures.
- Disaster recovery plans tend to be specific, as they usually reflect the failure and recovery of particular systems.
- Business continuity plans can also be specific or generic, depending on the nature of the incident – they come into effect once the initial response to the incident has identified the cause of the problem and the approach to the solution has been agreed.
- Finally, business resumption plans are designed to take the organisation beyond the continuity stage, to the point where it has recovered to a normal or near-normal state.
In this step, you’ll start by learning some key considerations for disaster recovery plans, including associated site and standby systems. Afterwards, you’ll consider what documentation and testing you could do to support these and other plan types, before finally seeing how best to communicate your plans.
Disaster recovery plans
Following a disaster:
- 40% of businesses do not reopen.
- 25% fail within one year.
- 90% fail within two years.
Therefore, your organisation needs a disaster recovery plan. The disaster recovery plan outlines the procedures that need to be followed in the event of a disaster to protect your organisation. Let’s explore some key considerations for your disaster recovery plans.
Your plan should help your organisation recover from major failures of IT and communications infrastructure within the Recovery Time Objective.
If a data centre becomes unavailable, for example, because of a fire or flood, your business may need to move operations to an alternate site. This could be another facility belonging to your organisation or a disaster recovery site provided by a third party.
Once operations have been moved to the alternate site, the order in which the various systems are brought back to operation needs to be carefully planned. You’ll learn more about sites and their associated systems later in this step.
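One way to plan the restart order is to record which systems depend on which, then derive an order in which every system comes up after the things it relies on. The sketch below is illustrative only: the system names and their dependencies are hypothetical, and it uses `TopologicalSorter` from Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical systems. Each key maps to the set of systems it depends on,
# so those systems must be restored first.
dependencies = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage", "network"},
    "email": {"network"},
    "order_processing": {"database"},
}


def recovery_order(deps):
    """Return a restart order in which each system follows its dependencies.

    Raises graphlib.CycleError if the dependencies are circular, which is
    itself a planning problem worth finding before an incident.
    """
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(" -> ".join(order))
```

Several valid orders can exist (for example, email and storage can be restored in either order here), so a real plan would also rank systems by business priority.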
Following a major loss of IT infrastructure, the normal means of communicating inside and outside the organisation, for example email or telephone systems, may be unavailable. Alternative means of communication, especially in the early stages of the incident, need to be established with a secure connection.
There are a few other areas to consider when creating a disaster recovery strategy.
- Managed services are an important part of most organisations and can’t be overlooked. You’ll want to consider cloud data storage, backup tape management, remote monitoring of servers and services and updating of systems.
- Offsite storage holds the vital information that allows your business continuity team to recover business activities. Staff must know where to find the information in the offsite facility and have appropriate security access to reach it, and it must be kept up to date with the equivalent operational information.
- Third parties, such as suppliers, may be involved in the response, recovery, and resumption processes. For example, if hardware needs to be replaced, ensure your organisation or its suppliers have enough stock to meet the demand in a suitable timeframe.
Sites and standby systems
When lightning struck the two-storey office building of Cantey Technology, causing a fire to break out, the worst was feared for the IT company, which was hosting servers for more than 200 clients at the time. The fire incinerated Cantey’s entire network infrastructure, melting cables and computer hardware and equipment beyond repair. However, as part of the firm’s business continuity plan, it had moved its client servers to a remote data centre, which also had regular back-ups scheduled. Because of this plan, Cantey’s 200 clients experienced little to no service interruption.
When a disaster occurs, businesses sometimes need to move operations to an alternate site. On a disaster recovery site, standby systems will be used to improve system reliability and availability. Typical standby techniques involve cold standby, hot standby, and warm standby. The sites themselves are similarly categorised as cold, warm, and hot.
Figure 1: The key considerations of disaster recovery
Cold sites may simply be an empty building shell, with little or no facilities, desks, structured cabling, or power distribution. This makes them very adaptable, but slow to bring into service.
A cold standby environment is activated only if the primary system fails. If it does, a backup replica of the primary environment is deployed at the disaster recovery site. This backup process might take a long time, because of the need for initialisation.
Warm sites may have power distribution, structured cabling, and many basic facilities, but might not be fully furnished or equipped with complete IT systems and services. Warm standby systems and sites will always be more costly than cold systems.
Warm standby systems tend to be powered on and usually have an operating system running at a disaster recovery site. They may have appropriate applications loaded and might even be maintained with up-to-date patches and software releases. However, they are unlikely to hold any recent data, which will need to be restored from backup media.
Hot sites are fully equipped with all necessary facilities and are ready for people to move into and to use with immediate effect. They are the most expensive sites to maintain.
Hot standby systems are always fully powered, with operating systems and applications loaded and running, and contain all, or almost all, current data. Users can normally be switched from one hot system to another very quickly and may not even notice the transition.
At the very top end of the scale are mirrored sites. This is the ideal disaster recovery site, where there can be no loss of data and no downtime. Failover (switching) between two sites should be almost instantaneous and unnoticeable to the user.
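The trade-off between cold, warm, and hot standby can be written down as a small lookup. The sketch below is a simplification under stated assumptions: the attribute names are hypothetical, the `cost_rank` values only express the ordering described above (cold cheapest, hot most expensive) rather than real prices, and mirrored sites are left out because they sit above this scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyTier:
    name: str
    cost_rank: int      # 1 = cheapest; an ordering only, not a price
    powered_on: bool    # is the standby system already running?
    os_running: bool    # is an operating system already loaded?
    data_current: bool  # does it already hold (almost) current data?


# Simplified from the descriptions above; real sites vary.
TIERS = [
    StandbyTier("cold", 1, powered_on=False, os_running=False, data_current=False),
    StandbyTier("warm", 2, powered_on=True, os_running=True, data_current=False),
    StandbyTier("hot", 3, powered_on=True, os_running=True, data_current=True),
]


def cheapest_tier(need_powered_on, need_os_running, need_current_data):
    """Return the name of the cheapest tier meeting every stated need."""
    for tier in sorted(TIERS, key=lambda t: t.cost_rank):
        if ((tier.powered_on or not need_powered_on)
                and (tier.os_running or not need_os_running)
                and (tier.data_current or not need_current_data)):
            return tier.name
    return None


print(cheapest_tier(True, True, False))
```

In practice the needs would be derived from your recovery time objective rather than set by hand.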
As with any alternative sites, distance is a key consideration. If sites are too close, they might both be affected by the same disaster, whereas if they’re too far apart it might take too long to restore operations at a remote facility. Distance can also affect the time it takes for data to cross the link between the sites, and impact data synchronisation.
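To get a feel for how distance affects synchronisation, you can estimate the minimum round-trip time between two sites. This is a rough sketch: it assumes light travels through optical fibre at about 200,000 km per second (roughly two-thirds of its speed in a vacuum) and ignores routing detours, switching equipment, and congestion, so real figures will be higher.

```python
# Approximate speed of light in optical fibre (an assumption; about 2/3 of c).
FIBRE_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km):
    """Lower bound on the round-trip time, in milliseconds, over a fibre link."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Hypothetical site separations, measured along the cable route.
for km in (10, 100, 1000):
    print(f"{km} km: at least {round_trip_ms(km):.2f} ms per round trip")
```

If every write must be acknowledged by the remote site before it completes, this delay is added to each write, which is one reason very distant sites are often kept in sync with a slight lag instead.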
Documentation and testing
Whether it’s a general or specific plan, it should always be written by individuals who have expertise in specific areas, and then peer reviewed.
In more general areas, it may be beneficial to have the plans produced by a small team whose members all have some experience in the subject but can each look at things from a slightly different perspective.
Setting up standard templates will ensure your plans are consistent. It’s also important for you to consider how and where they are stored, accessed, and updated. Ongoing updates are essential, and any that contain sensitive or confidential information must be kept secure. Once produced, your plans should be reviewed by technical experts and sense-checked by individuals who have little or no knowledge of the subject.
The next stage is a read-through, where interested parties, usually those who might have to implement your plans, review them as if they were following a real incident. This exercise is the cheapest to run, easiest to prepare, useful for training purposes, and provides an important tool for embedding BCM in your organisation’s culture.
A table-top simulation exercise is scenario-based and is likely to offer the most efficient method of validating plans and rehearsing key staff. It brings staff together to take decisions as a scenario unfolds in very much the same way they would in the event of a real incident.
Finally, a live exercise can be used to simulate some aspects of the disaster. However, this rarely involves a full interruption test, where critical systems are switched off, because this is generally considered dangerous and disruptive. It’s one to avoid if your recovery time objective isn’t almost instantaneous.
Whatever type of exercise you opt for, it’s worth considering inviting other stakeholders or anyone you rely on to deliver your key services and/or products. Immediately after any exercise, hold a debrief where you can write up lessons learned and record any actions to take forward.
Communicating the plans
BCM and your disaster recovery processes need to become part of the culture of your organisation. This will ensure that your plans resonate when you communicate them to suppliers, customers, and other stakeholders.
- Know your organisation
To write an effective and actionable office emergency plan, you really need to fully understand the roles and teams within your organisation.
- Know your keys to business success
Now that you know who is on which team and have an idea of what each team does, it’s time to schedule some meetings with the team leads and secondary team leads. In these meetings you want to drill down and determine the key aspects of each team’s functions - you need to know clearly what each team does and how it is responsible for achieving the end goals of your business.
Knowing this information is vital in determining how to build your business continuity plan to ensure that you can still meet client demands. For example, if the IT department only has one person who knows how the servers work, what will you do if this person is not able to come into the office during a massive snowstorm that coincides with a server outage? (Single point of failure)
The outcomes from these meetings should include: scheduling training sessions for the secondary team leads, establishing priority contact lists in the event of a disaster or threat, understanding who should be contacted first and what information should be communicated, and agreeing the best ways to communicate with team members.
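The single-point-of-failure check described above can be made routine with a simple skills register. The sketch below is a minimal illustration; the names and skills are hypothetical, and a real register would come from the team-lead meetings.

```python
from collections import defaultdict

# Hypothetical skills register: person -> skills they can cover.
skills = {
    "Asha": {"servers", "networking"},
    "Ben": {"networking", "payroll"},
    "Chen": {"payroll"},
}


def single_points_of_failure(register):
    """Return {skill: person} for every skill that only one person covers."""
    holders = defaultdict(list)
    for person, person_skills in register.items():
        for skill in person_skills:
            holders[skill].append(person)
    return {skill: people[0] for skill, people in holders.items() if len(people) == 1}


print(single_points_of_failure(skills))
```

Each skill it reports is a candidate for the training sessions mentioned above.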
- Know the best way to communicate
How are you going to contact your team leaders in the event of a fire, building collapse, inclement weather, etc.? What are you going to tell these team leaders? Remember that the head of technical writing does not need to know the same information as your chief accountant, who in turn does not need to know the same details as the person in charge of your warehouse.
Your emergency communication plan must include who you’re going to contact, how you’re going to contact them and what you’re going to tell them. You must be prepared to communicate short and simple messages. Remember in a state of crisis or panic, people often have trouble focusing - so keep your messaging concise and to the point.
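The who, how, and what of an emergency communication plan map naturally onto a small record per contact. The sketch below is illustrative only: the roles, channels, messages, and the 160-character limit are all hypothetical choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    role: str      # who to contact
    channel: str   # how to contact them
    priority: int  # 1 = contact first
    message: str   # what to tell them


def call_list(contacts):
    """Return contacts in the order they should be reached."""
    return sorted(contacts, key=lambda c: c.priority)


def overlong(contacts, limit=160):
    """Return the roles whose message exceeds the (hypothetical) length limit."""
    return [c.role for c in contacts if len(c.message) > limit]


# Hypothetical entries; each role receives only what it needs to know.
contacts = [
    Contact("Warehouse lead", "phone", 2, "Site closed today. Do not dispatch. Next update at 12:00."),
    Contact("IT lead", "phone", 1, "Server room fire. Start failover to the recovery site now."),
    Contact("Chief accountant", "sms", 3, "Office closed. Work from home. Payroll runs as normal."),
]

for contact in call_list(contacts):
    print(f"{contact.priority}. {contact.role} via {contact.channel}: {contact.message}")
```

Checking message length in advance is one way to keep crisis messages short and to the point.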
For most staff, the whole area of BCM and disaster recovery may be new and, even if they’re not involved in response or recovery, they should understand why the actions are required. However, they may become indirectly involved if an incident occurs and they need to evacuate a building or move to an alternative location. It’s therefore important to arrange easy access to your plans, with accessible resources to guide staff who may find themselves at the centre of a disaster.
Suppliers, customers, and other stakeholders will also have an interest and may need to be briefed before or shortly after an incident.
‘Implementation’, as it has been described to you, includes a wide scope of BCM activities. While there has been good coverage of these activities, you may still be wondering where to get started and what should happen next. In the final step, you’ll be introduced to a model that will provide you with a logical structure to manage your business continuity programme.