Module 8 - Business Continuity and Disaster Recovery
Business continuity management and disaster recovery are about an organization being prepared for business disruption and taking the necessary actions to get the business operational as soon as possible after an incident occurs. This course provides a strong foundation in each area by looking at what business continuity management is, why it’s important and how it can be implemented within the overall risk management process, before reviewing the disaster recovery process.
The objectives of this course are to provide you with an understanding of:
- The value of business continuity management to an organization
- The business continuity management process
- The impact of business disruption on an organization and how long disruption should be tolerated
- The business continuity implementation process and implementation planning
- Disaster recovery strategy and the importance of disaster recovery planning
- Different standby systems and how these relate to recovery time
- The importance of robust documentation and testing of the plan
This course is ideal for members of information security management teams, IT managers, security and systems managers, information asset owners and employees with legal compliance responsibilities. It acts as a foundation for more advanced managerial or technical qualifications.
There are no specific prerequisites for this course. However, a basic knowledge of IT, an understanding of the general principles of information technology security, and awareness of the issues involved with security control activity would be advantageous.
We welcome all feedback and suggestions - please contact us at firstname.lastname@example.org if you are unsure about where to start or if you would like help getting started.
Welcome to this video on disaster recovery, during which we’ll look at the disaster recovery strategy and the importance of disaster recovery planning.
We’ll then look at the different standby systems that can get an organization operational after a business disruption and identify how these relate to recovery time. We’ll conclude by reviewing the importance of robust documentation and testing of the plan.
Disaster recovery plans help an organization recover from major failures of IT and communications infrastructure within the recovery time objective (RTO). If a data centre becomes unavailable, for example because of a fire or flood, the business may need to move operations to an alternate site. This could be another facility belonging to the organization itself or a disaster recovery site provided by a third party.
Once operations have been moved to the alternate site, the order in which the various systems are brought back to operation needs to be carefully planned.
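One way to plan the restoration order is to record which systems depend on which and derive a sequence in which nothing is started before the systems it relies on. The sketch below is a minimal illustration using Python's standard `graphlib` module (Python 3.9 or later); the system names and their dependencies are hypothetical and would come from the organization's own inventory.

```python
from graphlib import TopologicalSorter

# Hypothetical systems, each mapped to the set of systems it depends on.
# These names are illustrative assumptions, not taken from a real plan.
DEPENDENCIES = {
    "network": set(),
    "directory_service": {"network"},
    "database": {"network", "directory_service"},
    "email": {"network", "directory_service"},
    "order_processing": {"database"},
}


def recovery_order(dependencies):
    """Return systems ordered so each one follows everything it depends on."""
    # static_order() yields dependencies before their dependents and raises
    # graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(f"{step}. {system}")
```

A circular dependency raises an error instead of producing an order, which is a useful signal that the plan itself needs attention before an incident occurs.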
Following a major loss of IT infrastructure, the normal means of communicating inside and outside the organization, for example email and telephone systems, may be unavailable, so alternative means of communication need to be established, especially in the early stages of the incident. It’s also important that these alternative means of communication are secure.
There are a few other areas to consider when creating a disaster recovery strategy.
· The provision of managed services, like cloud data storage, backup tape management, remote monitoring of servers and services and updating of systems. Managed services are an important part of the organization and can’t be overlooked.
· Offsite storage of vital information, which includes any information that allows the business continuity team to recover business activities. Staff must know where to find the information in the offsite facility, they must have appropriate security access to reach it, and it must be kept up to date with the equivalent operational information.
· Third parties, like suppliers, who may have an involvement in the response, recovery and resumption processes. For example, if hardware needs to be replaced, the organization or its suppliers must have enough stock to meet the demand in a suitable timeframe.
Sites and systems can be categorised by the speed at which they can be brought into full service in the event of a disaster. These categories are referred to as cold, warm and hot.
· Cold standby systems may not even be powered on or have an operating system or any applications loaded. This has the benefit of reducing initial costs, but it will take time and effort to bring them into service. Cold sites may simply be an empty building shell, with little or no facilities, desks, structured cabling or power distribution; this makes them very adaptable, but slow to bring into service.
· Warm standby systems tend to be powered on and usually have an operating system running. They may have appropriate applications loaded and might even be maintained with up-to-date patches and software releases. However, it’s unlikely that there will be any recent data, which will need to be restored from backup media. Warm sites may have power distribution, structured cabling and many basic facilities, but might not be fully furnished or equipped with complete IT systems and services. Warm standby systems and sites will always be more costly than cold systems.
· Hot standby systems are invariably fully powered, with operating systems and applications loaded and running, containing almost or completely current data. Users can normally be switched from one hot system to another very quickly and may not even notice the transition. Hot sites are fully equipped with all necessary facilities and are ready for people to move into and to use with immediate effect. They are the most expensive sites to maintain.
At the very top end of the scale are mirrored sites. These are high availability solutions, required where there can be no loss of data and no downtime; failover between the two sites should be almost instantaneous and unnoticeable to the user.
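The trade-off between cost and recovery speed described above can be sketched as a simple selection: pick the cheapest standby option whose expected recovery time still fits within the recovery time objective (RTO). The recovery times and cost rankings below are placeholder assumptions chosen only to make the example run; real figures depend on the organization and its providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandbyTier:
    name: str
    typical_recovery_hours: float  # placeholder estimate, not a measured value
    relative_cost: int             # 1 = cheapest; higher = more expensive


# Placeholder figures for illustration only (assumptions, not benchmarks).
TIERS = [
    StandbyTier("cold", 168.0, 1),
    StandbyTier("warm", 24.0, 2),
    StandbyTier("hot", 1.0, 3),
    StandbyTier("mirrored", 0.0, 4),
]


def cheapest_tier_for(rto_hours, tiers=TIERS):
    """Return the lowest-cost tier whose recovery time fits within the RTO."""
    candidates = [t for t in tiers if t.typical_recovery_hours <= rto_hours]
    if not candidates:
        raise ValueError("no standby tier meets the requested RTO")
    return min(candidates, key=lambda t: t.relative_cost)


if __name__ == "__main__":
    for rto in (200, 48, 4, 0):
        print(f"RTO {rto}h: {cheapest_tier_for(rto).name}")
```

With these assumed figures, a business function that can tolerate two days of disruption would be matched to a warm standby, while one that can tolerate no downtime would need a mirrored site.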
As with any alternative sites, distance is a key consideration. If sites are too close, they might both be affected by the same disaster, whereas if they’re too far apart it might take too long to restore operations at a remote facility. Distance can also affect the time it takes for data to cross the link between the sites, which in turn affects data synchronization.
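The effect of distance on synchronization can be estimated with simple arithmetic. Light travels through optical fibre at roughly 200,000 km per second (about two-thirds of its speed in a vacuum), and a synchronously replicated write has to wait for an acknowledgement from the remote site, so each write incurs at least one round trip. The sketch below is a lower-bound estimate only; it assumes a direct fibre path and ignores routing, switching and storage delays, all of which add to the real figure.

```python
# Approximate speed of light in optical fibre, in km per second.
FIBRE_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km):
    """Lower-bound round-trip propagation delay in milliseconds."""
    # Multiply by 2 for the outbound write plus the returning acknowledgement.
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km} km: at least {round_trip_ms(km):.2f} ms per synchronous write")
```

Sites 100 km apart therefore add at least about 1 ms to every synchronous write, and sites 1,000 km apart add at least about 10 ms, which is one reason closely synchronized sites are usually not placed very far apart.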
Mirrored sites are often under the control of the business. Hot, warm and cold sites are often provided as a service by third parties. The provider’s ability to meet the service, especially in a situation where multiple organizations are affected, is a critical consideration.
As the volatility of data increases, the need to recover quickly increases. That usually means more expensive solutions.
Many organizations still require point-to-point data circuits to connect their systems. When considering options for business continuity, lead times for installing data circuits are important. Both voice and data connections may be required between the organization and key third-party providers, for example payment services, and these must be considered when planning disaster recovery solutions.
There may also be legacy communication systems like fax which may need to be restored.
Plans should be written by individuals who have expertise in specific areas and should then be peer reviewed.
In more general areas, it may be beneficial to have the plans produced by a small team, all of whom have some experience in the subject, but can look at things from a slightly different perspective. Setting up standard templates can ensure plans are consistent.
It’s also important to consider whether the plans are produced in soft and hard copy, and how and where they’re stored, accessed and updated. Ongoing updates are essential, and the plans must be kept secure, especially if they contain sensitive or confidential information.
Once produced, all plans should be reviewed by technical experts and sense-checked by individuals who have little or no knowledge of the subject. They should also be proof-read.
The next stage is a read-through, where interested parties, usually those who might have to implement the plans, review them as if they were following a real incident.
A table-top simulation exercise can then be run, in which participants follow the plan in response to a realistic simulated event.
Finally, a live exercise can be used to simulate some aspects of the disaster. This rarely involves a full interruption test, where critical systems are switched off, because this is generally considered dangerous and disruptive, particularly if the recovery time objective isn’t almost instantaneous.
That’s the end of this video on disaster recovery.
Fred is a trainer and consultant specializing in cyber security. His educational background is in physics, having a BSc and a couple of master’s degrees, one in astrophysics and the other in nuclear and particle physics. However, most of his professional life has been spent in IT, covering a broad range of activities including system management, programming (originally in C but more recently Python, Ruby et al), database design and management as well as networking. From networking it was a natural progression to IT security and cyber security more generally. As well as having many professional credentials reflecting the breadth of his experience (including CASP, CISM and CCISO), he is a Certified Ethical Hacker and a GCHQ Certified Trainer for a number of cybersecurity courses, including CISMP, CISSP and GDPR Practitioner.