CISSP: Domain 1, Module 2
This course is the second of four modules of Domain 1 of the CISSP, covering security and risk management.
The objectives of this course are to provide you with an understanding of:
- Professional Ethics
- How to develop and implement documented security policies, standards, procedures, and guidelines and the differences between them
- The fundamentals of business continuity requirements
- How to contribute to personnel security policies
This course is designed for those looking to take the most in-demand information security professional certification currently available, the CISSP.
Any experience relating to information security would be advantageous, but not essential. All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
So we're going to move into another area remembering that this domain is about security and risk management. And clearly business continuity or disaster recovery is going to be one of those areas that is a direct derivation of our response to risk, to make sure that our organization deals with it effectively so that it can survive the long term. So in this particular module we're going to discuss how to develop and document the project plan to produce the business continuity plan. We're also going to conduct a business impact analysis, which is a complement to a risk analysis. So we're going to look at the development and the documentation of project scope and plan. And we're going to conduct a business impact analysis.
Now in this particular section, just to be clear, this is going to address the idea of putting together a project that will ultimately produce the business continuity or disaster recovery plan. So the first step in producing a workable business continuity plan will be gaining senior management's support to move forward with the project. Now this may seem like an unnecessary step, the idea of selling management on putting together a business continuity plan. To the minds of many that would seem like a natural thing that they would want to do. But the simple fact is these plans require a good deal of investment in time and resources to get them done. And they're frequently looked at as being something like an insurance policy. And there's an old saying: insurance policies are never bought, they're only sold. Management is of course very diligent in doing what they do. And they don't need to be sold on the idea of putting together a business continuity plan, which as everyone knows could make the difference between a business failing and a business recovering after a severe outage. So it's not a question of that. It's a matter of selling it like you would sell any project.
As a project it needs to be organized properly. There needs to be proper resourcing, a proper timeline, proper deliverables. And that's more what needs to be sold than the very idea of having a business continuity plan. Management certainly grasps the importance. But they want to be sure that the project moves forward with the same rigor and discipline as any other project they would invest considerable effort in. To that end we must define a project scope, the objectives to be achieved, the planning assumptions, and we have to define what our deliverable ultimately is going to be. Simply calling it a disaster recovery plan or a business continuity plan is not going to be sufficient, because that covers a great deal of ground. And so management needs to know precisely what they're going to get for their investment. To bring more control, more rigor to this we have to estimate the project resources needed to be successful. As we all know, under-resourcing a project is pretty much a guarantee of its failure. And we need to be very careful about how we estimate what that's going to be so that we don't shortchange the effort and resources necessary to properly complete this. And that means human, financial and equipment resources. We're going to have to define the timeline and the major project deliverables that management will receive when this is over.
Now as I mentioned we're going to discuss the idea of doing a business impact analysis. Starting out, it's important to realize that this is a complement to a risk analysis. The focus of this is somewhat different from that of a risk analysis. A risk analysis looks at adverse events that may occur, frequently or infrequently, and what damage they could produce when they do. A business impact analysis instead focuses on the business and what the impact of such events will be on the business. So you see, this takes information from the risk analysis and places it directly over the operation that will be affected by the event, to estimate what the loss to the business will be, by whatever definition of loss we're going to apply, so that we can plan a recovery from it. So the primary goals of the business impact analysis will be to look at the criticality of the given item or entity under consideration. That could be an asset, a system, a data repository or an entire segment of the overall operation. We need to estimate several things, primary among which is the maximum tolerable downtime, or MTD. And in that we need to include as much detail as we can surrounding the data turnover rate. This will play a very significant role in the various estimations of time and data currency we're going to have to make during the course of this project. And ultimately we have to come up with some estimate of what the internal and external resource requirements are going to be. So the essential process that you see there tracking that arrow will be to gather the information and perform a vulnerability assessment. Now, bear in mind again that this is looking at the business. So this is not a network or system vulnerability assessment, at least not exclusively.
We have to look at the things that our business operation may be vulnerable to, and how we're going to effectively address those. We have to analyze the information, pull out the salient parts, so that we know what we're addressing and what sort of vulnerabilities or other attributes we have to address and provide compensation for. Ultimately we document this in our planning documents and present this to management, along with our recommendations for how we're going to go about dealing with them effectively through the course of plan development. And then ultimately when we have to activate the plan. So in gathering the information we're going to need to look at departments, processes, and we're going to have to make various assessments and assignments of criticality to each one of those elements as regards the overall recovery process to recover the organization or the element that has been affected by the event. We're going to have to establish processing priorities between departments and alternate operating procedures that can be used during the recovery process. If we've suffered an outage of sufficient magnitude clearly we're not going to be able to run as we normally would. This might involve different ways, different locations, different personnel even.
So, putting these things together is the way that we're going to plan for our company's or our department's survival. Moving on to the analysis of the information, we need to look at the information we've collected and interpret it to determine the overall impact of various threats to the organization. So I mentioned maximum tolerable downtime. Within that will be two other objectives we have to establish. The first is the recovery time objective, that point in time by which we must have the minimum set of most critical operations back online in order to ensure the best chance of the organization surviving. As the pilots would say, you can't recover if you don't survive the crash. So this is our way of defining how we can best survive the crash and how much time we'll have in which to do that. Included in this will be the recovery point objective. Thinking back to what I said about data turnover rate, we have to determine what our data turnover rate is so that we know what data currency must be established by the time we achieve the recovery time objective. Now we'll talk about these more in future slides.
We will do a quantitative analysis. Focusing on the business, we need to look at what the outright financial loss will be. There will of course be extra expenses. We need to do our best to estimate what those might be, what kind and what order of magnitude they will be, and whether or not there are any regulatory issues. That might even include what fines we could expect following the event that creates the outage, so that we know what our total financial impact is going to be. Along with this there are going to be the more intangible aspects. Qualitative analysis looks at damage to our reputation, stakeholder confidence and various other intangibles that are going to contribute in a less measurable way, but that we know are going to be negative impacts overall and will add to our difficulties during this time. So these are very important in determining the maximum tolerable downtime and other criteria.
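As a rough illustration of the quantitative side, the components named above (outright loss, extra expenses, possible fines) can be added up in a few lines of Python. This is only a sketch: the function name, its parameters and every figure are hypothetical, and it assumes the outright loss grows linearly with the length of the outage, which a real analysis may not support.

```python
def total_financial_impact(daily_loss: float,
                           outage_days: float,
                           extra_expenses: float,
                           expected_fines: float = 0.0) -> float:
    """Sum the quantitative components of a business impact estimate.

    Assumption: outright loss is modeled as a flat daily figure times
    the outage length. All inputs are illustrative estimates.
    """
    outright_loss = daily_loss * outage_days
    return outright_loss + extra_expenses + expected_fines


# Hypothetical figures for a three-day outage of one business unit.
estimate = total_financial_impact(
    daily_loss=50_000,       # lost revenue per day (assumed)
    outage_days=3,
    extra_expenses=20_000,   # temporary site, overtime, etc. (assumed)
    expected_fines=10_000,   # possible regulatory fines (assumed)
)
print(f"Estimated total financial impact: {estimate:,.0f}")
```

The qualitative items, such as reputation and stakeholder confidence, are deliberately left out because they do not reduce to a single number in this way.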
Now within MTD, we're going to have to look at all the functions and applications, and how sensitive they are in terms of what impact their loss will have on our organization and the order in which we're going to have to recover them for maximum assurance that the business itself will survive. So we set the MTD. Now the MTD defines the maximum amount of time that our organization can be down. And if you envision a pyramid with MTD at the very top, if we surpass MTD, the probabilities of our being able to recover at all rapidly diminish. So that if we surpass this particular benchmark our opportunities and our chances to recover dim very rapidly. So we have the recovery time objective that is set up, and as a time parameter the recovery time objective is necessarily less than MTD. The relationship between these two is, if we achieve RTO, chances are very good that we will not reach MTD. And by achieving RTO we improve the probabilities of our full recovery by a great deal. Whereas if we exceed that and proceed on to MTD, quite the reverse will be the case. Now one of the things that we want to calculate is the effect of availability versus up-time. Systems are managed with an expectation of up-time or availability. The first thing that's important is to make sure that you understand very clearly the difference between up-time, meaning that the machine is up and running, and availability, meaning that the system or application is available to be used and in a usable state by the affected user community. And our contract language needs to be certain to reflect the differences between these two things. Up-time versus availability. And here you have a chart that shows you availability percentage, everything from 90% through seven nines availability, and the respective amounts of downtime that we can experience during the year, during the month, during the week. Surprisingly 90% up-time, which on its face sounds like a lot, gives us more than a month of downtime possible per year.
And it still means we're running nine days out of 10. Raise that to 99%, and we drop that number from 36 1/2 days to 3.65 days, quite a dramatic drop. With each additional nine the time drops very significantly: from 99% to 99.9%, we drop from 3.65 days to about 8.76 hours. At four nines, it is less than an hour per year.
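The downtime figures quoted here follow from simple arithmetic, which can be sketched in Python as below. The function name and the choice of a 365-day year are assumptions made for illustration; a contract may define the measurement period differently.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted in a period at a given availability."""
    return (1 - availability_pct / 100) * period_hours


# Walk through the first few rows of an availability chart.
for pct in (90.0, 99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% available: {hours / 24:.2f} days/year "
          f"({hours:.2f} hours, {hours * 60:.1f} minutes)")
```

Passing a different `period_hours` (for example `30 * 24` for a month or `7 * 24` for a week) gives the per-month and per-week allowances in the same way.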
Now as you go through this you should be thinking, this is about how much downtime is available for planned and unplanned outages, whether this is your shop or a cloud provider. And again, knowing the difference between up-time and availability is critically important in terms of contract language as to what that means to our working population. So the recovery time objective is the time necessary to get the critical processing online following some form of outage event. I want to emphasize that it is the most critical thing that needs to be back online by the time the RTO has been reached. Not everything, which is the common misperception. The most critical thing, the thing that will make sure that the business survives if it's back online in an operable state. Within this, the recovery point objective, which describes the state of data currency, must be met as part of achieving the RTO in order to make certain that you have done everything possible to survive the crash so that you can recover.
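The relationship between MTD, RTO and RPO can be captured as a small consistency check. The sketch below is illustrative only: the class, field and method names are hypothetical, and the backup-interval rule is an added assumption (that worst-case data loss equals the time since the last backup), not something stated in the lecture.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class RecoveryTargets:
    """Recovery targets for one critical function (hypothetical structure)."""
    mtd: timedelta              # maximum tolerable downtime
    rto: timedelta              # recovery time objective
    rpo: timedelta              # recovery point objective (data currency)
    backup_interval: timedelta  # how often data is captured (assumed input)

    def problems(self) -> List[str]:
        """Return a list of inconsistencies; an empty list means none found."""
        issues = []
        if self.rto >= self.mtd:
            issues.append("RTO must be shorter than MTD")
        if self.backup_interval > self.rpo:
            issues.append("backup interval exceeds RPO; data may be too stale")
        return issues


# Hypothetical targets for an order-processing function.
targets = RecoveryTargets(
    mtd=timedelta(hours=72),
    rto=timedelta(hours=24),
    rpo=timedelta(hours=4),
    backup_interval=timedelta(hours=1),
)
print(targets.problems() or "targets are consistent")
```

A check like this would be run per critical function, since only the most critical operations need to be back online by the RTO.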
About the Author
Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years. He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant. His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International. A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center. From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.
Upon attaining his CISSP license in 1997, Mr. Leo joined ISC2 (a professional role) as Chairman of the Curriculum Development Committee, and served in this role until 2004. During this time, he formulated and directed the effort that produced what became and remains the standard curriculum used to train CISSP candidates worldwide. He has maintained his professional standards as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004. Mr. Leo is an ISC2 Certified Instructor.