CISM Foundations — Module 4
In this course, we will discuss various vitally important metrics used to determine how well we have mitigated risk and how closely we have matched the requirements of our enterprise. These metrics include Annualized Loss Expectancy (ALE), Recovery Time Objective (RTO), Recovery Point Objective (RPO), Service Delivery Objectives (SDO), Maximum Tolerable Outage/Downtime (MTO/MTD), and Allowable Interruption Window (AIW).
We then move on to look at how these metrics can be applied to business continuity (BC) and disaster recovery (DR) planning, and we'll also have a look at BC and DR in general, how they work, and the associated processes and techniques. Finally, we move on to testing BC/DR planning and the types of tests we can use.
If you have any feedback relating to this course, please reach out to us at email@example.com.
- Learn about the metrics for measuring performance in managing risk
- Get a solid understanding of business continuity and disaster recovery
- Understand how to test business continuity and disaster recovery practices
This course is intended for those looking to take the CISM (Certified Information Security Manager) exam or anyone who wants to improve their understanding of information security.
Any experience relating to information security would be advantageous, but not essential. All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.
We will now move into section eight of our CISM series in which we will discuss various vitally important metrics that we need to determine how good of a job we are doing and how closely we have matched the requirements of the enterprise.
Here we have the well-known ALE, or annualized loss expectancy, calculation. The annualized loss expectancy estimates the yearly impact of an event, based on the loss from a single occurrence and the annual rate at which that event occurs.
From this calculation, we derive a rough order of magnitude to describe the decrease in value or capability of an asset after this adverse event causes its impact. This calculation should be done for each type of outage scenario or for each category of outage scenarios.
It must be recognized, though, that the final annualized loss expectancy figure is only to be taken as an indicator of the potential loss that could occur and not as a precise monetary measurement. Its other use is as a budget figure for a particular control or countermeasure to mitigate this particular type of loss.
So to walk through the calculation, we take the asset value and multiply that by an exposure factor. The asset value reflects the total cost of ownership of the given asset, and the exposure factor reflects how much of its capability has been lost through the adverse event. Multiplying the two gives us our single loss expectancy or SLE.
We next take the annual rate of occurrence or ARO of this event and multiply it by the SLE, and that gives us the ALE. By doing this calculation for each in-scope outage scenario, we are able to develop a budgetary forecast of costs for controls or countermeasures to offset the projected in-scope losses.
So here is an example. We have the first building's asset value determined to be $400,000. An adverse event has caused a loss of 33% of that building's value. By multiplying the two together, we derive an SLE of $132,000. In this example, we are saying that the ARO is 0.1, which means this event, whatever it may be, happens once in 10 years.
Our next multiplication is to take the SLE multiplied by the ARO to derive our annualized loss expectancy, which is $13,200. It is also important to bear in mind that the effects covered by this calculation are first-order effects only. Secondary and tertiary-order effects can be added in, but only after the event has actually occurred and their consequences have been observed and calculated.
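The walkthrough above can be sketched in a few lines of Python. This is a minimal illustration only: the function names are assumptions made for this example, and the figures are the ones from the building scenario ($400,000 asset value, 33% exposure factor, ARO of 0.1). The result is a rough order of magnitude, not a precise monetary measurement.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = asset value x exposure factor (fraction of value lost, 0 to 1)."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x annual rate of occurrence (expected events per year)."""
    if aro < 0:
        raise ValueError("aro must not be negative")
    return sle * aro


# Figures from the building example in the text.
sle = single_loss_expectancy(asset_value=400_000, exposure_factor=0.33)
ale = annualized_loss_expectancy(sle, aro=0.1)

print(f"SLE: ${sle:,.0f}")
print(f"ALE: ${ale:,.0f} per year (rough order of magnitude)")
```

Running the same two functions once per outage scenario gives the per-scenario figures that feed the budgetary forecast.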
One very important aspect of this calculation to bear in mind is that the ALE is an indicator. It is not intended to represent a finite amount or the very last possible dollar in losses. Thus, the ALE as we described before is a rough order of magnitude, and it should be taken as such rather than thinking that it must be the finite amount that we will spend on a given loss.
History shows that these losses can vary very widely and such variation should be anticipated. History also shows that there will very likely be a great deal of difference between what we can imagine the event will produce and what the event will in fact produce by way of consequences.
The recovery time objective is defined as the amount of time required to get compromised systems or operations back to an acceptable level of performance. Put another way, the recovery time objective defines a period of time by which we are able to get our most critical systems back to an acceptable level of operation.
The RTO also precedes the maximum tolerable downtime, or MTD. By attaining the RTO, it can be presumed, though not guaranteed, that we will not reach the MTD. In the example given, the business can survive for only three days after its ordering systems go down. From this we interpret that the RTO cannot be greater than three days.
Along with the RTO is the recovery point objective or RPO. The RPO defines an amount of data loss that can occur and not cause catastrophic interruptions in business. Thus, the calculation of data loss may produce a number related to data turnover rates or quantities or percentages of data loss due to adverse events. It is therefore proper to assume that RPO and RTO are intimately related in terms of how quickly we must get a business back online.
In our example, taking eight hours to restore six hours' worth of data, the RPO is exceeded by two hours. Should it turn out that the two hours is unavoidable, a risk acceptance decision may have to be considered, or some form of compensating control may have to be put in place.
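A comparison like this can be expressed as a simple check of an observed value against its objective. The sketch below is an assumption-laden illustration, not a prescribed method: the `overshoot` helper is a hypothetical name, and it simply treats the objective and the observed figure as durations, using the six-hour and eight-hour values from the example.

```python
from datetime import timedelta


def overshoot(objective: timedelta, actual: timedelta) -> timedelta:
    """Return how far `actual` exceeds `objective`, or zero if it is met."""
    return max(actual - objective, timedelta(0))


# Six-hour objective and eight-hour actual, echoing the example in the text.
gap = overshoot(objective=timedelta(hours=6), actual=timedelta(hours=8))

if gap > timedelta(0):
    print(f"Objective exceeded by {gap}: "
          "consider risk acceptance or a compensating control")
else:
    print("Objective met")
```

The same helper works for an RTO check, for instance comparing a measured restoration time against a three-day limit.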
In this graphic, we see various comparative recovery times involving RPO and RTO. In the center of our graphic, we have a lightning bolt which indicates an adverse event occurring. To the left, we have the recovery point objective measured in varying divisions of time from weeks to seconds. One of the things reflected in this is the resiliency of a business and the constancy of the data that it uses.
Shown below the red arrow are various mechanisms for data backup and restore, each taking a different length of time. It is easy to see that the further to the left we go, the longer it takes, with tape backup taking the longest and database shadowing taking the shortest.
On the right-hand side of the lightning bolt, we show the recovery time objective also measured in time divisions from seconds to weeks. These increments depict the minimum time required to activate the most critical systems to assure event survival and enable further recovery. As the arrow points to the right, we see that again tape stores require the longest time, and clustering takes the shortest time.
Another point that should be obvious from this is that the further out along the arrow we go in either direction, the longer the recovery effort will take. Conversely, the closer to the lightning bolt we get, the shorter the period. What should also be obvious from this is that the further outward we go, the less costly the effort will be. And the further towards the center we go, the more expensive the effort will be.
What must also be realized is that this kind of depiction focuses on the most critical systems and the most critical data and not the entire enterprise unless the entire enterprise is small and can be encompassed within a single effort of this type. As a final point, even though the cost goes up the closer into the center we get from either direction, the value of the loss and the value of the recovery effort and the technology used to support it should justify whatever is spent on the recovery effort and the technology to do it.
Next we have the service delivery objectives. Now, the service delivery objectives or SDO define the minimum level of service that must be restored after an event until normal operations can resume. Once again, this means that we are looking at the most critical systems and data. The SDO therefore is affected very directly by the combination of RTO and RPO. The various metrics that will define the SDO should be transactions per second, the number of concurrent supported users, and similar performance metrics.
We've spoken about RTO and RPO. Now let's examine their relationship to the maximum tolerable outage or downtime, abbreviated as either MTO or MTD. Now, the MTD defines the period of time by which we must have our operation back on its feet and working. The difference between the RTO and the MTD is that by achieving the RTO, we should not run the risk of reaching the MTD.
The MTD has historically been defined as the length of time beyond which, should a business not get back to at least its minimal level of operation as defined by the RTO, the chances grow that it will go out of business altogether. Thus, we should aim to achieve the RTO to avoid reaching the MTD.
The risk assessment and the business impact analysis will be the foundation for this kind of effort, and consideration of these timeframes is crucial. In thinking about RTO and MTD, the primary consideration should be driven by the business impact analysis, which defines the minimum level of operational support required to ensure that the business will be able to recover, so that we are not distracted by trying to bring everything back online within the RTO. This means, of course, that we will only be able to bring the most critical systems and data into these calculations and into the efforts we then plan to bring them back online under these adverse conditions.
The allowable interruption window is the amount of time normal operations are down before major financial problems arise. Now, this allowable interruption window or AIW represents a maximum duration in which the business must recover or face possible extinction. Thus, the MTD cannot be greater than the AIW.
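The relationships described so far imply an ordering: the RTO should not exceed the MTD, and the MTD should not exceed the AIW. The following is a minimal sketch of a consistency check on planned targets. The function name and the four-day AIW are hypothetical values chosen for illustration; the three-day MTD echoes the earlier ordering-systems example.

```python
from datetime import timedelta


def check_recovery_targets(rto: timedelta, mtd: timedelta, aiw: timedelta) -> list[str]:
    """Return violations of RTO <= MTD <= AIW; an empty list means consistent."""
    problems = []
    if rto > mtd:
        problems.append("RTO exceeds MTD")
    if mtd > aiw:
        problems.append("MTD exceeds AIW")
    return problems


# Hypothetical targets: recover in two days, survive three, AIW of four.
consistent = check_recovery_targets(
    rto=timedelta(days=2), mtd=timedelta(days=3), aiw=timedelta(days=4)
)
# A plan whose RTO is longer than the business can tolerate.
inconsistent = check_recovery_targets(
    rto=timedelta(days=5), mtd=timedelta(days=3), aiw=timedelta(days=4)
)

print(consistent or "Targets are consistent")
print(inconsistent)
```

A check like this only confirms that the targets agree with each other. Whether the targets themselves are right is a question for the business impact analysis.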
If, by employing certain types of controls, procedures, or other mitigating methods, the time between MTD and AIW can be lengthened, reducing the impact becomes possible. So let's explore a calculation using these different figures that we've spoken of. We begin with the asset value of a primary facility for developing rocket engines for spaceships.
The value of this asset is calculated at $100 million. In the event scenario we're discussing, the exposure factor is how much of that facility is rendered unusable in some form of an attack. In our example, we calculate a 40% loss. Thus the loss from this single attack is the $100 million asset value times the 40% impact, which gives us a single loss expectancy of $40 million.
History shows that we should expect this type of attack once every four years, which gives us an ARO of 0.25. Taking these numbers into our calculation, $40 million times 0.25, we have a result of $10 million as our annualized loss expectancy. This number will then be used as a budget figure to select various kinds of controls and countermeasures to offset or even prevent the $10 million loss per year over that four-year period.
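The rocket-engine facility example shows how an event frequency such as "once every four years" turns into an ARO. Here is a short sketch of that conversion and the resulting ALE. The helper name `aro_from_interval` is an assumption made for this illustration, and the figures come from the example.

```python
def aro_from_interval(years_between_events: float) -> float:
    """Convert 'once every N years' into an annual rate of occurrence."""
    if years_between_events <= 0:
        raise ValueError("years_between_events must be positive")
    return 1.0 / years_between_events


asset_value = 100_000_000      # primary facility, from the example
exposure_factor = 0.40         # 40% of the facility rendered unusable

sle = asset_value * exposure_factor   # single loss expectancy
aro = aro_from_interval(4)            # once every four years
ale = sle * aro                       # annualized loss expectancy

print(f"SLE: ${sle:,.0f}")
print(f"ARO: {aro}")
print(f"ALE: ${ale:,.0f} per year")
# Spread over the four-year period, the annual figure adds back up to one SLE.
print(f"Four-year total: ${ale * 4:,.0f}")
```

Note that the ALE spreads one expected $40 million loss evenly across four years. The actual loss, if the attack occurs, arrives all at once.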
Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years. He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant. His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International. A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center. From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.
Upon attaining his CISSP license in 1997, Mr. Leo joined ISC2 (a professional role) as Chairman of the Curriculum Development Committee, and served in this role until 2004. During this time, he formulated and directed the effort that produced what became and remains the standard curriculum used to train CISSP candidates worldwide. He has maintained his professional standards as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004. Mr. Leo is an ISC2 Certified Instructor.