This course covers the security aspects of incident management and the process of identifying, responding to, and recovering from security breaches or attacks on an organization's systems and networks. Security incident management is a critical process for organizations to protect themselves against security threats and minimize the impact of any incidents that do occur.
- Learn about security incident management
- Learn about the various steps involved in security incident management: detection, analysis, containment, response, and recovery
Security Engineers interested in learning about incident management
Any experience relating to information security would be advantageous, but not essential. All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.
So what are we talking about? Well, in incident management we have three things that we worry about. We have events. Now an event itself is a generic occurrence or, to put it technically, a change of state within the system. Now events are not necessarily bad. Events are simply things that happen within a system. Many times it's just a normal transaction, a login process, a file copy, or some other normal operation. If, however, it has a negative impact, this becomes what we call an incident. And this too is a change of state, but it's regarded as adverse. And it typically is going to impact one of the three elements in our CIA triad: confidentiality, integrity, or availability.
Normally it is most likely to impact the operations of a program, the availability of data or a program, or other vital factors. If it goes to another level that involves the exposure, in human-readable form, of covered data such as protected health information or personally identifiable information of some type, then we call it a breach. And this requires a specific response from specific parties within our organization.
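The three-way distinction above can be sketched as a small classifier. This is a minimal illustration, not a standard taxonomy: the names `Occurrence`, `Severity`, `adverse`, and `covered_data_exposed` are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    EVENT = "event"
    INCIDENT = "incident"
    BREACH = "breach"


@dataclass
class Occurrence:
    """A change of state in the system (hypothetical record shape)."""
    description: str
    adverse: bool = False               # harms confidentiality, integrity, or availability
    covered_data_exposed: bool = False  # e.g. PHI or PII exposed in readable form


def classify(occ: Occurrence) -> Severity:
    # Exposure of covered data is the most serious case, so check it first.
    if occ.covered_data_exposed:
        return Severity.BREACH
    if occ.adverse:
        return Severity.INCIDENT
    # Anything else is an ordinary change of state.
    return Severity.EVENT
```

A routine login would come back as `Severity.EVENT`, an outage as `Severity.INCIDENT`, and an exposure of patient records as `Severity.BREACH`, which is the case that calls for the specific response mentioned above.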
In our triage process, we need to identify which of the event types it is as early as we can in the cycle. Because the earlier we can identify what it is and whether or not it contains any of these elements, that's going to define a lot about the process that we're going to follow from that point forward. We have a team to do incident management, and there are a number of things that can cause the incidents they handle. These typically impact functionality in some way. It could be user errors. It could, of course, be outsider attacks and hacking. It could be a component failure, or it could be flawed maintenance, where we haven't done our verification and validation, good planning, et cetera. It could be for a hundred different reasons.
So the first thing we need to do is perform a triage process where we're capturing the event and learning as much about its symptomology as we can. We're looking for a way to verify that it is not a false positive, or a false negative either, and to gain as much insight as possible into the incipient action and symptoms. Then we decide what we're going to do from there, working from as much as we can possibly know.
So here's another look at our process. In triage, we are looking to detect and identify, and there should be a first-level notification. Something along the lines of letting management know that there is an event that's going on, you're investigating, you don't know much at this particular moment, you've just learned of it, but you're on top of the situation and you'll get back as soon as you learn things. That at least gives them a heads up, and that can be a very important heads-up call.
Part of what we need to do is attempt to examine what's being affected in order to enforce operational priorities and then make some decisions about trade-offs that might be necessary at this stage. Then we get into our investigative process. Here we do our analysis and interpretation. We have a mitigating type of reaction that we're going to take, now that we hopefully have a much better idea of what we're dealing with. And then we recover data. We recover operations to the extent possible. We certainly want to contain what's going on.
Some incidents start and then they expand, engulfing more and more resources. The sooner we can contain it the better. And then we have to put things into something of a steady state where we're analyzing what is going on, we're analyzing results, looking at key indicators, and then tracking what's happening to make sure that if it should break out again we can respond quickly and correctly, or that we can see that the containment effort we've gone to is working and that it's not expanding.
We get to our third stage, recovery. The dust has settled. We're getting things put back to normal. Now bear in mind, recovery is not resumption. We are simply gathering up the pieces, we're reassembling things, we're putting the Legos, so to speak, back into the form they were in before they were broken up. Once we have finished the process of recovery, then the decision to resume will be considered and made. Reporting and control are part of how we communicate. And in cases like this, communication is one of the most important things we can do: clear, timely, and fully informative.
Response and escalation may have to be done very rapidly, depending upon what the event actually is. We're going to have direction on how to contain, what to recover, and priorities that we have to decide between or get direction on. So the thing to do is proceed as quickly as we reasonably can based on what we know. We need to get this information into the hands of the appropriate decision maker. Clearly we have to have that individual identified long before the event occurs, because the quicker the action, the quicker we can get this thing under control and keep it from going further wrong.
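The stages described so far, and the point that recovery is not resumption, can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Phase` names, the `NEXT_PHASES` table, and the `resume_approved` flag are hypothetical, and allowing a return from recovery to investigation (if the incident breaks out again) is this sketch's own assumption.

```python
from enum import Enum


class Phase(Enum):
    TRIAGE = "triage"                # detect, identify, first-level notification
    INVESTIGATION = "investigation"  # analyze, mitigate, contain, track
    RECOVERY = "recovery"            # put the pieces back together
    RESUMED = "resumed"              # normal operations, a separate decision


# Which phases may follow which (assumed for illustration).
NEXT_PHASES = {
    Phase.TRIAGE: {Phase.INVESTIGATION},
    Phase.INVESTIGATION: {Phase.RECOVERY},
    Phase.RECOVERY: {Phase.INVESTIGATION, Phase.RESUMED},
    Phase.RESUMED: set(),
}


def advance(current: Phase, target: Phase, resume_approved: bool = False) -> Phase:
    """Move to the target phase, refusing skipped steps and unapproved resumption."""
    if target not in NEXT_PHASES[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    if target is Phase.RESUMED and not resume_approved:
        raise PermissionError("resumption needs an explicit decision after recovery")
    return target
```

Calling `advance(Phase.RECOVERY, Phase.RESUMED)` without `resume_approved=True` raises `PermissionError`, which mirrors the idea that finishing recovery does not by itself mean operations resume.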
We need to capture as much information as we can about the symptoms of what's going on. Now bear in mind this can be as a result of a hacking attempt or flawed maintenance but it doesn't really matter at this stage. We have an incident and we must deal with it. If it's maintenance that has gone wrong causing this problem or adding to it, then we're going to have to review those things, those processes and procedures at some point in an effort to bring them up to date or correct them.
If it's an outside attack, then we need to examine different elements. What led to this? What damage can it be doing to us? What flaws did they take advantage of, and so on. But in either case, we need to act with urgency, and that urgency must be carefully executed so that it doesn't have the kind of haphazard impacts that blind urgency can bring about. We always have to strive for sufficient information so that we can make the best quality decisions and take the best quality action. And that means we have to prepare for this eventuality. So we have to establish an effective response capability to cope with it.
Now, incidents are inevitable. Regardless of source, they will happen. And we have to plan for this in advance. It's like buying insurance. If you have insurance at the time that an event happens, you can rely on the insurance to help cover you and help you restore things. If you try to buy it at the moment of, or immediately following, an incident such as crashing a car, that's a point at which you're not gonna be able to get it. So to emphasize the necessity of advance preparation is to state the obvious.
Crafting and exercising a well-designed incident response plan is vital. Having properly trained people, clear definitions, roles, et cetera, and communication pathways is equally critical. Sometimes we are able to anticipate various events. If we know where our risks exist, if we know where an application may be technologically fragile, these are things that we can highlight and anticipate to a degree. We can anticipate that there are going to be ransomware attacks or social engineering or other kinds. And we can take preventive and mitigative steps when it comes to those things. Some we can't, such as zero days. We know that they're going to happen, but with a zero day there's no way truly to prepare, 'cause we have no idea what we're preparing for.
In all of these cases, we have to script out responses. We need to imagine scenarios. We need to upskill everyone who is on the incident response team (IRT) to be able to respond rapidly and appropriately, with urgency and yet with careful consideration. When we find that our containment efforts have succeeded, we then can take a breather and look at what corrective actions now need to follow to put things back into a normal state and resolve these issues as best we can. But before a new problem can be resolved, we must examine the situation to make sure that we actually have a problem to solve. We have to establish it as real and a true problem.
Now, once this is done we're going to have to record the problem. And this is to distinguish between an incident, which could result from the existence of a problem, and the problem itself. Take a light bulb that keeps blowing because of a defective lamp: each blown bulb is a recurring incident, and we have to do root cause analysis to find out that the lamp itself is the problem. Compare that with an incident caused by a hacking attack. So we have to distinguish between those two things, because even though there may be similarities in the processes, the character of what we're dealing with is different.
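One rough way to picture the incident-versus-problem distinction is to look for incidents that keep recurring on the same component, as in the light bulb example. The sketch below is an assumption-laden illustration: the `(component, description)` record shape, the function name `problem_candidates`, and the threshold of three are all hypothetical, and a flagged component is only a candidate for root cause analysis, not a confirmed problem.

```python
from collections import Counter


def problem_candidates(incidents, threshold=3):
    """Return components whose incident count suggests an underlying problem.

    `incidents` is an iterable of (component, description) pairs.
    """
    counts = Counter(component for component, _ in incidents)
    # Recurrence is only a hint; root cause analysis still has to confirm it.
    return sorted(c for c, n in counts.items() if n >= threshold)


log = [
    ("lamp-7", "bulb blew"),
    ("lamp-7", "bulb blew"),
    ("web-01", "login flood from outside"),
    ("lamp-7", "bulb blew"),
]
candidates = problem_candidates(log)
```

With this log, `candidates` contains only `"lamp-7"`: three blown bulbs point at the lamp, while the single outside attack on `web-01` is handled as an incident in its own right.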
Once we have identified something as a problem, we're going to have to document that, try to do as much root cause analysis as we can and escalate it for action as needed. Hopefully there will be a workaround of some kind because problems oftentimes take an extended period of time to resolve. Once we establish that there is one, we should tell the users that they have this available and they can elect whether they're going to use it or not. We should try to come up with a permanent solution.
Now permanent solutions are ideal but not always possible. They may be impossible for a variety of reasons. Technically they won't work, or the cost is prohibitive. But in any case, we need to find a solution. So we need to establish criticality, type, and scope as early as we can so that we can start to make these decisions and recommendations. We can do it in the form of corrective, adaptive, preventive, or strategic types of improvements.
Now in the scope, what we're considering is not only what the thing is, but what is the scope of the user population being affected by it. That in itself can have an impact on the urgency and priority that a given type of solution will get. If it affects everybody, it's probably going to be an important thing to do regardless of what it takes to get that done. If it's affecting a very small population, the reverse decision may be more likely. It's not something that you will simply live with, but it may be that it's more feasible to use a workaround instead.
But the facts that you gather are going to make evident what sort of solution path you're going to follow. Modification and implementation need to address the core issue to the extent that you can. And the change should always be designed to be risk or impact neutral. That's something that you can only prove by deciding on a solution and testing it before it gets implemented into the operational environment.
Now, the maintenance that follows post-implementation is going to assure that the corrective action has in fact resolved the situation, and this is again a case of verification and validation, and that it conforms to the organization's processes, the software roadmap, and the enterprise baseline. And this brings us back to the process of change management. Patching is the standard practice of applying remedial fixes to software. Patches tend to be about security or functionality.
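The trade-off between scope, criticality, and the kind of solution can be sketched as a simple rule. Everything specific here is an assumption for illustration: the 1-to-5 criticality scale, the 50% scope cutoff, and the function name `solution_path` are not from the lecture, and a real organization would set its own criteria.

```python
def solution_path(criticality: int, affected_users: int, total_users: int,
                  wide_scope: float = 0.5) -> str:
    """Suggest a solution path from criticality (1-5, assumed) and user scope."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    share = affected_users / total_users
    # Highly critical or widely felt problems justify the cost of a permanent fix.
    if criticality >= 4 or share >= wide_scope:
        return "permanent fix"
    # A small affected population may be better served by a workaround for now.
    return "workaround"
```

For example, `solution_path(2, 900, 1000)` suggests a permanent fix because most users are affected, while `solution_path(2, 10, 1000)` suggests a workaround.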
Now, when we think of a patch, we think of a small module of code that has to go in to close some opening, some vulnerability, some flaw in the software. And every time we're presented with the idea of implementing a patch, we need to consider two things. What happens if we do it? What happens if we don't? If we decide that we're going to go through with this we've decided that it's necessary, it's required, something else may depend upon it being there, whatever this patch might be. And we have to consider all that will be affected by the application of the patch. And hopefully we will determine now that we know that it's necessary, that it is risk and system neutral, that it will fix the thing but it won't raise any other issues on its own. If we decide or we're questioning whether we need to do this or not, we have to consider a couple of additional things.
First, is something that comes later going to be dependent upon this? Sometimes that's an obvious thing, sometimes it isn't. If it's not something we need, meaning that it relates to a feature that the software may have but that we don't use, then we may be in the attractive position of deciding that we're not going to do this and having a justifiable rationale for why we're not. But those decisions need to be made very carefully. We don't throw them away, never to be revisited, because it may turn out later on that we really do need the patch, and we want to be able to go back and apply it should that prove to be the case. But there is always a risk associated with doing it or not doing it, and those risks need to be evaluated.
When we've decided we're going to move ahead, we need to take the patch, examine it carefully, and put it on a standalone host in an environment that illustrates what it does. We test it and make sure that it does exactly what we think and nothing else, that it truly is necessary, and that it is risk neutral before we actually commit to the implementation.
One of the things we want to do for certain is regression testing, and this is to make sure that this particular patch under consideration is not going to revive bugs that we've already killed. We want to be sure that they don't get resurrected. Now, DevOps has in recent years become quite popular. We want to present the modification of it that's now called Secure DevOps or, more commonly, DevSecOps.
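The patch decision walked through above can be summarized as a small gate. This is a sketch under stated assumptions: the `PatchEvaluation` fields and the three outcome labels are hypothetical names for the checks the lecture describes, including the regression testing it calls for.

```python
from dataclasses import dataclass


@dataclass
class PatchEvaluation:
    necessary: bool            # we use the feature, or something later depends on it
    behaves_as_expected: bool  # standalone-host test showed it does what we think
    no_side_effects: bool      # ...and nothing else, so it is risk neutral
    regression_passed: bool    # previously fixed bugs stay fixed


def patch_decision(ev: PatchEvaluation) -> str:
    if not ev.necessary:
        # Not a discard: keep the rationale on record so it can be revisited.
        return "defer"
    if ev.behaves_as_expected and ev.no_side_effects and ev.regression_passed:
        return "deploy"
    # Necessary but not yet proven safe: keep testing before committing.
    return "hold"
```

A patch that is needed but fails regression testing comes back as `"hold"`, which keeps it out of the operational environment until the checks pass.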
Now this modification of DevOps typically brings together development, operations, the QA folks, and the security folks. And the idea of this whole thing is to move from cradle to grave for project, product and operations integration. It's very commonly associated with the agile philosophy of project management and a continuum of activities that implement that. So the structure is this. We have our development group. Typically they're the people who design, build, and prepare for deployment.
We have our ops folks. These are the ones who orchestrate the system for integration with the existing infrastructure and then perform the operations and maintenance tasks once it's in production operation. We have our security folks, no longer the people that say no. They say, how can we? They have to be integrated in this process from the beginning to specify and bring in the security architecture and the compliance architecture, integrate this assurance structure with the developed product, advise there, and then advise in operations on how it can all be nice and neatly put together. And we have our QA people, who are going to perform several functions throughout, in all areas, to make sure that quality of delivery and integrity of operation are maintained.
Now, this process is not something where you simply say we're going to adopt this and then magically you do. It can require a fairly significant cultural change in these different areas because, organizationally, they are sometimes almost adversaries. But the real benefit comes from that integration, and the mission of the organization itself is going to benefit in the long run. Where these groups routinely confer and collaborate on the design, the ergonomics, compliance, et cetera, the things that are involved in the program, you get a much smoother, much more transparent and integrated operational continuum, again from cradle to grave. That lets us implement the things that we have been preaching from the very beginning: secured by design, secured by default, functional to specification, operationally stable, and compatible and compliant with standards.
Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years. He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant. His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International. A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center. From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.
Upon attaining his CISSP license in 1997, Mr. Leo joined ISC2 (a professional role) as Chairman of the Curriculum Development Committee, and served in this role until 2004. During this time, he formulated and directed the effort that produced what became and remains the standard curriculum used to train CISSP candidates worldwide. He has maintained his professional standards as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004. Mr. Leo is an ISC2 Certified Instructor.