CISSP: Domain 6 - Security Testing and Assessment - Module 2


Security Throughout the Development Life Cycle

This course is the 2nd of 3 modules of Domain 6 of the CISSP, covering Security Testing and Assessment.

Learning Objectives

The objectives of this course are to provide you with an understanding of:

  • System operation and maintenance
  • Software testing limitations
  • Common structural coverage
  • Definition based testing
  • Types of functional testing
  • Levels of development testing
  • Negative/misuse case testing
  • Interface testing
  • The role of the moderator
  • Information security continuous monitoring (ISCM)
  • Implementing and understanding metrics


Intended Audience

This course is designed for those looking to take the most in-demand information security professional certification currently available, the CISSP.


Any experience relating to information security would be advantageous, but not essential.  All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.


Welcome back to the Cloud Academy presentation of the CISSP exam preparation seminar. We're going to continue our discussion of Domain 6, Security Testing and Assessment. Security testing is, of course, something that needs to continue throughout the development life cycle. Only part of the job is to find bugs and vulnerabilities so that we can correct them as early in the cycle as possible. It's well known that security vulnerabilities discovered late in the development cycle are oftentimes much more costly to fix than those found early, as much as 150 times more expensive when the break-fix occurs late in the cycle or even after production status is reached.

During the planning and design phases, we're going to go through a couple of different exercises. One will be threat modeling. This is a structured manual analysis of the application-specific business cases or usage scenarios. The analysis itself is guided by a set of precompiled security threats. We're looking at things beginning with business case, usage scenario, and we're trying to derive the benefits that come from understanding what our threat landscape looks like. This means identification of threats and their potential impacts so that we can identify effective countermeasures, again, as early in the cycle as we can do it.

We're also going to conduct an architectural security review. This is a review of the product architecture to ensure that it fulfills the necessary security requirements presented by the design of the system. Our prerequisites for this are, of course, an architectural model that will guide the design and development, together with an operational framework and security policy, bearing in mind that these are starting points that will evolve and be progressively elaborated over time.

Now, the benefits of this review are, of course, the detection of any structural flaws or gaps and the validation of flow and policy compatibility. During the development process, we will engage in two different families of testing techniques. One will be static source code analysis and manual code review. This analysis of the application source code looks for vulnerabilities without having to actually execute the application.
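To make the idea of analyzing source without executing it concrete, here is a minimal sketch using Python's standard `ast` module. The rule set `RISKY_CALLS` and the `SAMPLE` snippet are assumptions made up for illustration; real static analyzers apply far richer rules.

```python
import ast

# Hypothetical rule set: call names this sketch treats as risky.
RISKY_CALLS = {"eval", "exec"}

def find_risky_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each risky call found."""
    findings = []
    tree = ast.parse(source)  # parses the text only; the code is never run
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_CALLS:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)

# Illustrative code under review (line 1 of this string is blank).
SAMPLE = '''
def load(user_input):
    return eval(user_input)
'''
```

Calling `find_risky_calls(SAMPLE)` reports the `eval` call and its line number without ever running `load`.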

The other is static binary code analysis and manual binary review, which analyzes the compiled application in its binary form, again looking for vulnerabilities without actually executing the application. In general this is similar to source code analysis, but it's not as precise, and fix recommendations typically cannot be provided.

When we get to the point where we're going to execute these in a test environment, we're going to draw from a few other types of tests. There is manual or automated penetration testing, which simulates an attacker sending data to the application and observes the behavior as that data is received and acted upon. We have automated vulnerability scanners, such as Qualys devices, which test the application for the use of system components or configurations that are known to be insecure. This is looking for the effect by way of the cause.

We'll also use fuzz testing tools, which send random data, usually in larger chunks than expected, to the input channels of an application to provoke a crash. In looking at the test results, what we want to do is determine how the failures will occur so that we can plan either to prevent them or to ensure that we have proper error handling and recovery routines.
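A fuzzing loop can be sketched in a few lines. The parser `parse_length_prefixed` is a hypothetical target invented for this example, and the run count, size limit, and seed are arbitrary assumptions. Dedicated fuzzing tools are much more sophisticated than this.

```python
import random

def parse_length_prefixed(data: bytes) -> bytes:
    """Hypothetical target: the first byte gives the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload

def fuzz(target, runs: int = 200, max_size: int = 1024, seed: int = 0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    crashes = []
    for _ in range(runs):
        size = rng.randint(0, max_size)
        data = bytes(rng.getrandbits(8) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # a documented rejection counts as proper error handling
        except Exception as exc:  # anything else is a finding
            crashes.append((data, exc))
    return crashes
```

An empty result from `fuzz(parse_length_prefixed)` only says that these particular inputs provoked no unhandled failure, not that none exists.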

Next is operation and maintenance, abbreviated O&M. In operation, the security testing techniques that we use are applied to ensure that the system configuration remains secure and that our assumptions, such as "virus protection shall be installed" or "a correct authorization concept is implemented," are not violated accidentally.

Maintenance, of course, is the process of keeping everything running. Here, the security testing of patches is a particularly important activity. The patches need to be security tested, and tested thoroughly, against all reasonably possible attacks and all system configurations to which the patch can be applied. This is to make sure that customers who fix bugs in their systems are not accidentally exposed to new vulnerabilities.

We have to, of course, consider that however good our tests are, however thoroughly they're planned and carefully they're thought out, tests themselves have limitations. Due to the complex nature of the interactions that will occur inside of our computer system software, it is not possible to reasonably expect to exhaustively test any application or system. Too many scenarios, too many interactions, too many variables. And software testing, and this is a common error, software testing that finds no errors should not be interpreted to mean that errors do not exist.

Now, to explain this a bit, it is often thought that a test is successful if it doesn't find anything, and that's an incorrect assumption. The purpose of the test should be recalled. If I'm testing to find errors and I don't find any, the test itself has actually failed. The reason for the failure may be that the test isn't comprehensive enough, is designed poorly, or doesn't go far enough in testing the environment in which it runs. We have to remember the purpose of our testing. We want to validate functionality and design features, we want to verify that the way things are processed will, in fact, produce the results expected, and we want a test that is meant to find errors to, in fact, find them.

If a test doesn't find them, that doesn't mean that there aren't any errors. It simply means that the test didn't work in a way that might flush out an error that exists somewhere in the application other than the area the test was looking at. So we have to be sure we remember the context in which we're doing these activities, what we're testing for, and what success and failure of our test really mean. Because of this, we should always know exactly what the test should produce in the way of a result.

Having the key details that permit objective evaluation of the actual test result means that we know how the test is supposed to turn out, what the result should be, so that we can evaluate what we actually received against it. Now, this is obtained from the corresponding predefined definition or specification of the component that we're testing. So in order to support a properly configured and operated testing project, we need to remember specific things about what we're doing. As I just said, the expected test outcome is predefined. In other words, we will know what the answer is supposed to be so that anything that deviates from that will reveal something about the condition that exists.
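One way to picture a predefined expected outcome is a table of inputs paired with the results the specification calls for, written before the test is run. The component `password_strength` and its rules are assumptions invented for this sketch.

```python
def password_strength(pw: str) -> str:
    """Hypothetical component under test."""
    if len(pw) < 8:
        return "weak"
    if pw.isalpha() or pw.isdigit():
        return "medium"
    return "strong"

# Expected outcomes taken from the (assumed) specification, not from the code.
EXPECTED = [
    ("abc", "weak"),
    ("abcdefgh", "medium"),
    ("12345678", "medium"),
    ("abc12345", "strong"),
]

def deviations(func, cases):
    """Return (input, expected, actual) for every case that deviates."""
    return [(arg, want, func(arg)) for arg, want in cases if func(arg) != want]
```

Any entry returned by `deviations` is an objective fail, because the expected answer was fixed in advance.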

A good test case has a high probability of exposing an error. This assumes that the test covers the right thing and that we have enough tests, covering enough of the program's functionality, to have high confidence that there are, in fact, no errors.

A successful test, then, is one that finds an error. We always have to have independence from coding. If we want the program that results from our development efforts to do what it is designed to do, it is not in our best interest to have those who built it be the final eyes that see it and take it through its tests. We need fresh eyes. We need a separation between those who designed and built it and those who do the testing, to make sure that the result is not in any way skewed by the builder's bias. We need both application expertise, that is, the user's perspective, and software expertise, that is, programming or build knowledge, to make sure that we have enough visibility and enough scope.

Testers must use different tools than coders. Were we not to do that, we would, of course, derive exactly the same results that the coders derived, and that can leave us with blind spots. Examining the usual case only is insufficient. If we look at only those cases that the program is intended to serve, that leaves an entire category - misuse cases - completely unexplored.
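The difference between the usual case and misuse cases can be sketched as follows. The `withdraw` function and its rules are hypothetical, chosen only to show inputs the program was never intended to serve.

```python
def withdraw(balance: int, amount: int) -> int:
    """Hypothetical unit under test: return the balance after a withdrawal."""
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise TypeError("amount must be an integer")
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

def rejects(func, *args) -> bool:
    """True if func refuses the arguments with a controlled error."""
    try:
        func(*args)
    except (TypeError, ValueError):
        return True
    return False

# Misuse cases: each of these must be refused, not processed.
MISUSE_CASES = [(100, -30), (100, 0), (100, 500), (100, "30"), (100, True)]
```

The usual case, `withdraw(100, 30)`, says nothing about whether a negative amount could be used to increase a balance; only the misuse cases explore that.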

We must, of course, document our tests so that we can reuse them. The tests themselves must occasionally be tested as well, like any tool that needs calibration, to ensure that each test actually does what it is expected to do and produces the results that we expect it to produce.

We're going to do testing at different levels. There will be the unit test, typically done at a couple of different points. The builders of the unit will test it on the bench, and then it will be turned over to the testing group to test the individual unit. We will then, of course, combine these units and do system-level testing, first at the thread or process level. Then, as we combine these into the complete system, we continually grow the testing effort to cover a broader and longer range as more and more completeness is brought to the system, until ultimately the entire system is brought together and we do a complete end-to-end test.

We have to have test cases that are suitable for each level and some of those test cases will have to be code-based testing where we can make sure that the code does what and only what it is supposed to do in each use case. And as always, we're going to have to have coverage metrics so that we know how to measure our rate of success or failure. We have to know that we have tested everything that we should have or need to and that it measures up to the various metrics that we have decided on to illustrate the points of success or failure.

Now, the areas that the coverage metrics need to be applied to are these. We have statement coverage, where the criterion requires sufficient test cases for each program statement to be executed at least once. That in itself indicates that the statement has been programmed so that it will run and complete. However, the achievement of this is insufficient to provide confidence in the software product's behavior. It is, after all, only a single statement giving only a very tiny snapshot of the total functionality. So within its scope, this kind of coverage is necessary, but to try to extrapolate the results to a much broader scope would be a mistake.
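A small sketch shows why executing every statement once is not enough. The function `shipping_cost` is a hypothetical example with a deliberately planted defect: the rate is never set for non-express shipments.

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical unit with a planted defect on the non-express path."""
    rate = None
    if express:
        rate = 5.0
    return weight_kg * rate

# This single test executes every statement in the function and passes.
assert shipping_cost(2.0, True) == 10.0
```

Statement coverage is complete after that one test, yet `shipping_cost(2.0, False)` raises a `TypeError` because the false outcome of the `if` was never exercised.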

We have condition coverage, and this criterion requires sufficient test cases for each condition in a program decision to take on all possible outcomes at least once. That is, if we have a decision that can have four different condition points, we need to be able to test each one through to completion, again, to make sure that it will run through to completion and then to make sure that the logic that it follows as it arrives at each decision point is also valid. This differs from branch coverage only when multiple conditions must be evaluated to reach a decision, which means, of course, we must do branch coverage as well.

Branch, or decision, coverage requires sufficient test cases for each program decision or branch to be executed so that each possible outcome occurs at least once. It is considered to be a minimum level of coverage for most software products, but decision coverage alone is insufficient for high-integrity applications. Thus we get to multi-condition coverage. This criterion requires sufficient test cases to exercise all possible combinations of conditions in a program decision. So as you see from this collection, statement, condition, decision or branch coverage, and then multi-condition, we go from the simplest to the most complex, testing each one to make sure that it will complete properly, which shows that the mechanics have been put together in the right way, and then we have to test to make sure that each possible combination that produces an outcome is tested at least once.
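The difference between these criteria can be sketched on a single decision with three conditions. The function `access_allowed` and the three suites are assumptions for illustration, and the sketch ignores short-circuit evaluation, which real coverage tools have to account for.

```python
from itertools import product

def access_allowed(is_admin: bool, is_owner: bool, locked: bool) -> bool:
    """Hypothetical decision built from three conditions."""
    return is_admin or (is_owner and not locked)

def decision_outcomes(cases):
    """Set of outcomes the decision produced across the suite."""
    return {access_allowed(*c) for c in cases}

def condition_values(cases):
    """For each condition, the set of values it took across the suite."""
    return [set(column) for column in zip(*cases)]

# Branch coverage: the decision is both True and False at least once.
BRANCH_SUITE = [(True, False, False), (False, False, False)]
# Condition coverage: every condition is both True and False at least once.
CONDITION_SUITE = [(True, True, True), (False, False, False)]
# Multi-condition coverage: all 2**3 combinations of the conditions.
MULTI_CONDITION_SUITE = list(product([False, True], repeat=3))
```

`BRANCH_SUITE` reaches both outcomes while `is_owner` and `locked` never change, which is the gap that condition and multi-condition coverage close, at the price of a suite that doubles with each added condition.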

Now, structural coverage also covers things like loops, paths, and data flow. Loop coverage requires sufficient test cases for program loops to be executed for zero, one, two, and many iterations, covering initialization, typical running, and termination. In loop coverage, one of the things we have to test for is that the loop does not get stuck, with the program continuing to execute into infinity, and that at some point the loop terminates.
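A sketch of loop coverage, using a loop whose termination is not obvious from reading it. The function `collatz_steps` and the `max_steps` guard are assumptions chosen for illustration, not anything from the course material.

```python
def collatz_steps(n: int, max_steps: int = 10_000) -> int:
    """Count loop iterations until n reaches 1, with a termination guard."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            raise RuntimeError("loop did not terminate within the bound")
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Loop coverage cases: (input, expected number of iterations).
LOOP_CASES = [
    (1, 0),  # zero iterations: the loop body never runs
    (2, 1),  # one iteration
    (4, 2),  # two iterations
    (6, 8),  # many iterations: 6, 3, 10, 5, 16, 8, 4, 2, 1
]
```

An input of 0 never reaches 1, so without the guard the loop would run forever. With it, `collatz_steps(0)` raises `RuntimeError`, which turns the termination question into a testable outcome.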

We have to do path coverage, which requires sufficient test cases for each feasible path, or basis path, from start to exit of a defined program segment to be executed at least once, again, to show that it will work. Because of the very large number of possible paths through a software program (and these, of course, should be diagrammed as the architecture is laid out), path coverage is generally not achievable, since there may again be too many variables.
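The arithmetic behind that limitation can be sketched directly. The simplifying assumption here is a program segment made of independent two-way decisions in sequence, in which case the number of start-to-exit paths doubles with every decision.

```python
from itertools import product

def count_paths(decisions: int) -> int:
    """Paths through `decisions` independent two-way decisions in sequence."""
    return 2 ** decisions

def enumerate_paths(decisions: int):
    """List every path explicitly; only practical for small counts."""
    return list(product(("taken", "skipped"), repeat=decisions))
```

Three decisions give 8 paths, which is easy to enumerate. Forty decisions give 1,099,511,627,776, and loops multiply the count further, which is why basis paths or other subsets are tested instead.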

Data flow coverage requires sufficient test cases for each feasible data flow to be executed at least once.

Now, definition-based testing is one of the things that must occur at a lot of different levels. It goes by other names as well: specification testing, functional testing, or black box testing. What we're looking to do is establish, by the result that it produces, that the module being tested is doing the job it is intended to do. One example would be testing a cryptographic module.

Ideally, because of the random nature of most of the characteristics inside of an encryption algorithm, there is no way, or there shouldn't be, to predict how it works internally and how it will produce any kind of output from any kind of input. Were that possible, encryption systems would be of no use whatever, because if we, the builders and implementers who use them to protect our information, can do that, so can our adversaries. So we judge its success by the result that it produces.
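One common way to judge a cryptographic module purely by its results is a known-answer test, in which outputs are compared against published values. The sketch below uses SHA-256 from Python's standard `hashlib` as a stand-in for the module under test; that choice is an assumption for illustration, and the two vectors are the widely published digests of the empty message and of "abc".

```python
import hashlib

# Published SHA-256 digests for two standard test messages.
KNOWN_ANSWERS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}

def sha256_hex(message: bytes) -> str:
    """The module under test, treated as a black box."""
    return hashlib.sha256(message).hexdigest()

def failed_vectors(digest_func) -> list[bytes]:
    """Return every message whose digest differs from the known answer."""
    return [m for m, want in KNOWN_ANSWERS.items() if digest_func(m) != want]
```

Nothing in the test depends on how the algorithm works internally; the module passes or fails on its output alone.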

Now, we identify the test cases based on the definition of what the module is intended to do and judge the results accordingly. Then we also have functional testing, and we're going to do this to test for normal case outputs, output forcing, robustness, and combinations of inputs. In the normal case, of course, we're going to employ usual inputs to produce what we believe will be the normal result.

For output forcing, we're going to choose test inputs to ensure that all or selected software outputs are generated by the testing. For robustness, we're going to do software testing that demonstrates that a software product behaves correctly when given unexpected or invalid inputs. Methods for identifying a sufficient set of such test cases include things like equivalence class partitioning, boundary value analysis, and others. While important and necessary, these techniques do not ensure that all of the most appropriate challenges to a software product have been identified for testing. And then there are combinations of inputs. The functional testing methods identified so far all emphasize individual test inputs.
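Equivalence class partitioning and boundary value analysis can be sketched as follows. The validator `valid_port` and its range of 1 to 65535 are assumptions chosen for the example.

```python
def valid_port(n: int) -> bool:
    """Hypothetical unit under test: accepts port numbers 1 through 65535."""
    return 1 <= n <= 65535

def boundary_values(low: int, high: int) -> list[int]:
    """Values just outside, on, and just inside each edge of the range."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]

# Equivalence classes: one representative input and expected result per class.
EQUIVALENCE_CLASSES = {
    "below range": (-5, False),
    "in range": (8080, True),
    "above range": (70000, False),
}
```

One representative per class keeps the suite small, while the six boundary values probe the edges, where off-by-one mistakes tend to sit.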

Most software products operate with multiple inputs under their conditions of use. Thorough software product testing should consider the combinations of inputs a software unit or system may encounter during operation. Cause and effect graphing is one functional software testing technique that systematically identifies combinations of inputs to a software product for inclusion in test cases.
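Testing combinations of inputs can be sketched by enumerating every combination and comparing the unit against an independently written oracle. This is plain exhaustive enumeration, a simpler stand-in for cause and effect graphing rather than an implementation of it. The roles, actions, and permission rules are all hypothetical.

```python
from itertools import product

ROLES = ["guest", "user", "admin"]
ACTIONS = ["read", "write", "delete"]
AUTH_STATES = [False, True]

def is_permitted(role: str, action: str, authenticated: bool) -> bool:
    """Hypothetical unit under test."""
    if action == "read":
        return True
    if not authenticated:
        return False
    if role == "admin":
        return True
    return role == "user" and action == "write"

def expected(role: str, action: str, authenticated: bool) -> bool:
    """Oracle written separately from the (assumed) specification table."""
    allowed = {
        "guest": {"read"},
        "user": {"read", "write"},
        "admin": {"read", "write", "delete"},
    }
    if not authenticated:
        return action == "read"
    return action in allowed[role]

def mismatches():
    """Every input combination where the unit disagrees with the oracle."""
    combos = product(ROLES, ACTIONS, AUTH_STATES)
    return [c for c in combos if is_permitted(*c) != expected(*c)]
```

Three roles, three actions, and two authentication states give 18 combinations. Larger input spaces are where systematic selection techniques become necessary.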

So when it comes to test case identification techniques, we want to look at a couple of different ways. Functional and structural software test case identification techniques provide specific inputs for testing rather than random test inputs. Now, the weakness of these techniques comes from the difficulty of linking structural and functional test completion criteria to the software product's reliability. Then there is testing software changes, which is something that occurs all too often in today's world. These changes will frequently occur during software development. That is, they will happen during the development cycle before the product reaches production level.

Now, these changes are, of course, a result of debugging or of requirements creep, where during the development cycle those who design, those who build, and those who are the planned users think of other things that they want the program in question to do. This is an area that requires that we manage the requirements very tightly to ensure that we don't deviate from the intended path even while we may from time to time include new changes or adaptations.

When the design is modified, whether by repurposing it, adding functionality, or removing something, we have to be sure that we test for the cause and effect produced by the change. One way of doing this is regression analysis and testing. Regression testing is intended to provide assurance that a change has not created new problems or resurrected old ones, fixed at some point in the past, elsewhere in the software product. Regression analysis is the determination of the impact of a change, based on a review of the relevant documentation, in order to identify the necessary regression tests to be run. This must be done so that problems fixed in the past are known to stay fixed under any change that follows.

Think of putting up wallpaper. A bubble appears in one spot, and then by pushing that one down, another one appears in another part of the wallpaper. Regression analysis is an attempt to make sure that when bugs have been fixed, they stay fixed, and that the changes that follow are risk-neutral: they do not create new issues and don't resurrect old ones.
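A regression suite can be sketched as a growing list of cases, one per bug that was ever fixed, re-run after every change. The function `slugify` and the bug history in the comments are hypothetical.

```python
import re

def slugify(title: str) -> str:
    """Hypothetical unit: turn a title into a URL slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Each case pins down behavior that (hypothetically) broke once before.
REGRESSION_CASES = [
    ("Hello World", "hello-world"),           # baseline behavior
    ("  leading spaces", "leading-spaces"),   # old bug: leading hyphens
    ("a -- b", "a-b"),                        # old bug: repeated hyphens
    ("", ""),                                 # old bug: empty title
]

def run_regression(func, cases):
    """Return (input, expected, actual) for every case that has regressed."""
    return [(t, want, func(t)) for t, want in cases if func(t) != want]
```

If a later change reintroduces one of the old bubbles, `run_regression` returns it immediately instead of leaving it for a user to find.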

So again, we're going to do tests at various levels. There are unit tests, where individual units are tested in standalone mode. There are integration tests, where units are brought together to complete a process and the sequence of units is tested as they are integrated. Then, as we progressively integrate the individual processes into the entire system, we test and test and test until we reach full system integration for the production version. Then we do an end-to-end test to ensure that the test cases that are run prove that the system in its entirety works as designed.

When we get to system-level testing, various things are going to be explored. We're going to look at security and privacy performance over the entire scope of system processing. We're going to look at performance issues, making sure that resource usage and processing times are kept to the proper minimums. We're going to look at responses to stress conditions, which means varying the inputs to produce a variation in outputs and testing the system's ability to handle the unexpected.

We're going to look at the operation of internal and external security features, checking their interaction, their completeness, and their performance. We're going to put the system through a series of tests to check the effectiveness of the recovery procedures, error recovery, and even systems-level recovery.

We, of course, want to be sure that we check the ergonomics and the usability from the user's perspective. We have to check the system in the intended production environment to make sure that it is compatible with the other software products that it may interact with or cohabitate with in the system environment. We have to look at its behavior in each of the defined hardware configurations where the system is likely to find itself. We have to look at the accuracy and currency of the documentation to make sure that everything that is necessary for the proper end user experience is there to support their success.

Controls must be documented as well, and controls must have a range of function that can be examined in our testing and then employed by the users as it moves from one environment to another, one stage of development or maturity to another, and that they continue to evolve with the system as it evolves.

Testing must be located properly within the system environment. System-level testing exhibits the software product's behavior in the intended operating environment, and the location of the testing depends upon the ability to reproduce the target operating environment, so that the in-context testing that this represents produces results that can be replicated in the real world with great reliability.

So our test results are there to help us document procedures, data, and results in a manner permitting objective pass/fail decisions to be reached. Our test results, therefore, must always be examined in context. They must be examined against other factors to produce the most real and rational decisions about them. We will use a variety of software testing tools throughout all of these different phases and cycles. Some of the tools will be built by us in house. Some will be acquired from commercial sources. Every properly configured testing program should be a combination of both, and, like precision tools, the tools that we either build or acquire have to be tested themselves, to make sure that they are calibrated and that they provide true and authentic results when they're run.
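One way to calibrate a test is to run it against deliberately broken versions of the unit and check that it fails each time, an idea related to mutation testing. The unit `is_even`, the two suites, and the seeded faults below are all assumptions for illustration.

```python
def is_even(n: int) -> bool:
    """Hypothetical unit under test."""
    return n % 2 == 0

def suite_passes(func) -> bool:
    """The test suite being calibrated."""
    cases = [(0, True), (1, False), (2, True), (-3, False)]
    return all(func(n) == want for n, want in cases)

def weak_suite_passes(func) -> bool:
    """A poorly designed suite, shown for comparison."""
    return func(2) is True

# Deliberately faulty implementations that a good suite must catch.
SEEDED_FAULTS = [
    lambda n: True,                  # always answers True
    lambda n: n % 2 == 1,            # inverted logic
    lambda n: n > 0 and n % 2 == 0,  # wrong for zero and negatives
]

def undetected_faults(suite, faults) -> list[int]:
    """Indexes of seeded faults that the suite failed to detect."""
    return [i for i, fault in enumerate(faults) if suite(fault)]
```

A suite that lets seeded faults through is out of calibration, however reassuring its green results look on the real code.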

As we go through and test everything, we have to be sure that we have a clear sense of what the appropriate validation effort in any particular area should be. We have to determine the specific validation effort necessary based on the type of change, the development products that might be affected, and the impact that those products will have on the operation of the software. Sometimes the appropriate validation effort can be minimal. Sometimes it must be extensive. But by not testing things to the appropriate level, we either overexpend or underexplore. Either way, the results that we get would be suspect.

About the Author

Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years.  He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant.  His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International.  A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center.  From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.


Upon attaining his CISSP license in 1997, Mr. Leo joined ISC2 (a professional role) as Chairman of the Curriculum Development Committee, and served in this role until 2004.   During this time, he formulated and directed the effort that produced what became and remains the standard curriculum used to train CISSP candidates worldwide.  He has maintained his professional standards as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004.  Mr. Leo is an ISC2 Certified Instructor.