CSSLP Domain 2:2 Data Classification & Categorization


Data Classification & Categorization - Introduction

This course is the second of three courses covering Domain 2 of the CSSLP, and it addresses the topic of data classification and categorization.

Learning Objectives

  • Understand the fundamentals of data classification and categorization
  • Learn about the security implications of data ownership and labeling
  • Learn about different data types and the data lifecycle

Intended Audience

This course is designed for those looking to take the Certified Secure Software Lifecycle Professional (CSSLP) certification, or for anyone interested in the topics it covers.


Any experience relating to information security would be advantageous, but not essential. All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.


If you have thoughts or suggestions for this course, please contact Cloud Academy at


We're going to continue now with section two of domain two. This section is gonna be about data classification and categorization and we're going to speak about the subjects of data classification, ownership, labeling, types of data and revisit the data life cycle. Now, for software to be secure and resilient against hackers, it must take into account certain foundational concepts of information security. And as you've seen, these include confidentiality, integrity, availability, authentication, authorization, accountability and management of sessions, exceptions, errors, and configuration parameters.

Now, it's obvious that data has to be considered the most valuable asset that the company has, second only of course to the intellectual property represented by its workforce. Like any asset, this warrants protection. Data as a digital asset needs to be protected as well. Now, one of the key elements in security of course, will be identifying which assets, including information, are most critical from a security point of view.

Data is one of the key elements in the enterprise. It is the item that many criminals seek when breaking into systems so that they can retrieve it and monetize it. It has both tangible and intangible value to the enterprise. Now, managing this asset portfolio of data is an interesting challenge.

First off, it is largely intangible despite the fact that it can produce very tangible monetization results for the criminal elements. We always have to ask the question however, how do we value it? How do we then determine how much protection is needed? What is that, quote unquote, right amount? And along with that, we have to decide who in fact are we protecting this data from.

Now, as we've been talking about requirements, what we need to do is make sure that we develop requirements around data protection by characterizing the data as best we can and by including in the requirements how it will be processed, by whom and from where. We need to identify these from internal and external sources, and the requirements are going to be derived from both and, in some cases, harmonized between the two. Now, the purpose of meeting these really needs to be described in great detail. We need to map them out so that we know what they are, where they come from and, of course, break them down so we're able to construct an approach to meet them.

Each of our requirements must have a specific objective and we must, along with that, define the metrics so that we know whether we have hit or missed the mark and, if we missed it, how to get back there and achieve it. And that will include all the different objectives we've mentioned so far, plus many others.

Now, the details and specifications of the development work can then be crafted to ensure that the software integrates the necessary functions to perform as required and that the outputs will satisfy the internal and external requirement sets.

Now, in order to really understand what our data is and what value it contributes to our operation, we need to think about data states, because the state the data is in makes it either more or less available, both to a hostile source and to us. Preferably, it is less available to the hostile source and more available to us. But that, again, highlights the balance that has to be attained in the course of building up our requirements.

Now, as we know, the data will occupy one of several different states. Typically, we define these as at-rest, which would be objects that are closed or in some form of quiescent state, not in use or in motion; in-motion, which means they're being transmitted between locations, possibly memory locations or geographic locations; or in-use, meaning objects being created, modified or eliminated, preferably under the direct control and supervision of an authorized person.
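As a rough illustration of the three states just described, the sketch below models them as an enumeration and attaches a few example controls to each. The names `DataState`, `BASELINE_CONTROLS` and `controls_for`, and the controls listed, are assumptions made for illustration; they do not come from the course.

```python
from enum import Enum


class DataState(Enum):
    AT_REST = "at-rest"      # closed or quiescent, not in use or in motion
    IN_MOTION = "in-motion"  # being transmitted between locations
    IN_USE = "in-use"        # being created, modified or eliminated


# Hypothetical example controls per state (illustrative, not exhaustive).
BASELINE_CONTROLS = {
    DataState.AT_REST: ["storage encryption", "media access control"],
    DataState.IN_MOTION: ["encrypted transport", "integrity checking"],
    DataState.IN_USE: ["user authorization", "audit logging"],
}


def controls_for(state: DataState) -> list[str]:
    """Return a copy of the example controls associated with a data state."""
    return list(BASELINE_CONTROLS[state])
```

A requirement could then reference a state explicitly, for example by calling `controls_for(DataState.IN_MOTION)`, rather than describing protection in general terms.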

In certain cases, we have to consider the medium involved such as when it's at rest, it could be on some storage medium, optical, magnetic, flash, RAM, ROM, or when it's in motion through optical media or through electrical media, such as copper wire. Now, the security requirements concerning these must address the characteristics of the medium as well as the data itself in order to ensure that the protection of the data is consistent with the states, consistent with the medium and that it integrates with the operations being performed and is cognizant of the qualities of the particular medium on which the data is found and what that contributes to the overall question of how to protect it.

Obviously, we have to consider the data functionality. This of course is defined as the intended use of the data, and this becomes one of the most essential characteristics required to correctly determine how we classify this data, which will translate into who is going to get access and at what level. Now, these definitions capture how the system or the business will utilize and share the data. If it's internal, it may be used within the system itself, something that is created or processed within. If it is an input, it is stored in the system either by human input or by some other means of transfer. If it's output, that would be data produced in some form by the processing the system or application is doing and, as a product, would be made available for business use.

Data security indicators should be assigned to each form of data that we have: whether it's obfuscated, meaning hidden by some method to prevent unauthorized disclosure; whether it's personally identifiable, which means it has a regulatory impact and is considered one of those privacy types of data, not simply the confidentiality type; and a sensitivity indicator, typically something that might read high, medium or low, or some numeric assignment reflecting potential compromise or impact.
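One possible way to record the functionality and security indicators discussed above is a small descriptor attached to each data element. This is a minimal sketch; the class names, field names and the sample element are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Functionality(Enum):
    INTERNAL = "internal"  # created or processed within the system
    INPUT = "input"        # entered by a person or transferred in
    OUTPUT = "output"      # produced by processing, made available for business use


class Sensitivity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class DataElement:
    name: str
    functionality: Functionality
    obfuscated: bool               # hidden to prevent unauthorized disclosure
    personally_identifiable: bool  # carries regulatory (privacy) impact
    sensitivity: Sensitivity


# Hypothetical example: a customer email address captured on a form.
customer_email = DataElement(
    name="customer_email",
    functionality=Functionality.INPUT,
    obfuscated=False,
    personally_identifiable=True,
    sensitivity=Sensitivity.HIGH,
)
```

Making the descriptor immutable (`frozen=True`) is a design choice in this sketch, so that the indicators cannot be changed silently once assigned.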

Speaking of risk impact, this is one aspect that we must ascertain as clearly as possible. Now, this would be a mandatory attribute that we use to determine the impact of some risk element on the data, and the impact to the business that results from that data being affected by that threat element. This would involve data functionality, or the intended use, which imparts critical or non-critical attributes to the data. We have to consider data acquisition, usage and maintenance processes, which reflect the cost to acquire or produce, the cost to store and protect, and the reproducibility or replaceability of the data.

Sometimes data is so unique that it's very difficult to find it or replace it or recreate it. And therefore, it must be protected in certain special ways, possibly above and beyond what we might do for other elements that are quite readily replaceable or reproducible. We have to consider the regulatory implications if it is compromised, such as in a breach. This is typically something that has to be reported to authorities, and the individual or individuals who may be affected have to be notified. There is a cost to doing that, and it adds to the cost of having the data in our possession.
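The risk impact factors above could be captured in a simple profile and used to flag data as critical. The sketch below is only one way to do that: the rule in `is_critical` and the `cost_threshold` parameter are assumptions for illustration, not a method given in the course.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskImpactProfile:
    acquisition_cost: float             # cost to acquire or produce the data
    protection_cost: float              # cost to store and protect the data
    replaceable: bool                   # can it be recreated or re-acquired?
    breach_notification_required: bool  # regulatory reporting if compromised


def is_critical(profile: RiskImpactProfile, cost_threshold: float) -> bool:
    """Hypothetical rule: data is critical if it is irreplaceable, carries a
    breach-notification duty, or costs at least cost_threshold to acquire."""
    return (
        not profile.replaceable
        or profile.breach_notification_required
        or profile.acquisition_cost >= cost_threshold
    )


# Hypothetical example: unique research results that cannot be recreated.
research_results = RiskImpactProfile(
    acquisition_cost=250_000.0,
    protection_cost=5_000.0,
    replaceable=False,
    breach_notification_required=False,
)
```

In practice an organization would define its own criteria and metrics; the point here is only that each factor becomes an explicit, recorded attribute.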

We must consider the relative importance to processing and the impact from failures. We oftentimes find ourselves considering all of the outside forces, all the hostile parties, hackers, international criminal groups, nation states and others. We sometimes forget that we have to consider normal failures of these machines and systems and what they contribute to the overall cost of protecting, using and saving this data. Now, one of the elements that we have to use to protect the data is, of course, labeling.

Now, this will provide a visible description and a system-readable attribute for access control mediation. There's regularly a debate about whether data labeling, meaning the honest representation of what the data is, how it's used and other attributes, contributes to possible compromise by giving a hostile party an incentive to target a particular data object. The fact of the matter is, as long as we are doing everything that we must to prevent unauthorized access, and ensuring that all authorized users have access to the data and its attributes so that they know the proper usage and the proper protection means that govern how they're going to interact with the particular data element, then direct, clear labeling becomes an obvious necessity.

Typically, the hostile parties are not going to take serious notice of what a data label is on the presumption that it's probably been falsified or exaggerated in some way, possibly as deception technology, possibly as just poor computing hygiene. But it is to our benefit to label it correctly, handle it correctly in those ways and to make certain or as certain as we can that only authorized users are able to gain access to it in the first place.
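To illustrate the system-readable side of labeling, the sketch below shows a label being used to mediate access. It assumes a simple rule in which a user may read data only if their clearance is at least as high as the data's label; the names `Level` and `may_read`, and the rule itself, are illustrative assumptions rather than something prescribed in the course.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def may_read(user_clearance: Level, data_label: Level) -> bool:
    """Allow access only when the clearance meets or exceeds the label."""
    return user_clearance >= data_label
```

With this rule, `may_read(Level.MEDIUM, Level.HIGH)` is refused while `may_read(Level.HIGH, Level.MEDIUM)` is allowed, which is only possible because the label is recorded accurately in the first place.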

About the Author

Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years. He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant. His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International. A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center. From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.


Upon attaining his CISSP license in 1997, Mr. Leo joined ISC2 (a professional role) as Chairman of the Curriculum Development Committee, and served in this role until 2004. During this time, he formulated and directed the effort that produced what became and remains the standard curriculum used to train CISSP candidates worldwide. He has maintained his professional standards as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004. Mr. Leo is an ISC2 Certified Instructor.