This course is the second module of Domain 8 of the CISSP, covering Software Development Security.
Learning Objectives
The objectives of this course are to provide you with an understanding of:
- The database environment
- Software development and the world of the web
Intended Audience
This course is designed for those looking to take the most in-demand information security professional certification currently available, the CISSP.
Prerequisites
Any experience relating to information security would be advantageous, but not essential. All topics discussed are thoroughly explained and presented in a way allowing the information to be absorbed by everyone, regardless of experience within the security field.
Feedback
If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.
Welcome back to the Cloud Academy presentation of the CISSP Exam Preparation Review Seminar. We're continuing our discussion of Domain 8, Software Development Security, and we're starting this particular one on section three. So, we're looking at database management systems. The thing we're going to begin with is the architecture. Now, it's commonly understood that a database management system involves a core set of functions and a suite of application programs that together manage, and in fact provide the structure for, large sets of volatile and persistent data that can be changed through proper means. The database management system then stores, maintains, and provides access to this data using ad hoc queries or stored views.
So the components of a database management system include the database engine itself, which manages the storage where the data elements are kept and carries out the manipulation processes through its various functions. It has the supporting platform, which could be a virtual machine environment, a disk-optimized environment, or one where the entire database is loaded into memory for faster performance. We, of course, have the application software that rests over the top of this, acting as the means by which the data is entered, manipulated, changed, and so forth, through application program interfaces and prepared queries known as views. Then we have the user interfaces and the various activities that users will perform through them.
Now, there are some things that characterize a database apart from all other forms of software: the relationship between the data elements and a framework for organizing the data to manage those relationships. It does this through transaction persistence, which is the quality that the state of the database remains consistent and unchanged except for those elements specifically affected by the transaction, and that the transaction itself is durable if done correctly and by the rules of the database. Fault tolerance and recovery is the quality that, following any failure, the database remains in its original state. In other words, a fault does not destroy the database itself; the database structure isolates the failed condition so that the database is preserved in its original state, and the affected transactions recover either through a rollback, in other words, backing the transaction out, or by shadowing, which is a form of reapplication at some point after the database has been restored to an accepting condition.
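To make rollback concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and the simulated failure are hypothetical, and any engine with transaction support behaves analogously.

```python
import sqlite3

# Purely illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

try:
    # Begin a transfer: debit one account, then fail before the credit completes.
    conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
    raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    # Rollback backs the partial transaction out; the database returns
    # to its last committed, consistent state.
    conn.rollback()

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 100), (2, 50)] -- the half-finished debit was undone
```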
These systems are, of course, meant to be shared by multiple users. And there is a control set that governs multiple concurrent users, managing the processing conditions and the order in which operations are performed, such that the integrity of the data is protected. These, of course, must have security controls built into them: the set of control features used to assure authorized users and authorized change privileges for those users, while preventing any unauthorized activities by any authorized or unauthorized user, along with integrity assurance measures and various forms of view definitions.
Databases, like so many other things in computing, have gone through generations. In terms of the database, the hierarchical database management model is probably the oldest. In it, data is stored in a series of records with field values, as you see in the graphic, arranged parent-child, or grandparent-parent-child, and so forth. The model collects all the instances of a specific record together as a record type. And through these tree-type relationships, we find that searching and retrieving data can be very quick. It is, however, a rather inflexible type of model, and so its first stage of evolution was to move to the network database model.
Now, the network database management model represents data in the form of a network of records and sets that are related to each other. The records are the equivalent of rows in a relational model, and the record types are sets of records of the same type. And, as you see in the diagram next to the text, you have a network wherein a given path can have multiple connecting paths, rather than the straightforward hierarchical nature of the hierarchical-style database. From there, we move on to what is the current standard: the relational database model. Now, it should be said that not every kind of database is suitable for every kind of data management. That's a way of saying that hierarchical data is not necessarily better manipulated by a relational model, and the reverse is equally true. But data that has many different types of relationships between its elements is best modeled through a relational type.
Now, relational databases are put together based on set theory and predicate logic. To explain, a predicate is a verb phrase template that describes a property of objects or a relationship among objects represented by variables. And set theory refers to collections of things and the characteristics by which they are grouped and accessed. For example, a query's WHERE clause is a predicate: it selects the set of tuples for which the stated condition holds true. Now, the database engine enforces this kind of logic, and in doing so it provides a very high level of abstraction. This, of course, is facilitated through the creation of a vehicle, or a GUI, some form of user interface, that enables a more convenient interaction for the person.
In doing this, it creates an ergonomic interface that enables enhanced control possibilities over a common command-line interface. So the elements of the relational model include tables, or relations, and these are called base relations. They look very much like a two-dimensional matrix. There are, of course, integrity rules, rules that govern how the data can be changed and what data is valid, covering syntactic or semantic integrity, and then data manipulation agents, which are the tools used inside the database to change the data in authorized ways and prevent unauthorized changes. Now, as for the attributes of a table: the attributes themselves are the columns in a table. A tuple corresponds to a row, and between the two they create the two-dimensional matrix representation. Now, in these we have to, of course, find a way to search and retrieve data. This is done through the use of keys.
We have primary keys, which are the only guaranteed method of pinpointing an individual tuple. This key, therefore, serves as the pointer to the original location of the tuple itself. When the primary key of one relation is used as an attribute in another relation, it is called a foreign key in that relation. So the foreign key is a reference pointer back to the primary key, which designates the original location of the tuple being referenced. Instead of replicating and creating a new instance of the original tuple, we simply place a pointer, the foreign key, pointing back to it.
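As a small sketch of that primary/foreign key relationship, here is a hypothetical pair of tables built with Python's sqlite3 module; the names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# dept_id is the primary key: the one guaranteed way to pinpoint a tuple (row).
conn.execute("""
    CREATE TABLE departments (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    )""")

# In employees, dept_id is a foreign key: a pointer back to the original
# tuple in departments rather than a copy of its data.
conn.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(dept_id)
    )""")

conn.execute("INSERT INTO departments VALUES (10, 'Security')")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 10)")

# Referential integrity: this insert is rejected because department 99 does not exist.
try:
    conn.execute("INSERT INTO employees VALUES (2, 'Bob', 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```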
Now, most relational-type databases are manipulated through some form of SQL, or Structured Query Language. This is a language by which users issue commands and various other statements in order to manipulate the database, either the structure or the data within it. It has the major components which you see here: the schema, which functions as a blueprint or plan, the model of the data. It has, of course, the tables, or base relations, as we said a moment ago, which are the two-dimensional arrays of data elements. And then views, which are presentation constructs showing the data elements and their relations. Now, SQL has a number of different subsets. On the next slide, we'll talk about three of those.
So the language subsets of SQL include the Data Definition Language, with the command words CREATE, DROP, ALTER, and RENAME. We have the Data Manipulation Language, which includes SELECT, INSERT, UPDATE, and DELETE. And then the Data Control Language, which typically governs the rules of who can access what, through GRANT and REVOKE. Now, IT and the technology, of course, are in continual evolution. The relational database has therefore evolved into a type known as the Object-Oriented Database Model. Now, these are the more recent database models, and they store data as objects as opposed to singular data elements. The data item itself and its metadata, therefore, are referenced and stored together.
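A brief, hypothetical illustration of the three subsets, driven from Python's sqlite3 module; note that SQLite has no user accounts, so the DCL statements are shown only as generic strings whose exact syntax varies by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL -- defines the structure (CREATE, ALTER, DROP, RENAME).
conn.execute("CREATE TABLE projects (proj_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE projects ADD COLUMN owner TEXT")

# DML -- manipulates the data itself (SELECT, INSERT, UPDATE, DELETE).
conn.execute("INSERT INTO projects (proj_id, title, owner) VALUES (1, 'Migration', 'pat')")
conn.execute("UPDATE projects SET owner = 'lee' WHERE proj_id = 1")
print(conn.execute("SELECT proj_id, title, owner FROM projects").fetchall())
conn.execute("DELETE FROM projects WHERE proj_id = 1")

# DCL -- governs who may do what (GRANT, REVOKE); illustrative strings only.
dcl_examples = [
    "GRANT SELECT, INSERT ON projects TO analyst_role",
    "REVOKE INSERT ON projects FROM analyst_role",
]
```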
Now, where we find these in common usage, storing images, animations, music, video, or any form of mixed media, is typically within something like a media-type program, like Windows Media Player or RealTime or iTunes. Now, databases are, of course, of different types, different styles, different themes on the idea of a relational database. They can be accessed through on-prem systems with built-in GUIs, or they can be accessed over the web. But they each have their own structure based on what the manufacturer of that particular program feels is the best way to achieve their goals and differentiate their product. Consequently, there can be many different ways of getting into a database, even though they're all based around the same relational construct, so we need ways of bringing these things together and creating more homogeneity in how we access them.
One of the primary ways, still in great and wide use, is ODBC, or Open Database Connectivity. This is an API built by Microsoft that allows applications to link to these databases while requiring no, or only minimal, DBMS-native code changes. So it simplifies the building of a web-based interface to a variety of different backend databases. In the world of Java, we have its equivalent, JDBC, or Java Database Connectivity, which is analogous to ODBC and connects Java apps to databases, and this can be used with or without incorporating ODBC access. Most of these things are, of course, constructed around XML, the eXtensible Markup Language, which provides data structure standards for sharing both the format and the data within it.
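Python's DB-API plays a role analogous to ODBC and JDBC for Python programs: application code is written against a common call interface, and only the connection step knows which back end (via drivers such as pyodbc or psycopg2) is actually in use. A minimal sketch with hypothetical names:

```python
import sqlite3  # any DB-API driver (pyodbc, psycopg2, etc.) exposes the same shape


def fetch_titles(connect):
    """Application logic written against the generic interface; only the
    connect() factory knows which engine sits behind it."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute("SELECT title FROM projects ORDER BY title")
        return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


def connect_dev():
    # Swap the back end by swapping this factory; fetch_titles() is unchanged.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE projects (title TEXT)")
    conn.execute("INSERT INTO projects VALUES ('Alpha'), ('Beta')")
    return conn


print(fetch_titles(connect_dev))  # ['Alpha', 'Beta']
```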
We have the Object Linking and Embedding Database, or OLE DB, which allows content from one application to be copied, pasted, or hotlinked into another, such that the object embedded in the larger application can be accessed and activated from within it. And then we have Microsoft ActiveX Data Objects, which is a nearly agnostic interface mechanism for accessing data. But it has the drawback, in the minds of many, of being without any inherent configurable restrictions, and so the controls around it must be built to much greater rigor, because there is no particular way of configuring it differently.
One element that also can exist within these database environments is the metadata, literally data about data. Now, the metadata itself is very useful, and as such it can be as sensitive in its own way as the data itself that it represents. It is valuable information about the unseen relationships between data elements. Through this, we have the ability to correlate data that was previously considered unrelated to each other. And it provides keys to unlocking critical or highly important data inside the data warehouse itself.
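As a small illustration of "data about data", most engines expose their structural metadata through catalog tables; in SQLite that is the sqlite_master table, while standards-based servers use information_schema. The table below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, diagnosis TEXT)")

# The catalog describes the schema itself -- table names, column layout,
# indexes -- revealing structure and relationships even with no rows present.
for name, sql in conn.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name)
    print(sql)
```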
Now, it's easy to see that since the database is where the data itself is located, thus more or less being the Fort Knox of information, it's going to come under attack. Much of the data held in there is considered very valuable by its owning entity. This, of course, renders the database subject to various kinds of attacks, both from insiders and from outsiders. So on this slide, we're going to look at some specific threats and vulnerabilities that give rise to these attacks. Here we see a few of those. We have inference, which, literally put, is the deducing of the unknown from the observable. Now, this means that a person seeing things on the screen may be able to determine that other things they cannot see are located within the database. This might give them the ability to launch queries to test whether what they imagine to be in the database, deduced from what they can see, is in fact there. A negative response, saying no, what you're looking for is not here, confirms that they no longer have to spend time looking for it, and a positive response confirms that what they suspect to be present is in fact in there.
So either positive or negative responses following up to the inferences they make are both helpful to the attacker. Following this would be an aggregation style of an attack. The attacker would combine various elements, each one of which they might have a clearance for or access to, but once combined with other pieces, the resulting whole may produce something of greater sensitivity that is actually beyond their access or authority.
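A classic, hypothetical illustration of these two ideas together: a user cleared only for aggregate statistics can still infer an individual value whenever the aggregate happens to cover a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", [
    ("Alice", "Engineering", 95000),
    ("Bob",   "Engineering", 88000),
    ("Carol", "Legal",       120000),   # the only person in Legal
])

# The user is permitted aggregate queries only, never individual rows...
avg_legal = conn.execute(
    "SELECT AVG(salary) FROM salaries WHERE dept = 'Legal'").fetchone()[0]

# ...but because Legal has exactly one member, the 'aggregate' reveals
# Carol's exact salary: the unknown deduced from the observable.
print(avg_legal)  # 120000.0
```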
They can, of course, perform a bypass attack, which is an access control violation accomplished by circumventing the controls. They can, of course, carry out a view compromise, which is also an access violation, where unauthorized modification of a view lets the user see things they are not supposed to be able to see. There is a concurrency style of attack, which could be one that is launched by an attacker, or it could be one that results from a failure of timing controls within a database.
It produces a race condition, which is a resource violation, or in the end, a deadlock, where the database itself is locked up because two competing transactions are vying for the same resource, each one holding what the other needs, thus locking the database, forcing a restart, and possibly destroying or corrupting data that may be in the middle of a transaction cycle. There is, of course, contamination, coming in a variety of forms, and this is a form of integrity spoilage. Some of these attacks can also result in a denial-of-service.
A deadlock, for example, results in a denial-of-service, because the database itself locks, refuses to go forward, and does no further processing until it's been shut down and started back up. So this is one that can be caused by an outsider or by a failure of the internal timing controls that are meant to prevent these things.
Now, a database, like every part of an IT system, has to have proper controls in place, making sure that only authorized users can make only authorized changes. And this will, of course, involve forms of identification, authentication, authorization, and other forms of access controls, such as the grant and revoke mechanisms that we have inside databases in particular. One of the ways we control how a database works, so that it operates on a first-come, first-served, or FIFO, first-in, first-out, kind of arrangement, is through lock controls, so that the individual who gets access to a data element first may get read and write access, while everyone after that point may get read-only access.
Now, these locks are used for read and write access to specific rows of data in the relational system, or objects in the object-oriented one. And they have to follow a particular model, which we call the ACID test. ACID stands for atomicity, consistency, isolation, and durability. Now, atomicity means all or none. A transaction may be broken up into several different input elements entered through input boxes, which also keeps the duration for which a table is locked short while each element is put in. It also makes one set of input boxes in the database transaction dependent upon others, such that those that are required and those that are optional have to be filled out in a certain way before the transaction will be allowed to proceed to the next step.
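A minimal sketch of the all-or-none behavior, again with Python's sqlite3 module and hypothetical table names: two statements are submitted as one unit of work, and when the second fails, the first is backed out with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.commit()

try:
    conn.execute("INSERT INTO orders VALUES (1, 'keyboard')")
    conn.execute("INSERT INTO orders VALUES (1, 'mouse')")   # duplicate key -> rejected
    conn.commit()                                            # never reached
except sqlite3.IntegrityError:
    conn.rollback()   # the first insert is backed out along with the failed one

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0 -- none applied
```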
Once it has met the all-or-none test, none meaning the transaction stops, all meaning it's allowed to go forward, then it moves on to consistency. Consistency enforces that all changes maintain the consistency of the database, which means it enforces rules of formatting and content for anything put into a field. So having successfully passed atomicity, it then must pass consistency, and successfully passing that, it moves into isolation, which means the transaction has been accepted for processing.
The pending transactions in processing are now unobservable to anybody, including the submitting user. By making these unobservable, no one can see the results until the transaction finishes successfully. When it does, the result is then called durable, and it is established as the final, committed state. And so the acronym ACID, even though it stands for atomicity, consistency, isolation, and durability, is often thought of as standing for all changes are invisible until done. We have, of course, a set of other DBMS access controls. This includes view-based access controls, where a user is given a view and is able to access only the data that view presents, and to see it only in the way the view is constructed to present it.
There are the grant and revoke access controls to give and rescind access. There is security for object-oriented databases, such that the controls make sure the data and its metadata are accessed together, or that if one cannot be accessed, neither can be. There is metadata control as to whether or not anyone can see the metadata. And then, through format and content checking, there is protection against data contamination.
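To sketch view-based access control: the user is granted rights on a restricted view rather than on the underlying table. SQLite has no user accounts, so the GRANT statement is shown only as an illustrative string; all names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT,
        salary INTEGER        -- sensitive column
    )""")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 95000), (2, 'Bob', 88000)")

# The view exposes only non-sensitive columns; users see the data only in
# the way the view is constructed to present it.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employees")
print(conn.execute("SELECT * FROM employee_directory").fetchall())
# [(1, 'Alice'), (2, 'Bob')] -- no salary column

# On a server with user accounts, access would then be granted on the view,
# not the base table (illustrative syntax only):
grant_stmt = "GRANT SELECT ON employee_directory TO hr_clerk"
```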
Databases are, of course, frequently behind online transaction processing. Now, online transaction processing systems facilitate and manage the various forms of transaction-oriented applications that are in very widespread use. But these systems, by their very nature, are subject to a couple of types of attacks. The first one is concurrency. Now, concurrency failures can result in transactions being processed out of order, producing potential race conditions, which are resource conflicts, or deadlock situations, which result in the system stopping and having to be restarted, threatening data integrity or outright loss.
Failures of atomicity can result in cascading process faults, or in the reverse case, where an early step failure does not, as required, stop the total transaction stream from completing, when in fact that is exactly what it should do.
Now, we can combine a number of different databases into a data warehouse, where another layer of software creates the warehouse function on top of the actual databases themselves, with drivers for the individual databases feeding into this application known as the data warehouse. These are used for knowledge discovery. By taking the data of one database and relating it to the data that may exist in other databases, through mathematical, statistical, and other forms of visualization, we're able to identify valid and useful patterns in the data, now that we have multiple repositories to draw from.
Additional relationships can now be visualized that would not have been seen except through this method. As such, the data across all of the databases, and in the data warehouse itself, has to be protected to ensure it retains the value and integrity that this data needs. So the security controls will have to protect the knowledge base itself, both the original databases and the knowledge base that is created as metadata extracted from all of these different sources. The logic that is used to create decisions must be verified to make sure that the presumptions going into them can be validated. All the rules that enable this knowledge discovery have to be validated through a formal change process. We can create additional and different queries that will validate and verify the information, such that we are testing to make sure our assumptions about it are true. The knowledge we extract from these different sources can go into making risk management decisions about a variety of topics. And from it all, we can develop a baseline of expected performance from the analytical tool itself.
Mr. Leo has been in Information Systems for 38 years, and an Information Security professional for over 36 years. He has worked internationally as a Systems Analyst/Engineer, and as a Security and Privacy Consultant. His past employers include IBM, St. Luke’s Episcopal Hospital, Computer Sciences Corporation, and Rockwell International. A NASA contractor for 22 years, from 1998 to 2002 he was Director of Security Engineering and Chief Security Architect for Mission Control at the Johnson Space Center. From 2002 to 2006 Mr. Leo was the Director of Information Systems, and Chief Information Security Officer for the Managed Care Division of the University of Texas Medical Branch in Galveston, Texas.
Upon attaining his CISSP certification in 1997, Mr. Leo joined ISC2 in a professional role as Chairman of the Curriculum Development Committee, and served in this role until 2004. During this time, he formulated and directed the effort that produced what became, and remains, the standard curriculum used to train CISSP candidates worldwide. He has maintained his standing as a professional educator and has trained and certified nearly 8500 CISSP candidates since 1998, and nearly 2500 in HIPAA compliance certification since 2004. Mr. Leo is an ISC2 Certified Instructor.