1h 9m

As companies move more and more mission-critical information and workloads onto the AWS cloud, DevOps Engineers need to implement more sophisticated and secure methods of managing data and processes. This course on Governance on AWS teaches skills to manage complexity and direction on increasingly large AWS cloud accounts and installations.

Learn how to: 
- Model the resource governance and compliance life cycles
- Inventory the actor, action, and resource triples in governance
- Use AWS CloudTrail to help audit AWS API calls
- Use CloudWatch Alarms and Metrics for Billing
- Use AWS Config Rules and Timelines
- Practice good management of IAM credentials and policy structures

If you have thoughts or suggestions for this course, please contact Cloud Academy at


Welcome to the Cloud Academy course on Governance on AWS. In this lecture we're going to do an introduction to governance, in which we'll talk about a number of topics: how we define governance, common focuses of governance, and the different goals we're trying to achieve when we go through governance processes. We'll talk about some of the modern challenges associated with doing governance in a cloud like Amazon Web Services, and about the steps that need to be taken to ensure that we are following a governance process. We'll talk about the different resources that we can govern and the different objects, or mental models, that we'll be operating over when we think about governance. And then we'll do a brief summary of what we're going to learn in the course.

So moving right along into how we define governance. "Information and technology..." I'm just kidding. I'm not actually going to read you this definition, though you can stop the video and read the entire thing. This is the kind of definition you might see on Wikipedia, or the formal definition you might find on a technology blog. We're going to operate off of a different definition that is more concise and helps us actually understand what's going on: managing IT resources and processes to meet business requirements. Okay, so that sounds pretty generic, and it is. At a very high level, governance is just making sure that IT processes and resources are meeting business requirements at large, realizing that we're not just talking about the individual software systems, but also the business processes associated with managing IT. This breaks down into a number of different things when we start looking at larger businesses or companies with more stringent software requirements.

So let's look at some common focuses that this pans out into, and the way that it actually gets implemented. One: when we're doing governance, we think of assuring that the use of information and technology generates business value. This means making sure that anything you're doing on your technical systems or in your IT group is actually generating business value and not just wasting money. At a small, nimble startup, you might think, how could we possibly be randomly wasting money? But at larger organizations, and this may not be obvious if you're not familiar with that environment, waste looks like accidentally running EC2 instances or other AWS services and consuming capacity that goes unused, simply because the scale and diversity of everything we're using on Amazon is so wide and so deep. So beyond making sure that we're not actively pursuing silly initiatives that don't generate business value, bullet number one also means that we need to take inventory of the information and technology that is out there, and make sure that we don't have anything sitting idle or unused that's costing us money.
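
To make the inventory idea concrete, here is a minimal Python sketch that flags idle instances and estimates what they cost each month. It assumes you have already gathered, for each instance, an average CPU utilization and an hourly price (for example from the CloudWatch `CPUUtilization` metric and your pricing data). The `InstanceUsage` record, the 5% idle threshold, and the sample fleet are hypothetical choices for illustration, not an AWS-defined rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceUsage:
    """Hypothetical per-instance inventory record."""
    instance_id: str
    avg_cpu_percent: float  # average CPU over the lookback window (assumed input)
    hourly_cost: float      # hourly price in USD (assumed input)


def find_idle(instances: List[InstanceUsage], cpu_threshold: float = 5.0) -> List[InstanceUsage]:
    """Return instances whose average CPU is below the (assumed) idle threshold."""
    return [i for i in instances if i.avg_cpu_percent < cpu_threshold]


def monthly_waste(idle: List[InstanceUsage], hours_per_month: float = 730) -> float:
    """Estimate monthly spend on idle instances (730 is the average hours in a month)."""
    return sum(i.hourly_cost for i in idle) * hours_per_month


if __name__ == "__main__":
    fleet = [
        InstanceUsage("i-aaa", 1.2, 0.10),
        InstanceUsage("i-bbb", 45.0, 0.20),
        InstanceUsage("i-ccc", 0.4, 0.05),
    ]
    idle = find_idle(fleet)
    print([i.instance_id for i in idle])   # ['i-aaa', 'i-ccc']
    print(round(monthly_waste(idle), 2))   # 109.5
```

A CPU threshold is only one idleness signal; network traffic or request counts may be better indicators for some workloads.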

Bullet number two is overseeing and ensuring IT worker and management performance. For the "oversee" part, we can imagine something like running an audit on the different ways that people are accessing an AWS account, so we should immediately be thinking about AWS IAM at this point. The latter half of the sentence is ensuring worker and management performance, which means the metrics we can extract from our Amazon account around all kinds of facets of usage, including, but not limited to, the billing performance of an IT group. Imagine that the directive is always to get the best results for the lowest cost.

One other way to oversee and ensure that IT management performance is good, that is, that the managers are performing well, is to see if they're trending downwards on their infrastructure cost per user, adjusted for the value that's being delivered. So, running through the first two again real quickly: one, we're making sure that the IT group overall is producing value, and two, individual workers and managers are producing value, or on the whole, they're trending in the correct direction on their metrics. Number three is mitigation of the risks associated with information and technology.
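
A minimal sketch of the cost-per-user trend check, using made-up monthly figures. The "adjusted for value delivered" part is left out, since how to measure value is business-specific, and the function names are hypothetical.

```python
from typing import List, Sequence


def cost_per_user(monthly_costs: Sequence[float], monthly_users: Sequence[int]) -> List[float]:
    """Divide each month's infrastructure cost by that month's active users."""
    return [cost / users for cost, users in zip(monthly_costs, monthly_users)]


def is_trending_down(series: Sequence[float]) -> bool:
    """True if no month is higher than the month before it (a deliberately simple test)."""
    return all(later <= earlier for earlier, later in zip(series, series[1:]))


if __name__ == "__main__":
    costs = [10_000, 11_000, 12_000]   # hypothetical monthly AWS bill, USD
    users = [1_000, 1_250, 1_600]      # hypothetical monthly active users
    per_user = cost_per_user(costs, users)
    print(per_user)                    # [10.0, 8.8, 7.5]
    print(is_trending_down(per_user))  # True
```

In this example the total bill rises every month while the cost per user falls, which is exactly the distinction this metric is meant to capture.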

We will primarily be focusing on number three, simply because AWS is security-first, and the risks associated with using information and technology are typically security risks, in addition to availability risks and other business continuity risks. There's actually another course in Cloud Academy that I lecture called Advanced High Availability. If you're interested in the kinds of risk beyond security and the risks you can mitigate up front, you should go look at that high availability course. So again, we're making sure that we're trending upwards or downwards on the correct metrics, and we're going to focus mainly on number three, because it's a very wide domain and there are a lot of value-added AWS tools that help us with this risk management.

So let's talk about some of the modern challenges. AWS and the Cloud present some new challenges. I capitalize "the Cloud" because it truly is a brave new world: the old ways that these IT governance principles were applied don't necessarily carry over to Amazon, though a lot of them do.

Number one is that we have dynamic billing. Imagine what old IT infrastructure groups looked like: for an operations team, that was capital expenditure. You would go and buy a bunch of equipment. You would buy a building. You would buy cooling facilities, and you would put stuff in it and let it sit. In that old world, our only ongoing costs were the human capital to run a data center and the electric and other utility bills. In the cloud, by contrast, we have very dynamic costs that should scale linearly with our customers, or our customer count.

So rather than a static purchase, where we might make a big buy once every three years in the old IT world, we have to govern the dynamic billing associated with an Amazon account on an ongoing basis. That's why a lot of the focus of Amazon governance and account control is on dynamic billing. Because we have operational expenditures that vary each month, with costs that are ongoing under a services-consumption model rather than capital expenditure, we need to be aware that dynamic billing is one of those new, unique challenges.
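
One way to put guardrails on dynamic billing is a CloudWatch alarm on the `EstimatedCharges` metric in the `AWS/Billing` namespace. The sketch below only builds the keyword arguments for boto3's CloudWatch `put_metric_alarm` call, so it runs without AWS credentials. The alarm name, the $500 threshold, and the SNS topic ARN are hypothetical. Billing metrics are published in `us-east-1`, and billing alerts must be enabled for the account before the metric appears; confirm these details in the CloudWatch billing documentation.

```python
def billing_alarm_params(threshold_usd: float, sns_topic_arn: str,
                         name: str = "monthly-spend-alarm") -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on estimated charges."""
    return {
        "AlarmName": name,                       # hypothetical default name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",        # month-to-date estimated bill
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,                         # 6 hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],         # notify when the threshold is crossed
    }


if __name__ == "__main__":
    params = billing_alarm_params(500, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
    print(params["MetricName"], params["Threshold"])   # EstimatedCharges 500.0
    # To actually create the alarm (needs credentials and billing alerts enabled):
    #   import boto3
    #   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```

Because `EstimatedCharges` accumulates over the month, the threshold is compared against spend so far this month, not a daily rate.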

Number two is that we have fully remote access. Thinking back to the old infrastructure IT model again, in addition to capital expenditure versus operational expenditure, we also had physical access to the servers. If I needed to debug a problem, I might even be able to connect a monitor and actually go look at the data center computers or racks. Depending on the size of the system and the sophistication of the company, I might have a physical server cage to manage, rather than simply the logical abstractions and logical isolation you get with API-driven remote access through Amazon. So there's a whole slew of differences between having a physical box that you can walk up to and touch, or hug, so to speak, and having fully remote, logical-only access.

Number three is dynamic resources. Beyond being remote and dynamically billed, resources also have a dynamic size and footprint. The billing focus is typically that we don't want to overprovision, and that we want an inventory of our resources so that we don't spend more than we need to. But in this dynamic environment, where we can provision new servers and such on demand, unaccounted-for resources are also a security risk and a business risk. For instance, if you have personally identifiable information on a server that you lose track of because your governance process isn't mature enough to inventory all of your instances, that's a problem. So these are the three main new challenges presented by the cloud. Anything else you can think of, four through N on this list, is typically a combination of these things. It's really the dynamism in billing and resources, plus losing the ability to go hug our server and moving to fully logical isolation and access.
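
The unaccounted-for resource problem is, at its core, a set difference between what your governance records say exists and what is actually running. A minimal sketch, assuming you keep a registry of approved instance IDs (hypothetical here) and can list what is running, for example from EC2 `DescribeInstances`:

```python
from typing import Dict, Iterable, List


def reconcile(registered_ids: Iterable[str], discovered_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the governance registry against what is actually running."""
    registered, discovered = set(registered_ids), set(discovered_ids)
    return {
        # Running in the account, but nobody registered it: a cost and security risk.
        "unaccounted": sorted(discovered - registered),
        # Registered, but not found: the registry itself is stale.
        "missing": sorted(registered - discovered),
    }


if __name__ == "__main__":
    registry = ["i-0001", "i-0002"]   # hypothetical approved inventory
    running = ["i-0002", "i-0003"]    # hypothetical DescribeInstances result
    print(reconcile(registry, running))
    # {'unaccounted': ['i-0003'], 'missing': ['i-0001']}
```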

So what steps do we need to take to start governing out these risks, or to ensure that we're trending in the correct direction on the business value that IT is providing? To adequately manage and control an account, we need a few different things. First, we need a policy: how things should be. Policies are typically stated as best practices, but the policy definition is more formal, and a business generally requires you to follow it. In the open source world, you might think of a policy as something that you should do. Policies in a corporate sense, and in an Amazon account governance sense, are more formal, or at least more assertive about how they must be followed. So a policy describes how things should be. It can be expressed as best practices, but best practices that you are required to follow or you'll get fired.

These will be things like "you have to encrypt data on EBS volumes." Basically, you must do this because it is a best practice that our company requires to mitigate risk, or because our company is required to do it by regulation, or for any other reason XYZ that things have to be done a certain way.

Second, we need compliance. If we set up a policy and have no ability to measure whether people are actually following, or complying with, that policy, then the policy has no teeth. Imagine that we tell everybody our policy on Amazon is that all EBS volumes must be encrypted, but we have no operationalized method to check whether all the EBS volumes are encrypted. Then we don't have bullet number two, the ability to measure compliance. So we don't have an adequate governance system: even though the policy is in place, we have no ability to enforce it, because we can't even tell whether it's being followed.
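
The compliance step for the EBS encryption policy can be sketched as a check over EC2 `DescribeVolumes` results, whose volume entries carry `VolumeId` and `Encrypted` fields. This minimal Python sketch takes already-fetched response pages so it runs offline; the sample pages are hypothetical, and the commented-out boto3 paginator call shows where real data would come from.

```python
from typing import Iterable, List, Tuple


def check_ebs_encryption(pages: Iterable[dict]) -> Tuple[int, List[str]]:
    """Measure compliance with 'all EBS volumes must be encrypted'.

    Returns the total number of volumes seen and the IDs of unencrypted ones.
    """
    total, noncompliant = 0, []
    for page in pages:
        for volume in page.get("Volumes", []):
            total += 1
            if not volume.get("Encrypted", False):
                noncompliant.append(volume["VolumeId"])
    return total, noncompliant


if __name__ == "__main__":
    # Real data would come from something like:
    #   import boto3
    #   pages = boto3.client("ec2").get_paginator("describe_volumes").paginate()
    pages = [  # hypothetical, trimmed DescribeVolumes pages
        {"Volumes": [{"VolumeId": "vol-01", "Encrypted": True},
                     {"VolumeId": "vol-02", "Encrypted": False}]},
        {"Volumes": [{"VolumeId": "vol-03", "Encrypted": True}]},
    ]
    total, bad = check_ebs_encryption(pages)
    print(f"{total - len(bad)}/{total} compliant; unencrypted: {bad}")
    # 2/3 compliant; unencrypted: ['vol-02']
```

Running a check like this on a schedule is what turns the policy into something measurable; the enforcement step then acts on the `noncompliant` list.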

And of course, third, I just alluded to enforcement. Take the same EBS volume encryption scenario. If I have a policy that states that everything on EBS needs to be encrypted, and I have a method whereby I can check repeatedly, consistently, and quickly whether all EBS volumes are encrypted, then the third step is this: any time there is a deviation from that policy, say somebody creates an unencrypted EBS volume, I need to be able to keep or return things to how they should be.

There are two ways this goes. In the case of the EBS volume, we could set up some sort of system or process that actively prevents people from creating unencrypted EBS volumes, and/or detect when unencrypted EBS volumes are created and go back and fix the problem, for example by replacing each one with an encrypted copy made from a snapshot, since an existing EBS volume can't be encrypted in place.
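
For the prevention path, one common approach is an IAM policy that denies `ec2:CreateVolume` whenever the `ec2:Encrypted` condition key is false. The sketch below builds such a policy document as a Python dict. The `Sid` is a hypothetical name, and the statement covers only standalone `CreateVolume` calls, so volumes created by other paths (such as launching instances) would need their own statements. Test any policy like this in a non-production account before relying on it.

```python
import json


def deny_unencrypted_volumes_policy() -> dict:
    """IAM policy document that denies creating EBS volumes without encryption."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedVolumes",  # hypothetical statement name
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": "*",
                # Matches requests where the new volume would be unencrypted.
                "Condition": {"Bool": {"ec2:Encrypted": "false"}},
            }
        ],
    }


if __name__ == "__main__":
    # The printed JSON can be attached to IAM users, groups, or roles.
    print(json.dumps(deny_unencrypted_volumes_policy(), indent=2))
```

Detection and remediation catch whatever prevention misses. AWS also offers a per-Region EBS encryption-by-default setting as another preventive control.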

So again, we have a three-step process: one, define how things should be; two, set up a scalable, repeatable, and automated way to check whether things are as they should be; and three, build a system that enforces the policy, either up front, by preventing something from happening or making something happen, and/or after the fact, by making sure that these policies are followed.

On Amazon, there are three main things to govern. One is resource states. A resource here is something like a billable Amazon resource: a Kinesis stream, an EC2 instance, an EBS volume. These are the logical objects that we can manipulate in the cloud. The definition isn't super clean, but pretty much anything that you're paying for on an hourly basis counts as a resource for the purposes of this slide.

Number two is transition actions. In addition to the steady state, or present state, of each individual hourly thing that you're renting, we also have to govern transition actions. These transition actions are AWS API calls, and they tie into number one because in order to change a resource's state, you go through an AWS API call. Sometimes resource states change when something breaks, but intentional changes to resource states by the customer always happen through AWS API calls. So number two is the gatekeeping premise, and number one is the steady-state premise.

Number three: who makes AWS API calls? The actors. So actor states are another thing we need to govern, and these are really just AWS IAM users, groups, and roles, or infrastructure running under a role or user. Governing them means making sure that we have adequately set up our IAM systems. So this is the third part of governance, and we can see that together these form a triple: actors perform transition actions on resources, which change the resources' states.
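
The actor, action, resource triple maps naturally onto the audit records AWS CloudTrail produces for API calls: `userIdentity` identifies the actor, and `eventSource` plus `eventName` identify the action. The sketch below pulls a triple out of one record. The record is a hypothetical, heavily abbreviated example, and reading the resource from `responseElements["volumeId"]` is an assumption that only fits volume-creating calls; real records vary by service and event.

```python
from typing import NamedTuple


class GovernanceEvent(NamedTuple):
    actor: str     # who made the call (IAM identity ARN)
    action: str    # which API call, as "service:EventName"
    resource: str  # what it touched, when the record exposes it


def to_triple(record: dict) -> GovernanceEvent:
    """Convert one CloudTrail-style record into an (actor, action, resource) triple."""
    service = record["eventSource"].split(".")[0]  # "ec2.amazonaws.com" -> "ec2"
    action = f"{service}:{record['eventName']}"
    actor = record.get("userIdentity", {}).get("arn", "unknown")
    # Assumption for illustration: only volume-creating calls are handled here.
    response = record.get("responseElements") or {}
    resource = response.get("volumeId", "unknown")
    return GovernanceEvent(actor, action, resource)


if __name__ == "__main__":
    sample = {  # hypothetical, heavily abbreviated record
        "eventSource": "ec2.amazonaws.com",
        "eventName": "CreateVolume",
        "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::123456789012:user/alice"},
        "responseElements": {"volumeId": "vol-0abc"},
    }
    print(to_triple(sample))
```

Collecting triples like these over time gives you the raw material for auditing all three governed things at once: who acted, what they did, and which resource changed.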

All right, so all three of these are tied together. They're three different parts of the same triple, and we need to be able to govern all three to adequately blanket our account in a way that keeps our business from being at undue risk and makes sure that we don't have excess resources that we're paying too much money for. In short, we want to make sure that our account follows best practices and the policies that we set up as part of our governance plan.

So in summary, this course teaches how to ensure AWS accounts meet business requirements. Again, very generic. We call this governance, and as we go through the rest of these slides, we're going to learn a number of different techniques to govern those three different types of states. First, we're going to start with the different technologies that AWS gives us to review resource states and perform the enforcement actions that we were talking about a couple of slides back.

About the Author

Nothing gets me more excited than the AWS Cloud platform! Teaching cloud skills has become a passion of mine. I have been a software and AWS cloud consultant for several years. I hold all 5 possible AWS Certifications: Developer Associate, SysOps Administrator Associate, Solutions Architect Associate, Solutions Architect Professional, and DevOps Engineer Professional. I live in Austin, Texas, USA, and work as development lead at my consulting firm, Tuple Labs.