Scenario - creating a highly available campaign site for loungebeer.com
In this group of live videos, we tackle a practical scenario to help you learn real-world cloud consulting skills.
This is a unique and engaging live video format where we join the Cloud Academy AWS, Azure, and Google Cloud Platform teams in a real-time work situation. The team listens to a customer brief, discusses and defines technical requirements, and then evaluates which of the public cloud platforms could best deliver on the customer requirements.
From this course, you will learn how cloud professionals go about solving real-world business problems with each of the three public cloud platforms. It is highly recommended for anyone interested in becoming a cloud architect, specialist, or consultant!
Learning how to apply your cloud skills in real-world situations is essential for a cloud professional. Real-life projects require you to evaluate requirements, define priorities, and use your knowledge of cloud services to come up with recommendations and designs that best meet customers' requirements. As a cloud professional you often have to think on your feet, process information quickly, and demonstrate design ideas clearly and efficiently.
In this course, we work through a customer scenario that will help you learn how to approach and solve a business problem with a cloud solution. The scenario requires us to build a highly available campaign site for an online competition run by loungebeer.com - a craft beer brand launching a new product into the market at the US Super Bowl.
In these interactive discussions we join the team as they evaluate the business requirements, define the project constraints, and agree the scope and deliverables for the solution. We then work through the technical requirements and evaluate how each of the three cloud platforms - Google Cloud Platform, AWS, and Microsoft Azure - could be used to meet them.
We follow each of the platform teams as they define solution architectures for Google Cloud Platform, AWS and Microsoft Azure. We then regroup to run a feature and price comparison before the team builds a proof of concept for our solution design.
This group of lectures will prepare you for thinking and reacting quickly, prioritizing requirements, discussing design ideas, and coming up with cloud design solutions.
02/2018 - DynamoDB now supports encryption at rest, so that would potentially influence our choice of database in this scenario.
- Hey, Jeremy, Jeremy, we're trying to sketch out this design for this campaign that we've been given for this craft beer. It's a craft beer, they're collecting names at the Super Bowl. So we've got a lot of burst activity, there's gonna be a lot of people entering at the same time. It's basically being run in the halftime break at the Super Bowl, so we've only got like half an hour. Now one of the key requirements they had was that they need to have encrypted storage, because these are personal records. So here's our straw man: we were thinking probably Route 53 to manage our domain name, some sort of static code, SQS, and also at the moment thinking, are we gonna have to go RDS Postgres to get ourselves some encrypted records, 'cause DynamoDB probably doesn't support that, right? Any ideas on this?
- There's some potential for doing some things better.
- [Andrew] Yeah sounds good.
- I'd probably approach it by breaking it down into two processes, two distinct processes. The first one is obviously the collection of the competition registration, so that's all the front end, all the stuff we have here, and then we have the backend processes, so the collection of the data, putting it into persistent storage, and doing some reporting and monitoring. So what I'd probably do is you know, let's target the front layers first.
- Okay, alright.
- So I guess one of the things we could do is use a static website hosted out of CloudFront. So where's?
- Drop the CloudFront here. Yeah.
- Exactly, and covers a broad user base. We're not tied down to iOS, we're not tied down to Android, and because we're time-bound to get this thing up and running quickly for the impending Super Bowl, it's an easy approach.
- Wow, like it. And the idea, 'cause this is the kind of granularity we need to get to now. The Cognito user pool, talk me through that. How's that gonna help us with authentications?
- Good idea, great. That takes out a lot of the complexity we were struggling with with how to manage some of our security, 'cause obviously we're in the public domain, we've got the potential to be hacked, we've got the potential to have a denial of service attack, so that straight away gives us an authenticated connection back to our processing layer. So with Firehose, walk us through how that's gonna work.
- So the great thing about Firehose is that Kinesis through Firehose takes away a lot of the complexity in terms of having to configure and set it up. So you can quickly get up and running: the data that's sent to Firehose gets on-sent down to S3, Redshift, or Elasticsearch.
- Elasticsearch, yeah. So that's a really cool service. Already, that's simplifying things quite significantly here. Now I'm just thinking in my mind, we got stuck on this SQS idea because Simple Queue Service is such a great way of decoupling layers, isn't it? It gives us that kind of durability. Now, I'm thinking straight away wow, Kinesis is gonna give us the same thing. Firehose is actually designed for that particular use case, isn't it? So straight away, it's a good way of us collecting this kind of burst data, it's highly available, it's multi-AZ. All of the availability's taken care of for us, as it was with SQS. So if we had to make a decision between those two... and straight away I'm already so excited about this, because Kinesis gives us some logic to clean up our records before we push them into encrypted storage. And we were trying to think through how we were gonna do all that, and straight away we've got a way.
- Yes, exactly. And furthermore though, Firehose can be configured to batch the data, compress it, and encrypt it. And what we can do is we do all of that very easily through a bit of configuration, rather than actual code, and we on-send it to S3. And the great thing is once we've got it in S3, we can use a service like AWS Athena to do the querying, and the cool thing about Athena is that it's designed to handle encryption and compression, so we don't need to do the decompression ourselves. Athena will take care of that for us. So it's a quick way for us to query the data, get results.
- That is sounding really good. Alright, this is getting way simpler. Okay, now in terms of these two, if we have to, just to make a choice I think, a final design decision on SQS versus Firehose, all that functionality sounds great, but that sounds expensive. Is that gonna cost more than using SQS?
- That's interesting, Andrew, because depending on the way we use Kinesis, it could actually work out to be cheaper. So the way SQS is priced versus the way Kinesis is priced, you have to consider the volume, the frequency, those kinds of dynamics of your application, and if we set up Kinesis a certain way, certain shards et cetera, it could work out to be cheaper.
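To make the SQS-versus-Kinesis cost question concrete, here is a rough back-of-envelope sketch in Python. The prices are illustrative placeholders, roughly in line with 2018 US East list prices, and the halftime-burst volumes are assumptions for the sake of the example - always check the current pricing pages before making a real decision.

```python
# Illustrative back-of-envelope cost comparison for ingesting the
# competition entries. All prices are placeholder assumptions, not
# authoritative AWS figures.

SQS_PRICE_PER_MILLION_REQUESTS = 0.40   # USD, standard queue (illustrative)
KINESIS_SHARD_HOUR = 0.015              # USD per shard-hour (illustrative)
KINESIS_PUT_PER_MILLION = 0.014         # USD per million 25 KB PUT payload units (illustrative)

def sqs_cost(millions_of_messages: float) -> float:
    # Counts each message as one SendMessage request; receives and deletes
    # would roughly triple this in practice.
    return millions_of_messages * SQS_PRICE_PER_MILLION_REQUESTS

def kinesis_cost(millions_of_records: float, shards: int, hours: float) -> float:
    # Shard-hours plus PUT payload units; assumes each record fits in one
    # 25 KB payload unit, which small registration records easily would.
    return shards * hours * KINESIS_SHARD_HOUR + millions_of_records * KINESIS_PUT_PER_MILLION

# A 30-minute halftime burst of 10 million entries:
print(sqs_cost(10))                              # SQS request charges only
print(kinesis_cost(10, shards=10, hours=0.5))    # shards + PUT payload units
```

With these assumed numbers the Kinesis side comes out well under the SQS side for a short burst, which illustrates the point being made: the answer depends on volume, frequency, and how long the shards stay provisioned.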
- Woo hoo, alright. 'Cause the two constraints we've got with this campaign are time and costs, because the time is Super Bowl, we can't change the date of that unfortunately, and the cost budget we've been given is fairly fixed, so let's think Firehose as our--
- And on the cost front, as soon as we've done our querying on S3 bucket, we can obviously use lifecycle policies to push that data off to Glacier.
- Again, we're cost-optimizing the solution over its lifetime. Whereas if we were to put that data into, say, a relational database, or DynamoDB, you're going to have an ongoing cost for that solution.
- Yeah, good point. 'Cause where I was heading with this was we can't encrypt records easily in DynamoDB, so I thought okay, we're gonna have to use either KMS or some sort of encryption routine to manage our data layer. It seemed like Postgres might have been an easy way to get encryption at rest, but Postgres was gonna come with quite a cost, and it seemed like massive overkill to use a relational database like Postgres to store what are quite simple records. So I love that, the fact that we can store things in S3. It's way cheaper, and way easier for us to manage with the lifecycle policies. Let's just peel back the onion a bit on Athena. So how's that gonna work?
- So with Athena, as I briefly mentioned before, you can use it to basically query text data that's stored in files, and it will deal with the compression on the files and encryption on the files. So it's a great solution to work with rapidly to meet our time-critical launch date.
- Great. Now while we're still in that capturing swim lane, how about security, cleaning up any records, looking for any rogue activity? 'Cause we've gotta make sure this thing keeps running, there's no way this can have any down time.
- Yeah, so in front of the presentation layer, we can actually deploy AWS Shield, and this will act as a DDoS layer, so it'll shield us from being overwhelmed by a huge amount of incoming requests. Additionally, we can deploy WAF in there to look at layer seven security, hacks that may be coming in, so we can do filtering on the inbound HTTP requests.
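The kind of layer-seven filtering described here - throttling a source IP that sends too many requests in a short window - is configured as rules in WAF rather than written by hand, but the logic behind a rate rule can be sketched in a few lines of Python. This is an illustrative sliding-window rate limiter, not WAF's actual implementation.

```python
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter: the kind of layer-7 rate rule a WAF applies."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # source IP -> timestamps of recent requests

    def allow(self, ip: str, now: float) -> bool:
        q = self.hits[ip]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the limit: block, as a rate-based rule would
        q.append(now)
        return True

# Allow at most 3 requests per second from one IP (illustrative limits).
limiter = RateLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow("198.51.100.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(results)  # the fourth burst request is blocked; the fifth, after the window, passes
```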
- Okay, so the baseline is WAF, we can just attach WAF in front of our distribution. Or we use AWS Shield, which is a managed service which comes as part of, as long as we're using a CloudFront distribution, those two are connected, right? Do we have to use CloudFront if we use AWS Shield?
- Yeah, we can use both. Shield works with CloudFront.
- And that's gonna give us a lot more, it's a managed service, and one of the other key requirements we wanted was to ensure that we had as many managed services as possible, to take out complexity and cost. Alright, so that's gonna deal with a lot of our possible security concerns. In terms of this presentation layer, just before we move onto what we're gonna do with the records once we've got them, what are you thinking? How are we gonna make all this work?
- So the presentation layer can be static because we're not processing the data when it comes back to us in the web layer. So I'm thinking an SPA, a single page application style presentation layer. We would build that and host it in an S3 bucket, and have the S3 bucket as an origin behind our CloudFront distribution. That's going to work really well in terms of giving us good throughput, good performance, and cost is gonna be cheap as chips as well. We're not paying
- Alright, woo hoo.
- an instance hourly charge on that.
- John's gonna love this.
- And it's gonna scale out. We're talking the Super Bowl here, so potentially a hundred million users. So this thing needs to scale.
- Yeah, there's gonna be a lot of traffic. The campaign's gonna run in a global audience so we need to deal with potentially huge amounts of traffic, good and bad, so that sounds like a really solid way of doing that. So we're going S3, Athena, Kinesis Firehose, we're using Cognito user pools for our access token, gives us that authenticated connection between our front end and our collector, which is Kinesis. Do we need any kind of, NoSQL storage here? Are we just gonna use S3 for everything?
- We could potentially, instead of using Firehose, go with Kinesis Streams, and that would allow us to have multiple subscribers. So we could have one subscriber that still goes to S3, with our batched and encrypted text files, and then we could have a secondary subscriber that takes the same data and puts it into NoSQL, DynamoDB, or it could be RDS, which could give us a bit more flexibility at query time in terms of being able to do relational queries.
- Yeah, yeah, 'cause we've sort of drifted into this swim lane now, haven't we? Which is good. Alright, so we've got the information, we've done some cleaning up with this batch process which sounds fantastic Jeremy, 'cause we've taken out a lot of the need to write what could be quite complex code, to handle a whole lot of problems there. Now, the brief was, they're not so interested in real-time metrics of how the campaign's running, but my gut feeling is that they would like to know that, and for our benefit, we absolutely need to have as much monitoring around here as possible, to ensure that we can say, A we've guaranteed the uptime that we had as a target, and that if there are any issues we're able to deal with them dynamically. So let's think about that. What can we do there? What are our inputs and outputs? What tools could we use?
- So on the monitoring front, as the data is coming through this layer, we could potentially feed some of it off to perhaps an Elasticsearch cluster. That would allow our ops team or the SecOps guys to be able to run queries against the runtime data that's coming through the system. It could also allow us to trigger alarms to the team, again, in terms of any security issues that are coming in.
- [Andrew] CloudWatch?
- CloudWatch is an option.
- So we could probably do simple CloudWatch metrics, couldn't we, to spot what look like potential issues.
- Once we've got the data in S3, we could subscribe maybe some lambda functions to look for known vulnerabilities, or duplication's another thing that we possibly need to consider, you know, multiple entries coming in from the same person, lambda subscribing from the backend into S3 could look for those patterns in the data set.
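The duplicate-entry check Jeremy describes could look something like the function below. The record shape - entries identified by an `email` field - is an assumption for illustration; the real competition records might key on something else entirely.

```python
def find_duplicates(records):
    """Return entries whose identifying field has been seen before.

    `records` is a list of dicts with an 'email' key -- an assumed shape
    for the competition entries, for illustration only.
    """
    seen = set()
    duplicates = []
    for record in records:
        key = record["email"].strip().lower()  # normalize before comparing
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
    return duplicates

entries = [
    {"email": "fan@example.com", "name": "A"},
    {"email": "other@example.com", "name": "B"},
    {"email": "Fan@Example.com", "name": "A again"},  # same entrant, different case
]
print(find_duplicates(entries))  # only the third entry is flagged
```

Packaged as a Lambda handler subscribed to the S3 bucket (or pulled up into the Kinesis layer, as discussed below), this is the sort of pattern-matching pass that keeps the hand-written code in the solution to a minimum.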
- Do we need that as part of our minimal viable product? Or is that something we have as an add-on?
- It's probably something we could add on early on in the system, if not after the launch date. We could do some data cleansing on the data set.
- Yeah, I like your call out on the multiple entries, 'cause what's to stop people from doing that? And you know, we haven't really trapped that, and we've got, perhaps is that part of batch processing in Kinesis? Is that something it can handle? Or do we need that lambda function to do that for us?
- It's possible that we could bring that up into this layer. The great thing about getting the data into Kinesis is that we've got time to act on it. By default, Kinesis has a 24-hour retention span, which can be configured up to seven days.
- 24 hours, isn't it? And four days for SQS, yeah that's right.
- It's 24 hours for Kinesis, configurable up to seven days, and four days for SQS by default, configurable up to 14 days. So the great thing about that is we've got time to act on the data that's sitting in there. And yes, in that sense we could lift the requirement that is done by lambda down here up into this layer to do the cleansing.
- 'Cause our team could handle that type of logic. The less code we have to write the better, right? The less we have to manage. Sounds good. Alright--
- So we're still looking for a solution that we can put together as quickly as possible to meet the launch date, so we're looking for parts of the system that we can literally turn on and enable without having to get our hands dirty in terms of actually coding the solution.
- Yeah, and 'cause of our time-cost-features triangle we really have to keep these two tight. And that's why I'm thinking minimal viable product: what do we go back with to say, this is what you're gonna have. If you wanna have anything else, it's something that comes at an additional cost, 'cause we can't deliver it within those two constraints that you've given us. So we've put enough monitoring in there to ensure that if there's any issue at all, we can capture it and act on it. We've got as many autonomous rules as we can put in there based on our lambda logic, looking for the obvious ones. Do we need to have Config in here? We've got a few API calls going on. Do we need to monitor anything with CloudTrail? What do you reckon? Do we need that level of detail?
- Yeah well, CloudTrail would allow us to ensure that the system, once turned on, remains in the same configuration. That there isn't accidental configuration drift applied by our rogue developers that are on the team.
- I love that term configuration drift. That is a good one, isn't it? Because it's so easy to end up adding or watching your environment expand, and if we need to be able to report on any activity that's considered to impact on this, so I think it might be worthwhile putting CloudTrail as part of our minimal viable product. Well, it does come at a bit of a cost, right? There's a little bit of a cost involved with that, not a major one.
- Not a major one, but you have to balance that with if we don't have it, and the system gets reconfigured into an unsecure, or insecure state, that's a bigger problem. That's gonna cost more in the long run to ourselves and to the client that we're putting this together for.
- Yeah, cool. Alright, I love the fact that we've got some authentication management here. So we put CloudTrail in, we've CloudWatch metrics which will just use our common metrics for monitoring our security in our dev ops environment.
- On the security front, we should probably just walk through the whole system in terms of how a request comes in. I think now what we should do is sort of clarify the solution and get some more rigor in the actual architecture. So with that in mind, we'll roll over to--
- [Andrew] Aha, that looks better! Hey we've got rid of a few--
- [Jeremy] Two components.
- We got rid of a few things, that's great. Alright, walk us through Jeremy, what do we do?
- And you've got IAM here 'cause we're gonna need a role for that, is that right?
- And you've got all this as a managed service as well, AWS Shield, we just turn that on. Perfect, that's gonna be much easier for security. We've got a few rules to configure, obviously. So what happens now?
- Okay, so Kinesis Firehose acts as the collector of all this information. Again, it's a managed service, and it will scale out for us. So we don't have to configure shards or turn the dials to get the necessary throughput, it will do that for us. We will have configured the Firehose to basically batch the data, to use KMS to do the encryption on the data before we send it to S3.
- So we use the Kinesis service.
- So again, that's just configuration, so we don't need to codify this. So taking this configuration approach, we should be able to build this architecture in time for our launch.
- Yep, great.
- So the data's coming in to Kinesis, it's being batched, it's being encrypted, it's being compressed, and then it's being sent to S3. Kinesis is also then configured with IAM to allow it to write to S3. So there's a policy there for Kinesis to basically forward that data down into an S3 bucket. So we get the data collected in S3, and the great thing now is that we can use a service like Athena to query the compressed, encrypted, batched files. We again don't need to codify anything to actually do the decompression, the decryption, that's all handled for us. In terms of cost optimization, we can put in lifecycle policies, so that over time the data graduates into Glacier so that we keep our future running costs of the system minimal. We can also bring Amazon QuickSight into the equation to give us some analytics, by running our Athena queries, and that will give us some quick wins in terms of getting dashboards, some charting et cetera on the data that we've collected.
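What Firehose does through configuration - batching records, compressing them, and writing one object to S3 - can be mimicked locally to see the shape of the output. This sketch uses gzip and newline-delimited JSON; the real service also handles buffering windows and KMS encryption, which are omitted here, and the record shape is an assumption.

```python
import gzip
import json

def batch_records(records):
    """Join records into one newline-delimited JSON blob and gzip it,
    roughly what a Firehose delivery with GZIP compression produces."""
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    return gzip.compress(payload)

def unbatch(blob):
    """Reverse of batch_records: the kind of work Athena does transparently
    when querying compressed files in S3."""
    lines = gzip.decompress(blob).decode("utf-8").splitlines()
    return [json.loads(line) for line in lines]

records = [{"email": "fan@example.com", "entry": i} for i in range(3)]
blob = batch_records(records)
assert unbatch(blob) == records  # the batch round-trips cleanly
```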
- So we've thrown away SQS, as being a possible cost factor. We've also thrown away our Elasticsearch logs. We were gonna use logs for some of our security features. We've got CloudTrail enabled, which is gonna give us enough security monitoring. We're gonna enable AWS Config to ensure that if there's any change to the environment, that will be captured and we can act on it as need be, but we're not going to add our dynamic log monitoring as part of our minimal viable product. I love the fact you've got lots of managed services in here, so all we've got is a lot of configuration, which is fantastic. We don't have to worry too much about writing much code.
- In essence, the whole solution is serverless. We're not having to maintain servers, we don't have to worry about any instance charges, it is effectively a serverless solution.
- So our brief will be to configure our account into IAM roles, we've gotta set up KMS, enable CloudTrail, enable AWS Config, create our CloudFront distribution, set up AWS Shield, Route 53, we've gotta get the domain name set up, et cetera. It's a perfect scenario. And we're giving some additional reporting, which is probably one of the nice-to-have features on our time-cost-features list. We put reporting as a should-have, but the reality is we need to have some sort of visibility to ensure that if there are any issues with this service - 'cause we are trying to make it as durable and as highly available as possible - we can trap and have some eyeballs on those. And I love the use of QuickSight, that's adding a whole lot of value. Right, okay.
- So one other thing we haven't really touched upon is, this is the architecture that we've settled on, but how would we go about actually setting it all up? And I think the service to choose here is CloudFormation. So we treat our infrastructure as code, and we codify it into CloudFormation templates, and the great thing about that is once we've got that encoded in our CloudFormation template, we can repeat the setup for different environments. So we're covered for dev, for UAT, for test, and production. It's a good way of actually documenting the full setup, so if either you or I leave the team, we're not taking the IP away, it's there in the CloudFormation templates.
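To give a minimal taste of that infrastructure-as-code approach, here is a sketch of a CloudFormation template covering just one resource from the design: the entry bucket with its Glacier lifecycle rule. It is built as a Python dict for readability; the logical name and the 30-day threshold are illustrative choices, not values from the brief.

```python
import json

# Minimal CloudFormation template sketch: the S3 entry bucket with a
# lifecycle rule that graduates objects to Glacier. The resource name
# "EntryBucket" and the 30-day transition are illustrative assumptions.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Campaign entry bucket with Glacier lifecycle",
    "Resources": {
        "EntryBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "LifecycleConfiguration": {
                    "Rules": [
                        {
                            "Id": "GraduateToGlacier",
                            "Status": "Enabled",
                            "Transitions": [
                                {"StorageClass": "GLACIER", "TransitionInDays": 30}
                            ],
                        }
                    ]
                }
            },
        }
    },
}

print(json.dumps(template, indent=2))  # the JSON body you would hand to CloudFormation
```

A full template for this design would add the Firehose delivery stream, the IAM roles, the CloudFront distribution, and so on, which is exactly what makes the setup repeatable across dev, UAT, test, and production.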
- Perfect. So that can be almost a delivery, can't it? We can deliver the CloudFormation template to the client and say, here's the entire solution ready to go. Let's just think this through for any possible single points of failure, or just any more granularity on our security design here. So at our browser level we've got an HTTPS connection so how we gonna handle our SSL?
- So SSL will be managed by CloudFront, we'll enable that on the CloudFront distribution, and what we'll do is we'll utilize ACM, AWS Certificate Manager, and we will provision an SSL certificate with our custom domain. The great thing with ACM is that it will manage the lifetime of the certificate and refresh it, hands off, it's a great service for that. So that will basically give us HTTPS encryption in transit for the data.
- And if we have a requirement from the customer to use their own certificate, which is highly unlikely, we could import that into ACM as well for the CloudFront SSL certificate if we needed to. I can't imagine that would pop up.
- Yeah, and the Kinesis Firehose has by default an HTTPS input anyway. So that is already enabled for us.
- Right, and with Cognito, what we're gonna do there is the Cognito request is gonna use a role, an IAM role to create an STS token. That token will be used to authenticate the request from there on. So we've got a token in there. Anything else we need to think about? Single points of failure, I can't really see any here because we've got a series of managed services. I mean, is there anything we're missing here? Like, high availability guaranteed, high availability guaranteed.
- 11 nines of durability, four nines, five nines of availability.
- CloudFront gives us a lot more durability in our front end. I like the fact that we've limited the scope creep potential with the solution, because there was a lot of talk about how we could help with the campaign management and responses and things. But those are actually outside of the scope of this, so we had to be pretty hard with actually limiting the scope to actually collecting the entries, keeping them in an encrypted format, and then being able to monitor the environment to be as secure as possible, without adding a whole lot of complexity.
- Exactly. So we're covered for security, we're covered in terms of our time constraints, because we've focused the design on configuration as opposed to building a bespoke solution.
- Let's build it, great. Okay.
- Users, STS.
- [Andrew] Yeah, great.
- Hopefully it's gonna work.
- [Andrew] Yeah, let's go test it out.
- [Jeremy] Let's do it.
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.