End of life data center migration
This course is a "live" scenario discussion where the Cloud Academy team tackle a migration project. Our customer needs to migrate out of their current data center by a certain date. They also would like to modernize their business applications.
Our brief in the exercise is to deliver:
- A target architecture that addresses the challenges described by the customer
- A migration plan detailing how to move the service to AWS with minimal interruption
- A recommendation on how to approach DR to achieve RPO of 24 hours and RTO of 4 hours
- An application optimization plan with a proposed enhancement roadmap
As a scenario, this series of lectures is recorded "live" and so is less structured than other Cloud Academy courses. As a cloud professional you often have to think and design quickly, so we have recorded some of the content this way to best emulate the type of conditions you might experience in the working environment. Watching the team approach this brief can help you define your own approaches and style to problem-solving.
This course discusses AWS services so it is best suited to students with some prior knowledge of AWS services.
We recommend completing the Fundamentals of AWS learning path before beginning this course.
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
22-01-2020: Duplicate lecture removed
- [Narrator] Hi, welcome back. Let's join the team as they work out a target architecture for expertiseplease.com.
- The business perspective: end of life data center, and most importantly exhaustion of storage. So the key constraint for this project is that if that happens within seven months, then the business fails.
- So we've got seven months.
- Yup, we've got seven months to work backwards from. From the people perspective we need to ensure that there's minimal interruption. So any change to URLs, 'cause, you know, it's used as an HTTP service, we gotta make sure there's minimal interruption to people using the service, and so if there's any URL remapping to do, or any changes to the interfaces, we gotta ensure that it creates as little impact as possible. We've got our platform perspective, and the current problem is that we've got a monolithic design and that's slowing the company down, because trying to rebuild or add features to the design is taking too long. So we need to break that up.
- So this one's going against agility?
- Yeah, right, yeah, yeah. So that's anti-agility right now. We've got our security perspective, and because we're storing, you know, sensitive information, we need to ensure that it's encrypted at rest and in transit. We've got a key management service at the moment used at the data center. We can export the key, but we need to have a, you know, a good way of managing encryption, and we need to ensure that there's absolutely no compliance gaps, so--
- So, the business has some keys that they're already using. Do we need to migrate those keys?
- Yeah, we're gonna have to shift those out. So they're stored in the SafeNet Luna appliance within the data center, so, arguably we could use CloudHSM as an interim solution, but that's gonna come at a cost. So I think in the long-term scheme of things we'll probably have to find a better way of managing keys, 'cause the cost of using CloudHSM long term would be prohibitive.
- So the initial focus is to address the seven month get out of the data center.
- And then the next stage will be to cost optimize so--
- Nice, yeah. If we look at our triangle I think, yeah, that time constraint of seven months is the critical piece, providing some additional functionality, yeah, and reducing costs as much as possible. Using the Cloud Adoption Framework just helps you, you know, recognize and capture some of those things that pop up. So with security, for example, I think we need to do just better reporting, because these guys have, you know, compliance requirements. They've got a compliant infrastructure because they're managing sensitive information. They're gonna need to have very, very, you know, granular reporting and assurance on all traffic in and out of their VPC.
- So the Cloud Adoption Framework, which these perspectives are coming from, is a best practice framework born out of the professional services from AWS.
- It's a really good model, isn't it, Jeremy? It just always helps us remember little things that we might not be considering when we're thinking about a project. Now the other perspective is operations. So, on that pillar, you know, we probably wanna try and use as many managed services as possible. The more managed services we have, the more reliability and durability we have. So let's try and incorporate as many AWS managed services as we can into this design. All right, so if our key constraint is to get off this data center before the end of life or exhaustion of storage, and to, you know, migrate, why don't we think about a three-stage approach? For stage one we could go for a simple lift and shift. So, we take as much of the environment we have currently working and just basically migrate it to AWS, with as minimal an impact as possible, to get us all the functionality we need, but to get rid of this constraint on exhaustion of data storage. We can shift to S3 storage, obviously. We can talk about the actual components of that in a minute. Stage two could be a transformation of sorts, so that we look to improve the connectivity and reduce the disaster recovery issues. We can provide a better disaster recovery design by re-engineering slightly. We can improve connectivity by adding Direct Connect, or certainly a VPN back to the data center, and providing them a bit more security around the connections.
- And we'll start to, from the operations perspective use more managed services?
- Yeah, yeah, yeah.
- So, in stage two, we're reducing the number of servers that were, yeah, running in stage one, which means less ongoing maintenance, more efficiency.
- Yeah. Cool, and stage three is perhaps where we can start to look at, you know, how we can transform this design to be more of a microservices approach, I suppose, is one way to describe it, certainly. Decouple layers. We can introduce more services. We can move from a monolithic design to more microservices, yeah. And that's going to give the company a lot more agility, right?
- So then we start to use things like Lambda, Lambda functions.
- I understand they're using JBoss for their application side.
- We can potentially, in stage three, start to port some of the JBoss application design into Java Lambda functions.
- Yup, with Step Functions.
- With Step Functions to orchestrate the workflow. And straight away we're going to get some good wins in terms of no longer having to water and feed the JBoss application servers.
- Which I understand in the current on-prem solution are causing some pain.
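As a rough illustration of the idea of orchestrating ported Lambda functions with Step Functions, the sketch below builds a linear workflow definition in the Amazon States Language as a plain Python dictionary. The step names and the placeholder ARNs are hypothetical and are not taken from the customer's application; the team does not describe the actual workflow in this lecture.

```python
import json


def build_state_machine(steps):
    """Build a linear Amazon States Language definition from (name, arn) pairs.

    Each step becomes a Task state that invokes one Lambda function and then
    hands over to the next step; the last step ends the execution.
    """
    if not steps:
        raise ValueError("at least one step is required")

    states = {}
    for index, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state

    return {
        "Comment": "Illustrative workflow only; step names are hypothetical",
        "StartAt": steps[0][0],
        "States": states,
    }


# Hypothetical steps and placeholder ARNs, used purely for illustration.
ARN_PREFIX = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"
definition = build_state_machine([
    ("ValidateUpload", ARN_PREFIX + "validate-upload"),
    ("ProcessDocument", ARN_PREFIX + "process-document"),
    ("StoreDocument", ARN_PREFIX + "store-document"),
])

print(json.dumps(definition, indent=2))
```

Each function stays small and independently deployable, and the ordering lives in the workflow definition rather than in application server code.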
- Yeah. Okay, so let's look at the components here. We could go, we lift and shift. We just export the Oracle database to RDS, and we can use the Database Migration Service for that. So no potential transformation with that. We can shift the image store from the data center to S3, and we can do that using Snowball. We've got 160 terabytes of data, I believe, just roughly. So we could do two Snowball appliance exports.
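The sizing behind "two Snowball appliance exports" can be sketched as a simple ceiling division. The 80 TB usable capacity per device is an assumption chosen to match the team's estimate; it is not stated in the discussion, and real usable capacity depends on the device model and should be checked before ordering.

```python
import math


def snowballs_needed(data_tb, usable_tb_per_device):
    """Return how many devices are needed to carry data_tb terabytes."""
    if data_tb < 0 or usable_tb_per_device <= 0:
        raise ValueError("data size must be >= 0 and capacity must be > 0")
    return math.ceil(data_tb / usable_tb_per_device)


DATA_TB = 160            # rough figure quoted by the team
ASSUMED_USABLE_TB = 80   # assumption, not from the discussion

print(snowballs_needed(DATA_TB, ASSUMED_USABLE_TB))
```

If the usable capacity turns out to be lower than assumed, the same calculation shows a third device would be needed, so it is worth rerunning with the real figure.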
- And because this is a legal business, the transportation of that data is done under encryption.
- Yeah, so it's encrypted when you load it, right?
- We could use S3 sync to just synchronize that between our export and any downtime we're likely to have while we're transforming this. We just lift and shift applications and servers. We don't have VMware working here, so we can't use the VMware connector, unfortunately.
- So there's no export?
- No, we're gonna have to just basically take the applications and export the applications, all right? There's no easy way around that. If we were using VMware we could use the VMware connector--
- Which is still not as bad as it sounds. I mean, we can easily stand up equivalent instances with the right operating system, and so long as we have the ability to reinstall the application--
- We should be good to go.
- Do we need to bake any AMIs or are we just gonna use fairly standard AMIs for this?
- We can, we can launch the instances, install the application, and then, yeah, take an AMI.
- And that will help with us getting a better RPO and RTO.
- Yeah, so we can improve the RPO and RTO in this first part because we can create a, I'm thinking single available, sorry, single region.
- Single region, multi-AZ, and that would be our production system, and then we can clone that, perhaps using CloudFormer.
- So we, we build the environment manually by hand, again, just to expedite the delivery of stage one.
- And at the end of that we'll use CloudFormer to basically take a snapshot of the whole environment, and then we can replay that template in a different region. That gives us a pilot light environment.
- Yeah, okay. All right. That's looking good. Now, what do you recommend for the cutover time? We're doing a lift and shift. We export two Snowball appliances to get the majority of our data migrated out of the data center into S3. That's gonna take sort of two to four days. We've got sync running for as long as we need to, you know, keep our two environments in synchronization. We're exporting our application. We're importing it into EC2. We've got a little bit of setting up to do there. CloudFormer is gonna be basically our tool to stand up our environment. What are you thinking? Is this like a week? Is this two days? What sort of time frame should we be thinking of, with that minimal interruption consideration for their adoption?
- Well, my gut feel here is that this is a week.
- A week?
- A week, yeah.
- Yeah, okay.
- What we can do also is, you know, because the users access the application using a web app, we can use Route 53 to manage the redirection.
- Perfect. So we introduce Route 53 to handle that DNS, which means we can use weighted routing.
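To make the weighted routing idea concrete, here is a minimal sketch that builds a Route 53 change batch with two weighted A records, one for the existing data center and one for AWS. The IP addresses are documentation placeholders, the `www` record name and the 90/10 split are assumptions for illustration, and the batch is only constructed, not submitted.

```python
def weighted_change_batch(record_name, targets, ttl=60):
    """Build a Route 53 change batch of weighted A records.

    targets maps a set identifier to an (ip_address, weight) pair.
    Route 53 weights are integers from 0 to 255; traffic is shared in
    proportion to each record's weight.
    """
    changes = []
    for set_id, (ip_address, weight) in targets.items():
        if not 0 <= weight <= 255:
            raise ValueError("weight must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip_address}],
            },
        })
    return {"Comment": "Illustrative cutover weighting", "Changes": changes}


# Placeholder addresses and an assumed 90/10 split for the start of cutover.
batch = weighted_change_batch(
    "www.expertiseplease.com.",
    {
        "on-prem": ("203.0.113.10", 90),
        "aws": ("198.51.100.20", 10),
    },
)

for change in batch["Changes"]:
    record = change["ResourceRecordSet"]
    print(record["SetIdentifier"], record["Weight"])
```

A batch shaped like this could be passed as the `ChangeBatch` argument to the boto3 Route 53 client's `change_resource_record_sets` call. Shifting the weights gradually toward the AWS record, and back again if needed, is what gives the cutover its rollback path.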
- Now, we've got a few connections to think about too, Jeremy. We need to have some connectivity back to the data center during this, for our S3 sync as well as just ensuring that we've got, like, some, you know, rollback or some connectivity for some of the application services. So, I'd leap at Direct Connect for that, but that's got a delay, right? You've got sort of a five to--
- Yeah, you've got a provisioning of, I don't know, it's like two weeks.
- Yeah. That's not gonna work for us.
- It's not insignificant.
- Yeah, we probably wanna put this as part of stage two, and not part of our lift and shift, all right.
- So in stage one we'll stand up the gateway, with IPsec tunnels back to the on-prem network.
- And we will, 'cause we still need the path for the S3 sync.
- Yeah, cool, and we can run VPN over Direct Connect as well if we need to later.
- And the great thing about that is we can do that much, much more rapidly than the Direct Connect solution.
- Yeah, that's sort of immediate. Okay, all right, so we've shifted our images. For our database transformation, the Database Migration Service could help us with that. It would be fantastic to look at transforming the database at this point, because that could be a massive cost saving. We're just assuming that we don't have any inherent logic, PL/SQL type logic, in our database that's gonna cause us any problems. But if it does, we can of course roll back, if need be, because we haven't changed anything there. All right, so that's gonna probably match our business requirement that we're getting out of the data center before the end of life. It's gonna touch our people requirement because, using Route 53, we've got, you know, a guaranteed way of ensuring that there's no URL mapping issues or any possible outage.
- From our security perspective, all of the services that we're using are encrypted by design. So we've got probably a better level of security immediately. We've improved our RPO and RTO, I think, as you said, by having a single region, multi availability zone design with a pilot light CloudFormation-based environment. It means that we're gonna be able to reduce the RPO and RTO values. But the ones we were talking about was eight hours for our RTO, sorry, and four hours for our RPO. All quite achievable, I think, with this.
- And the availability of the first region and the pilot light environment is far higher than the existing incumbent system.
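One way to sanity-check a pilot light design against recovery targets is sketched below. It uses the targets from the course brief (RPO of 24 hours, RTO of 4 hours). The backup interval and the recovery step durations are hypothetical numbers for illustration; the team does not quote them.

```python
from datetime import timedelta


def check_recovery_targets(backup_interval, recovery_steps, rpo, rto):
    """Compare a DR design against RPO and RTO targets.

    Worst-case data loss is taken to be one full backup interval, and
    recovery time is the sum of the sequential recovery steps.
    """
    recovery_time = sum(recovery_steps.values(), timedelta())
    return {
        "worst_case_data_loss": backup_interval,
        "recovery_time": recovery_time,
        "rpo_met": backup_interval <= rpo,
        "rto_met": recovery_time <= rto,
    }


# Hypothetical durations for bringing the pilot light region into service.
steps = {
    "restore database from latest snapshot": timedelta(minutes=90),
    "scale up application instances": timedelta(minutes=45),
    "switch DNS to the recovery region": timedelta(minutes=15),
}

result = check_recovery_targets(
    backup_interval=timedelta(hours=12),  # assumed snapshot frequency
    recovery_steps=steps,
    rpo=timedelta(hours=24),              # from the course brief
    rto=timedelta(hours=4),               # from the course brief
)

print(result["recovery_time"], result["rpo_met"], result["rto_met"])
```

Rehearsing the recovery and replacing the assumed durations with measured ones is what turns this from an estimate into evidence.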
- So, next thing for us is we need to design our VPC. So we're gonna have to come up with a VPC design. Straight away, we can go back to the customer and give them this lift and shift straw man. We need to think about our connectivity with the VPC. Should we just stick to the design phases first, so we've got kind of like a reference architecture to work backwards from, and we may be able to move a few of these things around. So, for stage two, we wanna improve connectivity. Well, Direct Connect is gonna do that. Route 53 is gonna give us some better connectivity management. We'd reduce the number of services. What else can we do here before we start ripping apart the applications? You know, what other services could we add in here?
- So I think we can start to look at, you know, at the moment in stage one, we're using Oracle on RDS.
- And maybe we could look at Aurora, Postgres?
- Yeah, that would be a massive cost saving. You know, we could really drive a cost reduction by shifting, using the Database Migration Service again. We could do a schema export. We could do a comparison report to see how much logic we have in that database. Straight away, we could find that it's quite an easy transformation to shift it, and that's encrypted. So, we're using Oracle encryption at the moment. So we do have to have an encrypted database for compliance requirements.
- And the great thing about Aurora is that it is highly, highly available. It's built on top of, you know, Amazon's massively architected and available infrastructure.
- Yeah. All right, so we should do a bit of a cost comparison. I think that, you know, the key thing here is that if we do just run the Database Migration Service report, that's a piece of cake, and it will tell us straight away what the impact of migrating it would be. We could then factor in how much work would be required to do that. We may find it's really simple, that it could be part of stage one?
- You know? I'm thinking though, risk wise, if we're thinking about minimal interruption, changing the database I think is probably the most dangerous thing we could try in this first stage, you know? We could end up with just some small outage or issue that does impact the people, platform, business, and operations perspectives. Yeah.
- Some other things we can do in stage two is start to bring in some of the cloud smarts, like Elastic Load Balancing.
- So we could use smaller instance types, but have more elasticity in the solution, so that, you know, we've got Auto Scaling pools.
- And we scale our solution based on the demand that's coming in. So, you start to get better cost optimization on the solution. With stage one, because we'd have a lift and shift, you know, we're not optimizing for times when the solution is not being used, at night, for example.
- Yeah, yeah, yeah, yeah.
- Whereas with this solution in stage two, we map our solution against the demand profile.
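The cost argument for mapping capacity to demand can be sketched with a small calculation that compares a fixed fleet sized for peak load with a fleet that scales each hour. The demand profile, the capacity per instance, and the two-instance floor (for a multi-AZ deployment) are all hypothetical values chosen for illustration.

```python
import math


def instances_for_demand(requests_per_second, capacity_per_instance, min_instances=2):
    """Return the instance count needed for a given load, with a floor."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, needed)


# Hypothetical 24-hour demand profile in requests per second:
# quiet overnight, busy during the working day, light in the evening.
hourly_demand = [20] * 8 + [400] * 10 + [60] * 6
CAPACITY_PER_INSTANCE = 100  # assumed requests per second per instance

scaled = [instances_for_demand(d, CAPACITY_PER_INSTANCE) for d in hourly_demand]
scaled_instance_hours = sum(scaled)
fixed_instance_hours = max(scaled) * len(hourly_demand)

print(scaled_instance_hours, fixed_instance_hours)
```

With these assumed numbers the scaled fleet uses 68 instance-hours per day against 96 for the fixed fleet, which is the kind of saving the team is pointing at when the service is quiet overnight.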
- Yeah. Cool. I put in auto recovery as well, 'cause that's another thing we get with our instance types. Yeah, so we'll call this stage two cost optimization, 'cause I think that's what we're saying here, is that we've added some services, so--
- Another thing we could do is we could factor out, you know, some of the static material of the web interface, and serve it off CloudFront, and then that straight away starts to take some of the demand off the application servers.
- All right.
- And they become focused on running the application code base, not the presentational app.
- Perfect, and we can add in WAF or AWS Shield here to better improve security through this CloudFront, yeah.
- So that addresses the security perspective.
- And perimeter security.
- Nice one. This could save thousands, you know, we could save thousands with using Spot or On-Demand instances, or RIs even, because we've got a pattern, and if we can, you know, shift to a more operationally efficient database, you know, we really could drive that cost optimization.
- There's also a requirement to do watermarking of these legal documents.
- Yeah, yeah, yeah.
- So, we could extract that function out of the application servers and launch that into perhaps a Lambda function, or look at the Elastic Transcoder service.
- To do the watermarking for us.
- Another service that we can add in here. All right, I think we've got a pretty good stage two there. All right, so let's deal with the elephant in the room, which is this monolithic app design. So, it all made sense when they did it, right? One single app. So, we don't have a lot of development resource available. So we're gonna have to provide this. I mean, obviously in terms of the business transformation, the key brief is to get off the data center, to ensure that they don't run out of storage, to reduce costs, but ultimately to help the business, you know, make itself more competitive by adding more functionality. So, the business feels like it's stagnated a wee bit, and it's just taking too long for them to create new functions and add value to the application, because it's a monolithic app. Now, if we go back and say, right, we're gonna split the app into a series of, you know, independent functions that can be scaled or built, you know, independently, what sort of development overhead are we gonna, you know, potentially introduce with that? I mean, are we gonna make it too hard?
- No, I don't think so. We'd bring in microservices to basically decouple the application and start to break it down into a more agile system that the development teams can work with. So, each team that has a particular ownership of a function within the application can basically work at their own cadence and continue to release as and when they desire to. And with this type of approach, you start to increase the agility of the overall system. So you can get more frequent releases. You can start to break open the app and actually expose APIs, so perhaps third parties can actually integrate with the system.
- All right, so we'll probably introduce API Gateway.
- Yeah, that's great.
- Yeah, we do, so we're thinking, we're gonna basically redesign the app, aren't we?
- Yeah, I love that it syncs. If we can go back and say they've got independent teams working on specific functions, yeah, it's gonna increase their agility quite dramatically.
- We could potentially bring in Elastic Container Service.
- So that the units that are deployed are actually containers.
- That'll make more sense.
- And the Elastic Container Service, in our design here, sits behind API Gateway.
- So stage three is more of a re-architecture of the application itself.
- Stage two is more of a re-architecture of the infrastructure.
- Infrastructure, yeah.
- So that gives us some really good stages to work with.
- [Narrator] Okay, let's start defining our target architecture and our future state for the solution. We'll be using Amazon S3 for storing data assets, as it's cheap and reliable. We'll also be running as many managed services as possible. Ideally, with a more efficient database layer, looking to use Postgres or perhaps Aurora. Aurora supports encryption, meaning we don't need the Oracle TDE, which is going to be a good cost saving. The registration and document management functions can be done using DynamoDB with Lambda functions. We can easily solve and improve the edge cases, but the elephant in the room is this core app. The core app is holding up progress as it's slow to redevelop. We need to re-engineer the monolithic app for the business to achieve more agility. So we want scalable, stateless microservices. That way, parts can be scaled and improved without holding up other parts of the application. Now we can't do that in the first stage, as we only have one in-house developer available.
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions outside work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few startups.