We will explore the best ways of migrating data to AWS using the AWS Schema Conversion Tool.
The AWS Schema Conversion Tool is a component of AWS Data Migration Service (DMS) which is still in preview mode at the time of writing this post (December 2015). We have discussed the benefits of migrating to AWS in a previous post.
With DMS, Amazon is wooing corporate customers with a low-cost means of moving their database workloads to the cloud. Amazon already has a managed database service called RDS (Relational Database Service). However RDS has its own limitations, and one of those limitations is related to data migration. Until now, there was no simple way of migrating schema and data from an on-premise or EC2-hosted database to RDS. Because RDS is a managed service it doesn’t support a number of features specific to the database engines. Previously, the only solution would require DBAs and developers writing their own migration scripts and testing them in a painful and an iterative manner until all the wrinkles have been ironed out.
Amazon addresses this issue with Data Migration Service. As Amazon’s public website for DMS states, the service will allow seamless and continuous data replication between a source and a target environment during a migration process. These environments may use either the same database engines or different ones as source and target. It is worth noting that the source can be an on-premise database instance, an EC2-hosted instance, or even another RDS instance. The target can be an EC2 instance or another RDS instance. AWS now supports a number of database engines, so that these combinations can support a wide variety of use cases. Migrating data to AWS is progressively easier with each passing week. Here’s a good post about the AWS Data Migration Tool that was written at re:Invent 2015.
Anyone who ever worked in a data migration project knows the process is never simple — never. A number of disciplines are involved here: solutions architects, infrastructure architects, engineers, DBAs, developers, and quality assurance testers. All these roles form components of project teams whose work can span from a few weeks to many months. Successful database migration is generally iterative in nature. A repeatable process may recreate the target database structure every time, and load it with fresh source data repeatedly for quality assurance purposes. This process continues until everything works perfectly and the approvals are given for the next step. In my experience, the sooner this iterative process is put into practice, the better the chance of successful migration.
But a seamless data migration can only happen when the source and destination databases use similar or, at least, compatible structures. It may be relatively easy to migrate an on-premise SQL Server database to an RDS instance running the same engine, but, not so easy if the target is, say, MySQL. That’s because both engines use different methods of defining database objects and their code syntax is also widely different. There is no easy way out here. When the source and target databases are different, DBAs and developers have to bite the bullet and script the schema objects and programming logic in the target environment’s own language. More often than not, this becomes the most time-consuming part of migrating data to AWS.
This is where the AWS Schema Conversion Tool can come in really handy. You are probably wondering, “can the AWS Schema Conversion Tool save time for the database engineer?” The answer is yes! The AWS Schema Conversion Tool can connect to a source database engine and reverse engineer a source database structure to a format suitable for the target database. This includes converting both database schema objects (tables, indexes, constraints etc.) and database logic (functions, stored procedures, triggers etc). Where it can’t convert a source item, it will flag it and recommend remediation steps. You can then apply the converted schema and logic to the target environment from this same tool and make any manual changes to the target as suggested.
Migrating data to AWS has been common for a while and database migration toolkits are nothing new either. Most commercial database vendors offer them for free, but there are few that offer multiple source and target options, and I can’t think of any others available for almost all operating systems. The AWS Schema Conversion Tool is a desktop product that runs on Windows, Mac OS, Fedora and Ubuntu Linux. It’s free and can be downloaded once you sign up for a preview of the Database Migration Service.
Being the first of its kind from AWS, this tool has some limitations in its first release. It’s used for converting databases to RDS instances only — more precisely MySQL or Aurora in RDS. The source database engines can be either MS SQL Server or Oracle. That means there’s no support for PostgreSQL or the more recent addition, MariaDB.
Installing the AWS Schema Conversion Tool
We tested the AWS Schema Conversion Tool by installing it on a Windows 10 machine as well as on a Mac (running Yosemite). The installation process is fairly simple and fast in both cases.
Once the tool is installed, you need to download and install the JDBC drivers for the supported database engines. This is because The AWS Schema Conversion Tool uses JDBC to connect to databases. In our case, we downloaded and installed two:
Once the installation is complete, you need to tell the AWS Conversion Tool where to find those drivers. This is done by selecting the Settings > Global Settings menu option and choosing Drivers from the dialog box. In the image below, we have specified the paths for both the drivers:
With the initial setup complete, you can start a new migration project (every migration has to be part of a project). In our case, we chose to convert the well-known AdventureWorks database running in an EC2 instance to a MySQL 5.6 RDS instance.
The first step is to define the project properties. When in this screen, you see the source can be either SQL Server or Oracle and the destination can be MySQL or Aurora.
Connecting to the Source Database
Once the project is created, it’s time to connect to the source instance. In the image below we are connecting to our SQL Server 2012 instance in EC2.
Although SQL Servers have instance names (default instances typically have the same name as the Windows box), we did not specify it here. The user account was created before. One thing to note here is, that the user account must be a SQL Server standard login and not a Windows domain or local account. If the SQL Server is running in Windows-only authentication mode, this will not work because no standard logins (even sa) will be available.
Once the source is connected, its databases and schemas under each database are loaded in the left side of the screen.
Now if you are a SQL Server DBA or developer, you will notice a couple of things immediately:
- A SQL Server instance has many components that are visible from the SQL Server Management Studio — the de-facto client tool for SQL Server. These objects include logins and server roles, jobs, alerts, linked servers, replication publishers/subscribers and so on. Even within each database, there are objects like, users, partitions, full text indexes etc. When you connect to the instance with the AWS Schema Conversion Tool, only databases and their objects, like tables, indexes, constraints or stored procedures are accessible.
If you are wondering why the tool works this way, remember, the Schema Conversion Tool is built for converting only database components, nothing else. That’s why it shows database objects from the source instance only.
The same thing can be said about Oracle databases. There are many different links for schemas, tables, users, directory objects, jobs or RMAN backups within the Oracle Enterprise Manager Database Control. Again, only database objects for different schemas are viewable from the conversion tool.
- In SQL Server Management Studio, similar types of components in a database are grouped together. For example, all tables from all schemas in a database are listed under the tables folder. The same types of folders exist for views, stored procedures, or functions. You may also note that even within each type of object, there can be further grouping for related objects. For example, the node for a particular table will have all its indexes listed under one folder and all the constraints listed under another folder, and so on. With the AWS Schema Conversion Tool, this classification is shown in a different way. Here, each schema under a database will have its own folder. Under each schema, there will be sub-folders for tables, views, stored procedures and so on. In essence, even though the objects could belong to the same database, their placement may be under different schemas.
The reason for this difference is in the naming convention. The term “database schema” is loosely used to mean two different things. One meaning refers to schema as the overall structural definition of the database. This definition includes the actual command to create the database as well as the commands to create objects within the database. The other meaning of schema refers to a namespace that belongs to a user account. In traditional database engineering terms, a schema is related to a database user — all the objects owned by a user belong to that user’s schema. A database then becomes just the holder for all the schemas in it.
SQL Server did not implement the concept of schemas in its older versions. However it’s very much a part of the database engine now and Microsoft has implemented the ANSI version of schema definition in its database product. On a high-level though, everything is still seen from an individual database perspective.
AWS Schema Conversion Tool sees things differently. It treats “schemas” in the traditional sense, a grouping of database objects from the same user. This means when you convert the source database components, the highest level you can convert from is the schema level, not the SQL Server database level. That means each separate schema under an SQL Server database will end up as a separate MySQL database in the target.
Connecting to the Destination Database
Next, we connected to our target environment, the MySQL RDS instance. Here we provided similar information:
Once connected, the target instance schema objects are loaded in the right pane of the tool.
Running the Database Migration Assessment Report
It’s now time to check what components from our source databases (AdventureWorks) can be converted to MySQL format. For this we selected a schema, right-clicked on it and selected “Create Report” from the pop-up menu.
This will kick in the schema conversion engine. AWS Schema Conversion Tool will take its time to reverse engineer every object within the schema and check if the generated code can be run against the target instance without any change. If it can, it will make a note of this finding. If it can’t, it will make a note about why it’s not possible. At the end, the assessment will generate a report. That report is called the Database Migration Assessment Report.
You can access the Database Migration Assessment Report from the View menu of the conversion tool. There are two parts of the assessment report: the Summary part and the Action Items part.
The Summary page gives an overall picture of the migration possibility. It shows how many objects in each type of component it can convert and how many it can’t.
The Action Items page will go deeper into the analysis. It will list every object that can’t be converted to the target engine and the reason for their failure. I find it hugely helpful that the AWS Schema Conversion Tool highlights the particular command or syntax in the generated code that caused the problem. This code is viewable from the lower half of the Action Items screen.
In the following images, we are seeing the summary and action items report for two schemas. You can see the bottom half of the Action Items screen shows the generated code for a particular object that can’t be converted. When you select an individual problem item from the top half of the screen, its related object is highlighted in the left pane and its generated code is shown in the bottom.
Converting the Source Schema
With the assessment report giving you the overall migration possibility, you have two choices:
- Make changes to the source database structure so it matches the target database syntax. This may mean changing data types for table columns or rewriting stored procedures, triggers, and functions with different syntax — in most migration scenarios, this is not an option.
- Apply the converted schema to the target instance as is, and then make manual changes to the target database by creating tables with correct data types, or write code with the correct syntax. The second option is the preferred method by most data engineers.
Migrating the schema in the target environment is a two-step process:
- Convert the schema. This is where the AWS Schema Conversion Tool will generate a local script to apply to the target environment.
- Apply the converted schema. The converted schema code is actually applied to the target database instance, and the target objects are created.
Applying the Converted Schema
With the schemas converted to MySQL format, the target instance will show the databases created. However, they don’t exist yet. To physically create the database objects, we had to apply the schemas from the target database pane:
And that completes the initial schema migration. If you are interested in migrating to virtual machines, you may find our post on Migrating Virtual Machines to the AWS Cloud useful.
We found the AWS Schema Conversion Tool a simple and intuitive piece of software with an elegant installation.
As noted before, there is no option to migrate to or from a PostgreSQL databases. Postgres is run in many corporate database workloads and we don’t know when this functionality will be available in the tool. The same can be said about other database engines like MySQL or MariaDB as sources. The target environment can only be one type and that’s RDS. This may help Amazon push its customers to adopt RDS, but it isn’t helpful if a customer wants to run their database in EC2.
Even if the schema migration runs smoothly and most of the objects can be transferred easily, database engineers still have to support other “moving parts.” This includes a multitude of components like: SQL Server Agent Jobs, Oracle Jobs, custom cron jobs, Windows scheduled tasks, network shares, users and privileges, linked servers, linked databases, partitions, full-text indexes, and so on.
The philosophy behind this tool’s purpose is focused. The tool is not meant to be a magic bullet for a turn-key migration – no tool can claim to have such functionality. Data mapping is the biggest time-sink for most migrations and this tool address that issue very well.
As we saw from our example, it’s not easy to migrate even a sample database in a lift-and-shift manner. That may disappoint many DBAs. After all, everyone wants a smooth migration where the tool does all the heavy lifting. We believe the AWS Schema Conversion Tool’s strengths lie in its ability to pinpoint exactly where the migration problems are, and suggesting ways to address those issues.
New on Cloud Academy: AWS Solution Architect Lab Challenge, Azure Hands-on Labs, Foundation Certificate in Cyber Security, and Much More
Now that Thanksgiving is over and the craziness of Black Friday has died down, it's now time for the busiest season of the year. Whether you're a last-minute shopper or you already have your shopping done, the holidays bring so much more excitement than any other time of year. Since our...
Understanding Enterprise Cloud Migration
What is enterprise cloud migration? Cloud migration is about moving your data, applications, and even infrastructure from your on-premises computers or infrastructure to a virtual pool of on-demand, shared resources that offer compute, storage, and network services at scale. Why d...
6 Reasons Why You Should Get an AWS Certification This Year
In the past decade, the rise of cloud computing has been undeniable. Businesses of all sizes are moving their infrastructure and applications to the cloud. This is partly because the cloud allows businesses and their employees to access important information from just about anywhere. ...
AWS Regions and Availability Zones: The Simplest Explanation You Will Ever Find Around
The basics of AWS Regions and Availability Zones We’re going to treat this article as a sort of AWS 101 — it’ll be a quick primer on AWS Regions and Availability Zones that will be useful for understanding the basics of how AWS infrastructure is organized. We’ll define each section,...
Application Load Balancer vs. Classic Load Balancer
What is an Elastic Load Balancer? This post covers basics of what an Elastic Load Balancer is, and two of its examples: Application Load Balancers and Classic Load Balancers. For additional information — including a comparison that explains Network Load Balancers — check out our post o...
Advantages and Disadvantages of Microservices Architecture
What are microservices? Let's start our discussion by setting a foundation of what microservices are. Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs). ...
Kubernetes Services: AWS vs. Azure vs. Google Cloud
Kubernetes is a popular open-source container orchestration platform that allows us to deploy and manage multi-container applications at scale. Businesses are rapidly adopting this revolutionary technology to modernize their applications. Cloud service providers — such as Amazon Web Ser...
AWS Internet of Things (IoT): The 3 Services You Need to Know
The Internet of Things (IoT) embeds technology into any physical thing to enable never-before-seen levels of connectivity. IoT is revolutionizing industries and creating many new market opportunities. Cloud services play an important role in enabling deployment of IoT solutions that min...
Which Certifications Should I Get?
As we mentioned in an earlier post, the old AWS slogan, “Cloud is the new normal” is indeed a reality today. Really, cloud has been the new normal for a while now and getting credentials has become an increasingly effective way to quickly showcase your abilities to recruiters and compan...
How to Go Serverless Like a Pro
So, no servers? Yeah, I checked and there are definitely no servers. Well...the cloud service providers do need servers to host and run the code, but we don’t have to worry about it. Which operating system to use, how and when to run the instances, the scalability, and all the arch...
AWS Security: Bastion Hosts, NAT instances and VPC Peering
Effective security requires close control over your data and resources. Bastion hosts, NAT instances, and VPC peering can help you secure your AWS infrastructure. Welcome to part four of my AWS Security overview. In part three, we looked at network security at the subnet level. This ti...
Top 13 Amazon Virtual Private Cloud (VPC) Best Practices
Amazon Virtual Private Cloud (VPC) brings a host of advantages to the table, including static private IP addresses, Elastic Network Interfaces, secure bastion host setup, DHCP options, Advanced Network Access Control, predictable internal IP ranges, VPN connectivity, movement of interna...