We will explore the best ways of migrating data to AWS using the AWS Schema Conversion Tool.
The AWS Schema Conversion Tool is a component of AWS Data Migration Service (DMS) which is still in preview mode at the time of writing this post (December 2015). We have discussed migrating data to AWS in the past and that post is still useful.
With DMS, Amazon is wooing corporate customers with a low-cost means of moving their database workloads to the cloud. Amazon already has a managed database service called RDS (Relational Database Service). However RDS has its own limitations, and one of those limitations is related to data migration. Until now, there was no simple way of migrating schema and data from an on-premise or EC2-hosted database to RDS. Because RDS is a managed service it doesn’t support a number of features specific to the database engines. Previously, the only solution would require DBAs and developers writing their own migration scripts and testing them in a painful and an iterative manner until all the wrinkles have been ironed out.
Amazon addresses this issue with Data Migration Service. As Amazon’s public website for DMS states, the service will allow seamless and continuous data replication between a source and a target environment during a migration process. These environments may use either the same database engines or different ones as source and target. It is worth noting that the source can be an on-premise database instance, an EC2-hosted instance, or even another RDS instance. The target can be an EC2 instance or another RDS instance. AWS now supports a number of database engines, so that these combinations can support a wide variety of use cases. Migrating data to AWS is progressively easier with each passing week. Here’s a good post about the AWS Data Migration Tool that was written at re:Invent 2015.
Anyone who ever worked in a data migration project knows the process is never simple — never. A number of disciplines are involved here: solutions architects, infrastructure architects, engineers, DBAs, developers, and quality assurance testers. All these roles form components of project teams whose work can span from a few weeks to many months. Successful database migration is generally iterative in nature. A repeatable process may recreate the target database structure every time, and load it with fresh source data repeatedly for quality assurance purposes. This process continues until everything works perfectly and the approvals are given for the next step. In my experience, the sooner this iterative process is put into practice, the better the chance of successful migration.
But a seamless data migration can only happen when the source and destination databases use similar or, at least, compatible structures. It may be relatively easy to migrate an on-premise SQL Server database to an RDS instance running the same engine, but, not so easy if the target is, say, MySQL. That’s because both engines use different methods of defining database objects and their code syntax is also widely different. There is no easy way out here. When the source and target databases are different, DBAs and developers have to bite the bullet and script the schema objects and programming logic in the target environment’s own language. More often than not, this becomes the most time-consuming part of migrating data to AWS.
This is where the AWS Schema Conversion Tool can come in really handy. You are probably wondering, “can the AWS Schema Conversion Tool save time for the database engineer?”
The answer is yes! The AWS Schema Conversion Tool can connect to a source database engine and reverse engineer a source database structure to a format suitable for the target database. This includes converting both database schema objects (tables, indexes, constraints etc.) and database logic (functions, stored procedures, triggers etc). Where it can’t convert a source item, it will flag it and recommend remediation steps. You can then apply the converted schema and logic to the target environment from this same tool and make any manual changes to the target as suggested.
Migrating data to AWS has been common for a while and database migration toolkits are nothing new either. Most commercial database vendors offer them for free, but there are few that offer multiple source and target options, and I can’t think of any others available for almost all operating systems. The AWS Schema Conversion Tool is a desktop product that runs on Windows, Mac OS, Fedora and Ubuntu Linux. It’s free and can be downloaded once you sign up for a preview of the Database Migration Service.
Being the first of its kind from AWS, this tool has some limitations in its first release. It’s used for converting databases to RDS instances only — more precisely MySQL or Aurora in RDS. The source database engines can be either MS SQL Server or Oracle. That means there’s no support for PostgreSQL or the more recent addition, MariaDB.
Installing the AWS Schema Conversion Tool
We tested the AWS Schema Conversion Tool by installing it on a Windows 10 machine as well as on a Mac (running Yosemite). The installation process is fairly simple and fast in both cases.
Once the tool is installed, you need to download and install the JDBC drivers for the supported database engines. This is because The AWS Schema Conversion Tool uses JDBC to connect to databases. In our case, we downloaded and installed two:
Once the installation is complete, you need to tell the AWS Conversion Tool where to find those drivers. This is done by selecting the Settings > Global Settings menu option and choosing Drivers from the dialog box. In the image below, we have specified the paths for both the drivers:
With the initial setup complete, you can start a new migration project (every migration has to be part of a project). In our case, we chose to convert the well-known AdventureWorks database running in an EC2 instance to a MySQL 5.6 RDS instance.
The first step is to define the project properties. When in this screen, you see the source can be either SQL Server or Oracle and the destination can be MySQL or Aurora.
Connecting to the Source Database
Once the project is created, it’s time to connect to the source instance. In the image below we are connecting to our SQL Server 2012 instance in EC2.
Although SQL Servers have instance names (default instances typically have the same name as the Windows box), we did not specify it here. The user account was created before. One thing to note here is, that the user account must be a SQL Server standard login and not a Windows domain or local account. If the SQL Server is running in Windows-only authentication mode, this will not work because no standard logins (even sa) will be available.
Once the source is connected, its databases and schemas under each database are loaded in the left side of the screen.
Now if you are a SQL Server DBA or developer, you will notice a couple of things immediately:
- A SQL Server instance has many components that are visible from the SQL Server Management Studio — the de-facto client tool for SQL Server. These objects include logins and server roles, jobs, alerts, linked servers, replication publishers/subscribers and so on. Even within each database, there are objects like, users, partitions, full text indexes etc. When you connect to the instance with the AWS Schema Conversion Tool, only databases and their objects, like tables, indexes, constraints or stored procedures are accessible.
If you are wondering why the tool works this way, remember, the Schema Conversion Tool is built for converting only database components, nothing else. That’s why it shows database objects from the source instance only.
The same thing can be said about Oracle databases. There are many different links for schemas, tables, users, directory objects, jobs or RMAN backups within the Oracle Enterprise Manager Database Control. Again, only database objects for different schemas are viewable from the conversion tool.
- In SQL Server Management Studio, similar types of components in a database are grouped together. For example, all tables from all schemas in a database are listed under the tables folder. The same types of folders exist for views, stored procedures, or functions. You may also note that even within each type of object, there can be further grouping for related objects. For example, the node for a particular table will have all its indexes listed under one folder and all the constraints listed under another folder, and so on. With the AWS Schema Conversion Tool, this classification is shown in a different way. Here, each schema under a database will have its own folder. Under each schema, there will be sub-folders for tables, views, stored procedures and so on. In essence, even though the objects could belong to the same database, their placement may be under different schemas.
The reason for this difference is in the naming convention. The term “database schema” is loosely used to mean two different things. One meaning refers to schema as the overall structural definition of the database. This definition includes the actual command to create the database as well as the commands to create objects within the database. The other meaning of schema refers to a namespace that belongs to a user account. In traditional database engineering terms, a schema is related to a database user — all the objects owned by a user belong to that user’s schema. A database then becomes just the holder for all the schemas in it.
SQL Server did not implement the concept of schemas in its older versions. However it’s very much a part of the database engine now and Microsoft has implemented the ANSI version of schema definition in its database product. On a high-level though, everything is still seen from an individual database perspective.
AWS Schema Conversion Tool sees things differently. It treats “schemas” in the traditional sense, a grouping of database objects from the same user. This means when you convert the source database components, the highest level you can convert from is the schema level, not the SQL Server database level. That means each separate schema under an SQL Server database will end up as a separate MySQL database in the target.
Connecting to the Destination Database
Running the Database Migration Assessment Report
It’s now time to check what components from our source databases (AdventureWorks) can be converted to MySQL format. For this we selected a schema, right-clicked on it and selected “Create Report” from the pop-up menu.
This will kick in the schema conversion engine. AWS Schema Conversion Tool will take its time to reverse engineer every object within the schema and check if the generated code can be run against the target instance without any change. If it can, it will make a note of this finding. If it can’t, it will make a note about why it’s not possible. At the end, the assessment will generate a report. That report is called the Database Migration Assessment Report.
You can access the Database Migration Assessment Report from the View menu of the conversion tool. There are two parts of the assessment report: the Summary part and the Action Items part.
The Summary page gives an overall picture of the migration possibility. It shows how many objects in each type of component it can convert and how many it can’t.
The Action Items page will go deeper into the analysis. It will list every object that can’t be converted to the target engine and the reason for their failure. I find it hugely helpful that the AWS Schema Conversion Tool highlights the particular command or syntax in the generated code that caused the problem. This code is viewable from the lower half of the Action Items screen.
In the following images, we are seeing the summary and action items report for two schemas. You can see the bottom half of the Action Items screen shows the generated code for a particular object that can’t be converted. When you select an individual problem item from the top half of the screen, its related object is highlighted in the left pane and its generated code is shown in the bottom.
Converting the Source Schema
With the assessment report giving you the overall migration possibility, you have two choices:
- Make changes to the source database structure so it matches the target database syntax. This may mean changing data types for table columns or rewriting stored procedures, triggers, and functions with different syntax — in most migration scenarios, this is not an option.
- Apply the converted schema to the target instance as is, and then make manual changes to the target database by creating tables with correct data types, or write code with the correct syntax. The second option is the preferred method by most data engineers.
Migrating the schema in the target environment is a two-step process:
- Convert the schema. This is where the AWS Schema Conversion Tool will generate a local script to apply to the target environment.
- Apply the converted schema. The converted schema code is actually applied to the target database instance, and the target objects are created.
Applying the Converted Schema
With the schemas converted to MySQL format, the target instance will show the databases created. However, they don’t exist yet. To physically create the database objects, we had to apply the schemas from the target database pane:
And that completes the initial schema migration. If you are interested in migrating to virtual machines, you may find this post useful is useful.
We found the AWS Schema Conversion Tool a simple and intuitive piece of software with an elegant installation.
As noted before, there is no option to migrate to or from a PostgreSQL databases. Postgres is run in many corporate database workloads and we don’t know when this functionality will be available in the tool. The same can be said about other database engines like MySQL or MariaDB as sources.
The target environment can only be one type and that’s RDS. This may help Amazon push its customers to adopt RDS, but it isn’t helpful if a customer wants to run their database in EC2.
Even if the schema migration runs smoothly and most of the objects can be transferred easily, database engineers still have to support other “moving parts.” This includes a multitude of components like: SQL Server Agent Jobs, Oracle Jobs, custom cron jobs, Windows scheduled tasks, network shares, users and privileges, linked servers, linked databases, partitions, full-text indexes, and so on.
The philosophy behind this tool’s purpose is focused. The tool is not meant to be a magic bullet for a turn-key migration – no tool can claim to have such functionality. Data mapping is the biggest time-sink for most migrations and this tool address that issue very well.
As we saw from our example, it’s not easy to migrate even a sample database in a lift-and-shift manner. That may disappoint many DBAs. After all, everyone wants a smooth migration where the tool does all the heavy lifting. We believe the AWS Schema Conversion Tool’s strengths lie in its ability to pinpoint exactly where the migration problems are, and suggesting ways to address those issues.
WaitCondition Controls the Pace of AWS CloudFormation Templates
AWS's WaitCondition can be used with CloudFormation templates to ensure required resources are running.As you may already be aware, AWS CloudFormation is used for infrastructure automation by allowing you to write JSON templates to automatically install, configure, and bootstrap your ...
The 9 AWS Certifications: Which is Right for You and Your Team?
As companies increasingly shift workloads to the public cloud, cloud computing has moved from a nice-to-have to a core competency in the enterprise. This shift requires a new set of skills to design, deploy, and manage applications in the cloud.As the market leader and most mature p...
Two New EC2 Instance Types Announced at AWS re:Invent 2018 – Monday Night Live
The announcements at re:Invent just keep on coming! Let’s look at what benefits these two new EC2 instance types offer and how these two new instances could be of benefit to you. If you're not too familiar with Amazon EC2, you might want to familiarize yourself by creating your first Am...
Google Cloud Certification: Preparation and Prerequisites
Google Cloud Platform (GCP) has evolved from being a niche player to a serious competitor to Amazon Web Services and Microsoft Azure. In 2018, research firm Gartner placed Google in the Leaders quadrant in its Magic Quadrant for Cloud Infrastructure as a Service for the first time. In t...
Understanding AWS VPC Egress Filtering Methods
In order to understand AWS VPC egress filtering methods, you first need to understand that security on AWS is governed by a shared responsibility model where both vendor and subscriber have various operational responsibilities. AWS assumes responsibility for the underlying infrastructur...
S3 FTP: Build a Reliable and Inexpensive FTP Server Using Amazon’s S3
Is it possible to create an S3 FTP file backup/transfer solution, minimizing associated file storage and capacity planning administration headache?FTP (File Transfer Protocol) is a fast and convenient way to transfer large files over the Internet. You might, at some point, have conf...
Microservices Architecture: Advantages and Drawbacks
Microservices are a way of breaking large software projects into loosely coupled modules, which communicate with each other through simple Application Programming Interfaces (APIs).Microservices have become increasingly popular over the past few years. The modular architectural style,...
What Are Best Practices for Tagging AWS Resources?
There are many use cases for tags, but what are the best practices for tagging AWS resources? In order for your organization to effectively manage resources (and your monthly AWS bill), you need to implement and adopt a thoughtful tagging strategy that makes sense for your business. The...
How to Optimize Amazon S3 Performance
Amazon S3 is the most common storage options for many organizations, being object storage it is used for a wide variety of data types, from the smallest objects to huge datasets. All in all, Amazon S3 is a great service to store a wide scope of data types in a highly available and resil...
How to Optimize Cloud Costs with Spot Instances: New on Cloud Academy
One of the main promises of cloud computing is access to nearly endless capacity. However, it doesn’t come cheap. With the introduction of Spot Instances for Amazon Web Services’ Elastic Compute Cloud (AWS EC2) in 2009, spot instances have been a way for major cloud providers to sell sp...
What are the Benefits of Machine Learning in the Cloud?
A Comparison of Machine Learning Services on AWS, Azure, and Google CloudArtificial intelligence and machine learning are steadily making their way into enterprise applications in areas such as customer support, fraud detection, and business intelligence. There is every reason to beli...
How to Use AWS CLI
The AWS Command Line Interface (CLI) is for managing your AWS services from a terminal session on your own client, allowing you to control and configure multiple AWS services.So you’ve been using AWS for awhile and finally feel comfortable clicking your way through all the services....