AWS Storage Services
With an on-premises data backup solution within your data center, it’s critical for your business to have a disaster recovery plan built into your business continuity plans. You need to have a plan in place should a disaster occur that affects your operation of the business. The same is true when you start to leverage the cloud for its storage capabilities for your backed up data.
This course explains how cloud storage fits in with DR and the different considerations when preparing to design a solution to back up your on-premises data to AWS. It will explain how Amazon S3, AWS Snowball, and AWS Storage Gateway can all be used to help with the transfer and storage of your backup data.
You should not assume that backing data up to the cloud will solve your every need. There are many points to consider when planning a DR backup solution to a cloud such as AWS. However, it does also open up opportunities that may not have been possible with a standard on-premises backup solution. It’s these points of interest that many enterprises are focusing on to gain a significant advantage when it comes to disaster recovery.
AWS offers a number of different services available to help you architect the best solution for your needs. To allow you to set up the correct solution that works for you, you must first understand how each of these services can be of benefit to you.
To help you implement effective solutions, you must first have answers to the following:
- What is your RTO (Recovery Time Objective)?
- What is your RPO (Recovery Point Objective)?
- How quickly do you need to retrieve your data?
- How much data do you need to import/export?
- What durability is required for your data?
- How sensitive is your data?
- What security mechanisms are required to protect your data?
- Do you have any compliance controls that you need to abide by?
When you have answers to these questions, you will be able to start working towards an effective backup solution to create a cost-efficient, highly reliable, durable and secure data backup storage solution.
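As a rough illustration of how the first two questions feed into a design, the sketch below compares a proposed backup schedule against an RTO and RPO. The class name, function name, and the idea of estimating a single restore time are assumptions made for this example, not part of any AWS tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    rto_hours: float  # Recovery Time Objective: longest tolerable time to restore service
    rpo_hours: float  # Recovery Point Objective: longest tolerable window of lost data


def check_backup_plan(objectives: RecoveryObjectives,
                      backup_interval_hours: float,
                      estimated_restore_hours: float) -> list[str]:
    """Return a list of gaps; an empty list means both objectives are met."""
    gaps = []
    # Worst case, a disaster strikes just before the next backup runs,
    # so up to one full backup interval of data could be lost.
    if backup_interval_hours > objectives.rpo_hours:
        gaps.append(
            f"RPO missed: backups every {backup_interval_hours}h, "
            f"but RPO is {objectives.rpo_hours}h"
        )
    # The restore estimate should cover retrieval plus rebuild time.
    if estimated_restore_hours > objectives.rto_hours:
        gaps.append(
            f"RTO missed: restore takes about {estimated_restore_hours}h, "
            f"but RTO is {objectives.rto_hours}h"
        )
    return gaps


if __name__ == "__main__":
    plan_gaps = check_backup_plan(RecoveryObjectives(rto_hours=4, rpo_hours=1),
                                  backup_interval_hours=24,
                                  estimated_restore_hours=2)
    print(plan_gaps or "Plan meets both objectives")
```

In this example a nightly backup cannot satisfy a one-hour RPO, which would point towards more frequent or continuous replication.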
Learning Objectives
- Gain an understanding of how your storage solution can affect your business continuity and DR plans
- Obtain the knowledge to know when to use specific AWS storage solutions to your advantage between Amazon S3, Amazon Glacier, AWS Snowball and AWS Storage Gateway
- Understand how each of these services can provide a DR solution to fit your specific needs
This course has been designed for:
- Engineers who need to manage and maintain AWS storage services
- Architects who are implementing effective data backup solutions from on-premise to AWS
- Business continuity management managers
- Anyone looking to prepare for the AWS Solutions Architect - Professional certification
As a prerequisite to this course you should have a basic understanding of the following:
- Business continuity
- Disaster recovery
- Data backup terms and methodologies
- Amazon S3
- Amazon EC2
- Elastic Block Store (EBS)
If you have thoughts or suggestions for this course, please contact Cloud Academy at firstname.lastname@example.org.
Hello and welcome to this lecture. I want to summarize, at a high level, the main points taken from each of the previous lectures, almost like a cram session of key information.
I started off by looking at the part that Cloud storage plays within DR. In the event of a disaster I explained that the data you need might not be available to you which could be due to a variety of reasons.
For example, your backup data might be in the same location as your production data and so any major disasters such as a fire or flood would affect both sets of data. Your tapes could become faulty through overuse, and manual activity such as tape labeling and rotation could produce errors.
This traditional method can also be ineffective due to scalability restrictions and the large Capex cost of implementing a backup solution, and data availability could be impacted by off-site data retrieval.
I also explained that Cloud storage can be considerably cheaper than provisioning your own storage solution on-premise. Using Cloud resources increases the speed with which you can bring your service back into operation, and scalability and capacity are significantly enhanced within the Cloud compared to on-premise resources. Enhanced security features within AWS also help ensure compliance and governance controls are met.
The main points on why Cloud based storage is effective when used for DR are that it is cost efficient, scalable, available and durable, secure, and reliable; it requires zero maintenance of hardware; it provides off-site storage; replication and automation are easily configured; it's readily accessible; and it's easy to test DR plans using AWS infrastructure.
Next I looked at a number of different considerations when planning a DR storage solution, starting with the different options available for getting your data in and out of AWS. These options included AWS Direct Connect, VPN, internet connection, AWS Snowball, AWS Snowmobile, and AWS Storage Gateway.
Following this I explained that you need to understand how quickly you need to retrieve the data, and this is largely dependent on your RTO requirements. You also need to understand the criticality of the data to the business, and your infrastructure plays a part here too, such as the methods of connectivity to AWS that are available to you.
Next, another consideration is to understand how much data you actually need to import or export into or out of AWS, and this can affect your chosen solution. You need to calculate your target transfer rate and find a solution that offers you the correct capacity in your time constraints based on the quantity of data being transferred.
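To make the target transfer rate concrete, here is a minimal sketch of the arithmetic. It assumes decimal units (1 TB = 10^12 bytes) and a hypothetical `utilization` factor to account for the share of the link that is realistically available; both are assumptions for illustration.

```python
SECONDS_PER_DAY = 86_400


def required_mbps(data_tb: float, window_days: float, utilization: float = 0.8) -> float:
    """Link speed (megabits per second) needed to move data_tb within window_days.

    utilization is the assumed fraction of the link actually usable for the
    transfer (protocol overhead, competing traffic), between 0 and 1.
    """
    if data_tb <= 0 or window_days <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_tb and window_days must be positive, utilization in (0, 1]")
    data_bits = data_tb * 1e12 * 8                  # terabytes -> bits
    window_seconds = window_days * SECONDS_PER_DAY  # days -> seconds
    raw_mbps = data_bits / window_seconds / 1e6     # bits per second -> megabits per second
    return raw_mbps / utilization                   # scale up for the unusable share of the link


if __name__ == "__main__":
    print(f"10 TB in 7 days needs about {required_mbps(10, 7):.0f} Mbps")
```

If the result is well above what your Direct Connect, VPN, or internet connection can sustain, that is a signal to look at a different transfer option.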
Following this I covered durability of your data. As an example here I looked at the different classes of S3 which offered varied durability and availability settings such as Standard, Infrequent Access, and Amazon Glacier.
Lastly, I spoke about two very important considerations, those being security and compliance.
You may need to align with specific governance and compliance controls and you need to have an understanding of encryption methods for both in-transit and at-rest, which is essential for sensitive data. Access security must be understood to define who is permitted to access the data along with those who are not, and failure to understand Cloud storage security can devastate your organization and your customers by exposing sensitive data.
You may need to abide by specific governance, compliance, laws, regulations, and frameworks, and you can use AWS Artifact to access AWS reports of compliance.
Following these considerations I then focused on how you can use Amazon S3 as a data backup solution for your on-premise corporate data center.
I started this lecture by expanding upon the different storage classes available starting with Standard, which has 11 nines of durability maintained by data being replicated across multiple devices in multiple availability zones within a single region. It has four nines of availability and security features include encryption and access controls, and data management functionality such as lifecycle policies.
The Infrequent Access class is very similar. However, there are two main differences. Firstly, only three nines of availability as opposed to four, which is offered by Standard, and secondly the cost. Infrequent Access is cheaper than Standard, making this a very good choice for backup data.
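To put the difference between four nines and three nines into perspective, the short sketch below converts an availability percentage into the maximum time per year a service designed to that figure could be unavailable. It is plain arithmetic on the percentages, not a statement of any AWS service level agreement.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def max_downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year of unavailability implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("Standard (four nines)", 99.99),
                       ("Infrequent Access (three nines)", 99.9)]:
        hours = max_downtime_hours_per_year(pct)
        print(f"{label}: up to {hours:.2f} hours per year")
```

Four nines works out to a little under an hour per year and three nines to a little under nine hours, which is often acceptable for backup data.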
Lastly, Amazon Glacier. This is the cheapest option of the three classes and it's used as cold storage for data archiving. It uses different security mechanisms such as vault policies and you can use S3 lifecycle rules or SDKs to move data from one of the other classes to Amazon Glacier. Amazon Glacier does not offer immediate access to data and the speed of data retrieval will depend on which method you choose, these being expedited, standard, or bulk.
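As an illustration of a lifecycle rule, the sketch below builds a configuration that moves objects to Infrequent Access and then to Glacier. The rule ID, the `backups/` prefix, and the day counts are hypothetical. The dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration` (passed as `LifecycleConfiguration`), but nothing is sent to AWS here. The 30-day checks reflect my understanding of S3's transition constraints and should be verified against the current documentation.

```python
import json


def backup_lifecycle_configuration(prefix: str = "backups/",
                                   ia_after_days: int = 30,
                                   glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that tiers backup objects to cheaper classes."""
    # Assumption: S3 requires at least 30 days before a transition to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("objects must be at least 30 days old before moving to STANDARD_IA")
    # Assumption: the GLACIER transition must come at least 30 days after the IA one.
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("Glacier transition should be at least 30 days after the IA transition")
    return {
        "Rules": [
            {
                "ID": "backup-tiering",        # hypothetical rule name
                "Filter": {"Prefix": prefix},  # only objects under this key prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(backup_lifecycle_configuration(), indent=2))
```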
I followed this by discussing cross region replication and multipart uploads. Cross region replication is used to help with DR by reducing latency of data retrievals and complying with governance and compliance controls.
Looking at performance, multipart uploads are recommended for any object over 100 MB in size, and the benefits of this are speed and throughput, interruption recovery, and management of data.
Lastly, within that lecture I explained how different security options provide different ways to protect your data: IAM policies, bucket policies, access control lists, lifecycle policies, multifactor authentication delete, and versioning.
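A minimal sketch of how an object might be divided into parts for a multipart upload is shown below. It only plans byte ranges and uploads nothing. The default 16 MiB part size is an arbitrary choice for illustration, while the 5 MiB minimum part size and the 10,000 part limit are S3's documented limits.

```python
import math

MIB = 1024 * 1024
MIN_PART_BYTES = 5 * MIB  # S3 minimum part size (the final part may be smaller)
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(object_bytes: int, part_bytes: int = 16 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering the whole object."""
    if object_bytes <= 0:
        raise ValueError("object_bytes must be positive")
    if part_bytes < MIN_PART_BYTES:
        raise ValueError("part_bytes must be at least 5 MiB")
    # Grow the part size if the object would otherwise need more than 10,000 parts.
    part_bytes = max(part_bytes, math.ceil(object_bytes / MAX_PARTS))
    parts = []
    offset = 0
    while offset < object_bytes:
        length = min(part_bytes, object_bytes - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    return parts


if __name__ == "__main__":
    for number, offset, length in plan_parts(100 * MIB):
        print(f"part {number}: offset={offset} length={length}")
```

Because each part is an independent byte range, parts can be sent in parallel and only a failed part needs to be retried, which is where the throughput and interruption recovery benefits come from.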
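As one example of a bucket policy, the sketch below generates a policy document that denies any request not made over an encrypted connection, using the `aws:SecureTransport` condition key. The bucket name and statement ID are hypothetical, and this is only one of many possible policies rather than the one used in the course.

```python
import json


def tls_only_bucket_policy(bucket_name: str) -> str:
    """Return a JSON bucket policy that denies requests sent without TLS."""
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in the bucket
                ],
                # aws:SecureTransport is "false" when the request did not use TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(tls_only_bucket_policy("example-backup-bucket"))
```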
In the next lecture I discussed how to use AWS Snowball for data transfer.
AWS Snowball is a service used to securely transfer large amounts of data in and out of AWS, either from your on-premise data center to Amazon S3 or from S3 back to your data center, using a physical appliance known as a snowball.
The appliance comes in either a 50 terabyte or 80 terabyte device. The snowball appliance is built for high speed using the following onboard connections, RJ45, SFP+ Copper, or SFP+ Optical.
All data copied to a snowball appliance is encrypted by default via KMS keys. An AWS snowball is HIPAA compliant. The appliance is owned by AWS and they remove all data when the transfer is complete, in compliance with NIST standards. Snowball appliances can be aggregated together to transfer petabytes of data.
Understanding when to use AWS Snowball depends on your existing connection to AWS from your data center (for example, Direct Connect, VPN, or your internet connection) and on the amount of data you need to import or export. As a general rule, if your data will take longer than a week to transfer you should consider AWS Snowball.
At a high level the process to retrieve data from Amazon S3 would be as follows. Create a job, receive delivery of the appliance, connect your appliance to your network, transfer the data required, and then return the appliance to AWS.
The final topic that I discussed was using AWS Storage Gateway for data backup.
In this lecture I explained that Storage Gateway provides a gateway between your own data center storage systems and Amazon S3 and Glacier on AWS. The storage gateway itself is a software appliance that can be installed within your own data center; the appliance can be downloaded as a virtual machine and is hosted on one of your own hosts.
The different configuration options available for Storage Gateway are file gateways, volume gateways, and tape gateways. Looking at file gateways, I explained that they allow you to securely store your files as objects within S3, which is presented as an NFS share that clients can mount or map a drive to.
All data is sent over an HTTPS connection and all objects are automatically encrypted using SSE-S3. A local cache is provisioned during the creation of a file gateway, which uses on-premise storage to hold the most recently accessed files to optimize latency. Volume gateways are different in that there are two types, stored volume gateways and cached volume gateways.
Stored volume gateways are often used as a way to back up your local storage volumes to Amazon S3. Your entire data library is also kept on-premise for minimal latency, and during the gateway's creation volumes are created that are backed by Amazon S3 and mapped directly to on-premise storage.
Stored volumes are presented as iSCSI devices allowing communication from your application servers. And as data is written to these volumes it is first stored using the on-premise mapped storage before Storage Gateway then copies the same data asynchronously to S3. Snapshots of the volume can be taken, which are then stored as EBS snapshots on S3. Volumes can be between one gig and 16 terabytes in size, and a gateway can hold up to 32 volumes, giving a total storage of 512 terabytes.
Data is stored in a buffer using the on-premise storage before being written to S3 using an SSL connection. And in a disaster the EBS snapshots could be used to create new EBS volumes which can then be attached to EC2 instances.
With cached volume gateways, the primary data storage is actually Amazon S3 rather than your own on-premise storage solution, as is the case with stored volume gateways.
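The one-week rule of thumb can be sketched as a simple check. The function and class names are made up for this example, the `utilization` factor is an assumed allowance for overhead and competing traffic, and the appliance count uses the nominal 80 terabyte size mentioned earlier (usable capacity on a real device may be lower).

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class TransferAdvice:
    network_days: float      # estimated days to move the data over the existing link
    consider_snowball: bool  # True when the estimate exceeds the threshold
    appliances: int          # nominal appliance count; 0 when the network is sufficient


def advise_transfer(data_tb: float, link_mbps: float, utilization: float = 0.8,
                    threshold_days: float = 7.0, appliance_tb: float = 80.0) -> TransferAdvice:
    """Apply the 'longer than a week' rule of thumb to a planned transfer."""
    if data_tb <= 0 or link_mbps <= 0 or appliance_tb <= 0 or not 0 < utilization <= 1:
        raise ValueError("sizes and speeds must be positive, utilization in (0, 1]")
    data_bits = data_tb * 1e12 * 8                         # decimal terabytes -> bits
    seconds = data_bits / (link_mbps * 1e6 * utilization)  # time at the effective link speed
    days = seconds / SECONDS_PER_DAY
    consider = days > threshold_days
    appliances = math.ceil(data_tb / appliance_tb) if consider else 0
    return TransferAdvice(days, consider, appliances)


if __name__ == "__main__":
    print(advise_transfer(data_tb=100, link_mbps=100))
    print(advise_transfer(data_tb=1, link_mbps=1000))
```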
A cache is held locally using on-premise storage for buffering and to access recently accessed data, minimizing latency. Volumes are presented as iSCSI devices allowing connectivity from your application servers. And all data sent to S3 uses an SSL connection and this is encrypted using SSE-S3. Volumes can be 32 terabytes in size with a total of 32 volumes, giving a total storage of 1,024 terabytes.
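The capacity figures quoted for stored and cached volume gateways follow directly from the per-volume size and the 32-volume limit. The sketch below reproduces that arithmetic using the numbers from this lecture; the class and function names are hypothetical, and current AWS quotas should be checked before relying on them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeGatewayLimits:
    max_volume_tb: int  # largest single volume, in terabytes
    max_volumes: int    # volumes per gateway

    @property
    def total_tb(self) -> int:
        return self.max_volume_tb * self.max_volumes


# Figures as quoted in this lecture.
STORED = VolumeGatewayLimits(max_volume_tb=16, max_volumes=32)
CACHED = VolumeGatewayLimits(max_volume_tb=32, max_volumes=32)


def volumes_needed(data_tb: float, limits: VolumeGatewayLimits) -> int:
    """Smallest number of maximum-size volumes that can hold data_tb on one gateway."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    count = math.ceil(data_tb / limits.max_volume_tb)
    if count > limits.max_volumes:
        raise ValueError("data set exceeds the capacity of a single gateway")
    return count


if __name__ == "__main__":
    print(f"Stored volumes: {STORED.total_tb} TB total")
    print(f"Cached volumes: {CACHED.total_tb} TB total")
    print(f"100 TB on a stored volume gateway needs {volumes_needed(100, STORED)} volumes")
```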
Again, snapshots of these volumes can also be taken, which are stored on S3 as EBS Snapshots. And again, in a disaster the EBS snapshots can be used to create new EBS volumes which can be attached to EC2 instances.
Lastly, I looked at the tape gateway option, where I explained that tape gateways are known as Virtual Tape Libraries and they allow you to back up data to S3 from your own corporate data center but also leverage Amazon Glacier for data archiving.
Virtual Tape Library is essentially a Cloud based tape backup solution. And this option contains the following elements, virtual tapes, virtual tape library, tape drives, media changer, and archives.
Applications and backup software can mount the tape drives along with the media changer as iSCSI devices to make the connection. You can then create virtual tapes as and when you need them, with the data then stored on S3. When virtual tapes are archived the data is simply moved from Amazon S3 to Amazon Glacier.
That has now brought me to the end of this lecture and to the end of the course.
You should now have a greater understanding of how and when to use Amazon S3, AWS Snowball, and AWS Storage Gateway for DR and backup from your on-premise data center.
If you have any feedback on this course, positive or negative, please do leave a comment on the course landing page. We do look at these comments and your feedback is greatly appreciated.
Thank you for your time and good luck with your continued learning of Cloud computing.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 50+ courses relating to Cloud, most within the AWS category with a heavy focus on security and compliance.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.