AWS Data Services
To help you prepare for the AWS Certified Cloud Practitioner exam, this course will enable you to demonstrate knowledge of the Amazon Simple Storage Service (S3), Amazon Glacier, Amazon Elastic Block Store (EBS) and Amazon CloudFront storage solutions, and help you identify when to apply AWS solutions to common business scenarios.
This course covers a range of different services, including:
- Amazon Simple Storage Service (S3)
- Amazon Elastic Block Store (EBS)
- Amazon Glacier
- Amazon RDS
- Amazon DynamoDB, ElastiCache, and Redshift
- Amazon CloudFront
- AWS Import/Export Disk
- AWS Import/Export Snowball
- AWS Storage Gateway
By the end of this course, you should be able to:
- Describe the basic functions that each storage service performs within a cloud solution
- Recognize basic components and features of each storage service
- Identify which storage service would be most appropriate to a general use case
- Understand how each service utilizes the benefits of cloud computing, such as scalability or elasticity
This course is designed for:
- Anyone preparing for the AWS Certified Cloud Practitioner
- Managers, sales professionals, and other non-technical roles
Before taking this course, you should have a general understanding of basic cloud computing concepts.
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Hello, and welcome to this lecture, where I want to explain when you may want to use AWS Storage Gateway, along with the different options available when using this service. Storage Gateway allows you to provide a gateway between your own data center's storage systems, such as your SAN, NAS, or DAS, and Amazon S3 and Glacier on AWS.
The storage gateway itself is a software appliance that can be installed within your own data center, which allows integration between your on-premise storage and that of AWS. This connectivity can allow you to scale your storage requirements both securely and cost efficiently. The software appliance can be downloaded from AWS as a virtual machine, which can then be installed on your VMware or Microsoft hypervisors.
Storage gateway offers different configurations and options allowing you to use the service to fit your needs. It offers file, volume and tape gateway configurations which can be used to help with your DR and data backup solutions. Let me explain the differences between each of these configurations, starting with file gateways.
File gateways allow you to securely store your files as objects within S3. You use the gateway as a type of file share, which allows you to mount or map drives to an S3 bucket as if the share was held locally on your own corporate network. When storing files using the file gateway, they are sent to S3 over HTTPS, and are also encrypted with S3's own server-side encryption, SSE-S3. In addition to this, a local on-premise cache is also provisioned for your most recently accessed files to optimize latency, which also helps to reduce egress traffic costs. When your file gateway is first configured, you must associate it with your S3 bucket, which the gateway will then present as an NFS v3 or v4.1 file system to your internal applications. This allows you to view the bucket as a normal NFS file system, making it easier to mount as a drive in Linux or map a drive to it in Microsoft Windows. Any files that are then written to these NFS file systems are stored in S3 as individual objects, as a one-to-one mapping of files to objects.
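As a rough illustration of the one-to-one mapping of files to objects, the sketch below derives an object key from a file's path relative to the mounted share. The mount point `/mnt/gateway`, the function name `object_key_for`, and the key naming itself are assumptions made for this example, not something stated in the lecture.

```python
from pathlib import PurePosixPath


def object_key_for(share_root: str, file_path: str) -> str:
    """Return a hypothetical S3 object key for a file under the NFS share.

    Assumption: the key is simply the file's path relative to the share root.
    """
    root = PurePosixPath(share_root)
    path = PurePosixPath(file_path)
    # relative_to raises ValueError if the file is not inside the share.
    return str(path.relative_to(root))


# Hypothetical files written through the mounted share.
files = ["/mnt/gateway/reports/q1.csv", "/mnt/gateway/logo.png"]

# One file maps to exactly one object key.
keys = {f: object_key_for("/mnt/gateway", f) for f in files}

for local_path, key in keys.items():
    print(f"{local_path} -> {key}")
```

Each file produces exactly one key, so two different files never share an object.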
The second gateway configuration option is volume gateways. These can be configured in one of two different ways: stored volume gateways and cached volume gateways. Let me explain stored volume gateways first. Stored volume gateways are often used as a way to back up your local storage volumes to Amazon S3 whilst ensuring your entire data library also remains locally on premise for very low latency data access. Volumes created and configured within the storage gateway are backed by Amazon S3, and are mounted as iSCSI devices that your applications can then communicate with. During the volume creation, these are mapped to your on-premise storage devices, which can either hold existing data or be a new disk. As data is written to these iSCSI devices, the data is actually written to your local storage devices, such as your own NAS, SAN, or DAS storage solution. However, the storage gateway then asynchronously copies this data to Amazon S3 as EBS snapshots. Having your entire data set remain locally ensures you have the lowest latency possible to access your data, which may be required for specific applications or for security, compliance, and governance controls, whilst at the same time providing a backup solution which is governed by the same controls and security that S3 offers. Volumes created can be between 1 gigabyte and 16 terabytes in size. For each storage gateway, up to 32 stored volumes can be created, which gives you a maximum total of 512 terabytes of storage per gateway. Stored volume gateways also provide a buffer which uses your existing storage disks. This buffer is used as a staging point for data that is waiting to be written to S3. During the upload process, the data is sent over an encrypted SSL channel and stored in an encrypted format within S3. To help with the management and backup of your data, storage gateway makes it easy to take snapshots of your storage volumes at any point, which are then stored as EBS snapshots in S3.
It's worth pointing out that these snapshots are incremental, ensuring that only the data that's changed since the last backup is copied, helping to minimize storage costs in S3. As you can see, gateway stored volumes make recovering from a disaster very simple. For example, let's consider the scenario that you lost your local application and storage layers on premise. Provided you had made provision for such a situation, you may have AMI templates that mimic your application tier, which you could provision as EC2 instances within AWS. You could then attach EBS volumes to these instances, created from your storage gateway volume snapshots stored on S3, giving you access to the production data required. Your application storage infrastructure could potentially be up and running again in a matter of minutes within a VPC with connectivity from your on-premise data center.
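To make the idea of an incremental snapshot more concrete, here is a minimal sketch that compares two versions of a volume block by block and reports only the blocks that would need copying. It is purely illustrative: the block size, the function names, and the hash-comparison approach are assumptions for this example and do not describe how Storage Gateway tracks changes internally.

```python
import hashlib

# Unrealistically small block size, chosen so the example is easy to follow.
BLOCK_SIZE = 4


def block_hashes(data: bytes) -> list[str]:
    """Split the data into fixed-size blocks and hash each block."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(previous: bytes, current: bytes) -> list[int]:
    """Return indexes of blocks that are new or differ from the last snapshot."""
    old = block_hashes(previous)
    new = block_hashes(current)
    return [i for i, h in enumerate(new) if i >= len(old) or old[i] != h]


last_snapshot = b"AAAABBBBCCCC"
volume_now = b"AAAAXXXXCCCCDDDD"

# Block 1 was modified and block 3 was appended; blocks 0 and 2 are unchanged.
print(changed_blocks(last_snapshot, volume_now))  # [1, 3]
```

Only two of the four blocks would be copied in this case, which is the reason incremental snapshots help to keep storage costs down.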
Let me now compare this option to the second volume gateway option of cached volume gateways. Cached volume gateways are different to stored volume gateways, in that the primary data storage is actually Amazon S3 rather than your own local storage solution. However, cached volume gateways do utilize your local data storage as a buffer and a cache for recently accessed data to help maintain low latency, hence the name cached volumes. Again, during the creation of these volumes, they are presented as iSCSI volumes which can be mounted by your application servers. The volumes themselves are backed by the Amazon S3 infrastructure as opposed to your local disks as seen in the stored volume gateway deployment. As a part of this volume creation, you must also select some local disks on-premise to act as your local cache and a buffer for data waiting to be uploaded to S3. Again, this buffer is used as a staging point for data that is waiting to be written to S3, and during the upload process the data is sent over an encrypted SSL channel and is then encrypted at rest using SSE-S3. The limits are slightly different with cached volume gateways, in that each volume created can be up to 32 terabytes in size, with support for up to 32 volumes, meaning a total storage capacity of 1024 terabytes per cached volume gateway. Although all of your primary data used by your applications is stored in S3 across volumes, it is still possible to take incremental backups of these volumes as EBS snapshots. In a DR scenario, and as I mentioned in the previous section, this then enables quick deployment of the data sets, which can be attached to EC2 instances as EBS volumes containing all of your data as required.
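The maximum capacities quoted for the two volume gateway types follow directly from the per-volume size limit multiplied by the number of volumes. The short sketch below simply reproduces that arithmetic using the figures given in this lecture; the constant and function names are made up for the example.

```python
# Limits as quoted in this lecture.
MAX_VOLUMES_PER_GATEWAY = 32
STORED_VOLUME_MAX_TB = 16
CACHED_VOLUME_MAX_TB = 32


def max_gateway_capacity_tb(volume_max_tb: int,
                            volume_count: int = MAX_VOLUMES_PER_GATEWAY) -> int:
    """Total capacity when every volume is created at its maximum size."""
    return volume_max_tb * volume_count


print(max_gateway_capacity_tb(STORED_VOLUME_MAX_TB))  # 512
print(max_gateway_capacity_tb(CACHED_VOLUME_MAX_TB))  # 1024
```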
The final option with AWS storage gateway is a tape gateway, known as gateway VTL, virtual tape library. This allows you to again back up your data to S3 from your own corporate data center, but also leverage Amazon Glacier for data archiving. Virtual tape library is essentially a cloud-based tape backup solution, replacing physical components with virtual ones. This functionality allows you to use your existing tape backup application infrastructure with AWS, providing a more robust and secure backup and archiving solution. The solution itself is comprised of the following elements:
- Storage gateway. The gateway itself is configured as a tape gateway, which has the capacity to hold 1500 virtual tapes.
- Virtual tapes. These are a virtual equivalent to a physical backup tape cartridge, which can be anything from 100 gigabytes to 2.5 terabytes in size. Any data stored on the virtual tapes is backed by Amazon S3 and appears in the virtual tape library.
- Virtual tape library, VTL. As you may have guessed, this is a virtual equivalent to a tape library that contains virtual tapes.
- Tape drives. Every VTL comes with 10 virtual tape drives, which are presented to your backup applications as iSCSI devices.
- Media changer. This is a virtual device that moves tapes between the tape drives and your VTL, and again is presented as an iSCSI device to your backup applications.
- Archive. This is equivalent to an off-site tape backup storage facility, where you can archive tapes from your virtual tape library to Amazon Glacier, which as we already know is used as a cold storage solution. If retrieval of the tapes is required, storage gateway uses the standard retrieval option, which can take between three and five hours to retrieve your data.
Once your storage gateway has been configured as a tape gateway, your applications and backup software can mount the tape drives along with the media changer as iSCSI devices to make the connection.
You can then create the required virtual tapes as you need them for backup, and your backup software can use these to back up the required data, which is stored on S3. For a list of the supported third-party backup applications, please visit the link on screen. When you want to archive virtual tapes, perhaps for cost optimization, compliance and governance, or even DR, the data is simply moved from Amazon S3 to Amazon Glacier.
AWS storage gateway costs are defined by three different cost points: storage, requests and data transfer. Storage pricing, like many other services we have already discussed, depends on the region. However, with AWS storage gateway, it also depends on the type of gateway used. This table shows the London region and the different costs for each of the storage gateway options. With request pricing, you are charged one cent per gigabyte for data written to your storage gateway, with a maximum cost of 125 dollars per gateway per month. You are also charged one cent per gigabyte of virtual tape retrieval. Again, these prices are based on the London region. Inbound data transfer is free; however, there are costs for data transfer out to another gateway in a different region and for the amount of data transferred back to your on-premises gateway.
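The request pricing described above can be worked through as a small calculation. This is only a sketch using the figures quoted in this lecture for the London region (one cent per gigabyte written, capped at 125 dollars per gateway per month); current prices may differ, and the function and constant names are made up for the example.

```python
# Figures as quoted in this lecture (London region); treat them as assumptions.
WRITE_PRICE_PER_GB = 0.01      # dollars per gigabyte written to the gateway
MONTHLY_WRITE_CAP = 125.00     # maximum write charge per gateway per month


def monthly_write_charge(gb_written: float) -> float:
    """Request charge for data written to one gateway in one month."""
    if gb_written < 0:
        raise ValueError("gb_written must not be negative")
    # The per-gigabyte charge applies until the monthly cap is reached.
    return min(gb_written * WRITE_PRICE_PER_GB, MONTHLY_WRITE_CAP)


for gb in (1_000, 12_500, 20_000):
    print(f"{gb} GB written -> ${monthly_write_charge(gb):.2f}")
```

With these figures the cap is reached at 12,500 gigabytes written in a month (125 divided by 0.01), so any data written beyond that point does not add to the request charge.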
About the Author
Head of Content
Andrew is an AWS certified professional who is passionate about helping others learn how to use and gain benefit from AWS technologies. Andrew has worked for AWS and for AWS technology partners Ooyala and Adobe. His favorite Amazon leadership principle is "Customer Obsession" as everything AWS starts with the customer. Passions around work are cycling and surfing, and having a laugh about the lessons learnt trying to launch two daughters and a few start ups.