AWS Data Services
One of the core building blocks of Infrastructure as a Service (IaaS) is storage, and AWS provides a wide range of storage services that allow you to architect the correct solution for your needs. Understanding what each of these services is, and what it has been designed and developed for, gives you the knowledge to implement best practices that ensure your data is stored, transmitted and backed up in the most efficient and scalable way. This course focuses on each of the storage services provided by AWS and explains what the service is, its key features, and when and why you might use it within your own environment.
The objectives of this course are to provide:
- An overview and introduction to the different AWS storage services
- An understanding of how to transfer data into and out of AWS
- The knowledge to confidently select the most appropriate storage service for your needs
This course is designed as an introduction to the AWS storage services and methods of storing data. As a result, this course is suitable for:
- Those who are starting out on their AWS journey and want to understand the various services that exist and their use cases
- Storage engineers responsible for maintaining and storing data within the enterprise
- Security engineers who secure and safeguard data within AWS
- Those who are looking to begin their certification journey with either the AWS Cloud Practitioner or one of the three Associate-level certifications
This is an entry-level course on AWS storage services, so no prior knowledge of these services is required. However, a basic understanding of cloud computing and an awareness of AWS would be beneficial, but not essential.
If you have thoughts or suggestions for this course, please contact Cloud Academy at email@example.com.
Hello and welcome to this final lecture of the course. In this lecture, I want to highlight, at a high level, some of the main points from each of the storage services that I have introduced, starting with Amazon S3.
In the Amazon S3 lecture, I explained that Amazon S3 is a fully managed, object-based storage service that is highly available, highly durable, very cost-effective and widely accessible. It has almost unlimited storage capacity; the smallest object size it supports is zero bytes and the largest is five terabytes. Data is uploaded to a specific region within S3 and automatically duplicated across multiple availability zones. Objects have a durability of 11 nines (99.999999999%) and an availability of four nines (99.99%). Objects must be stored within buckets, or within folders inside a bucket. S3 has three storage classes: Standard, Standard Infrequent Access and Reduced Redundancy. Security features of S3 include bucket policies, access control lists, data encryption (both server-side and client-side), and SSL support for data in transit to S3. Data management features include versioning and lifecycle rules. S3 is often used for data backup, static website content and large datasets, but it can be used for a wide variety of solutions as you see fit. S3 integrates with other services; for example, it stores EBS snapshots, holds AWS CloudTrail logs, and can act as an origin for a CloudFront distribution. Pricing for S3 is primarily based on the amount of storage used, plus request and data transfer costs.
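To make the size limits and the "11 nines" figure more concrete, here is a minimal Python sketch. It assumes that 11 nines means a 99.999999999% chance per object per year of not being lost, which puts the annual loss probability at about 1e-11. It also interprets "five terabytes" as 5 TiB (5 × 1024⁴ bytes); that interpretation is an assumption. The function names are hypothetical helpers, not part of any AWS SDK.

```python
# Single-object size limits as stated in the S3 summary.
# Assumption: "5 TB" is interpreted here as 5 TiB (5 * 1024**4 bytes).
MIN_OBJECT_BYTES = 0
MAX_OBJECT_BYTES = 5 * 1024**4

# Assumption: "11 nines" = 99.999999999% annual durability per object,
# so the probability of losing a given object in a year is about 1e-11.
ANNUAL_LOSS_PROBABILITY = 1e-11


def fits_in_single_object(size_bytes: int) -> bool:
    """Return True if an object of this size is within the stated S3 limits."""
    return MIN_OBJECT_BYTES <= size_bytes <= MAX_OBJECT_BYTES


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year at 11 nines of durability."""
    return object_count * ANNUAL_LOSS_PROBABILITY


if __name__ == "__main__":
    print(fits_in_single_object(0))            # True: zero-byte objects are allowed
    print(fits_in_single_object(6 * 1024**4))  # False: larger than the 5 TB limit
    losses = expected_annual_losses(10_000_000)
    print(f"{losses:.4f} objects/year, roughly one loss every {1 / losses:,.0f} years")
```

With ten million objects, the expected loss works out to about one object every 10,000 years. This is the kind of arithmetic behind the "highly durable" claim.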
Following on from Amazon S3, I then covered Amazon Glacier, which works closely with S3 but provides a very different function. The key points from this lecture were that Amazon Glacier is an extremely low-cost, long-term, durable storage solution, often referred to as cold storage, and is ideally suited for long-term backup and archival requirements. It has 11 nines of durability, making it just as durable as Amazon S3, but it is much cheaper than S3. However, it doesn't provide instant access for data retrieval. The data structure is centered around vaults and archives: a Glacier vault simply acts as a container for Glacier archives. Within a vault, data is stored as archives, and you can have an unlimited number of archives within your Glacier vaults. The dashboard within the console only allows you to create vaults, so if you want to move data into or out of Glacier, you have to use the Glacier web service API or one of the AWS SDKs. There are three different retrieval options: expedited, standard and bulk. Data is encrypted by default using the AES-256 encryption algorithm. Access control can be governed through IAM policies, vault access policies and vault lock policies. There is a flat pricing structure for data stored in Glacier, regardless of the amount of storage used. However, there are still request and data transfer costs, plus additional costs relating to the amount of data retrieved. In short, Glacier is designed to archive data in cold storage for extended periods of time at a very small cost.
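Moving data out of Glacier goes through the API or an SDK rather than the console, and the retrieval option is chosen per request. The following Python sketch builds the keyword arguments for the boto3 Glacier client's `initiate_job()` call. It shows where the vault, the archive and the retrieval tier fit in. The helper name `archive_retrieval_request`, the vault name and the archive ID are hypothetical. The actual AWS call is shown only as a comment, because it needs real credentials and an existing vault.

```python
from typing import Any, Dict

# The three Glacier retrieval options listed in the summary.
RETRIEVAL_TIERS = ("Expedited", "Standard", "Bulk")


def archive_retrieval_request(vault_name: str, archive_id: str,
                              tier: str = "Standard") -> Dict[str, Any]:
    """Build keyword arguments for a boto3 Glacier initiate_job() call."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"tier must be one of {RETRIEVAL_TIERS}, got {tier!r}")
    return {
        "accountId": "-",  # "-" means the account that owns the signing credentials
        "vaultName": vault_name,  # the vault acts as the container
        "jobParameters": {
            "Type": "archive-retrieval",  # retrieve an archive (not a vault inventory)
            "ArchiveId": archive_id,      # the archive held within the vault
            "Tier": tier,                 # Expedited, Standard or Bulk
        },
    }


if __name__ == "__main__":
    # Hypothetical vault name and archive ID, for illustration only.
    request = archive_retrieval_request("my-archive-vault", "EXAMPLE-ARCHIVE-ID", "Bulk")
    print(request)
    # With AWS credentials configured, the request could be submitted with:
    #   import boto3
    #   job = boto3.client("glacier").initiate_job(**request)
    # Retrieval is asynchronous: the job must complete before get_job_output()
    # can return the archive data, which is why Glacier is not instant access.
```

The validation against the three tier names is there so that a typo fails locally instead of at the API.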
After Amazon Glacier, I looked at the storage directly attached to EC2 instances themselves, known as instance store volumes. In this lecture, I explained that instance store volumes physically reside on the same host that provides your EC2 instance. However, they only provide ephemeral storage for your EC2 instances, offering no means of persistence. As a result, it's not recommended to store critical or valuable data on instance store volumes, because if your instance is stopped or terminated, all stored data will be lost. If your instance is rebooted, however, your data will remain intact. The storage is included in the price of the EC2 instance, so there is no additional spend on storage costs. The IOPS can far exceed those of alternatives such as EBS, and instance store volumes are often used for frequently changing data that doesn't need to be retained, such as a cache or buffer. One last point to bear in mind is that not all instance types support instance store volumes.
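The persistence rule for instance store volumes fits in a small lookup. The Python sketch below encodes only the behavior stated in the summary: data is kept on a reboot and lost on a stop or terminate. The `InstanceEvent` enum and the function name are hypothetical illustrations, not EC2 API types.

```python
from enum import Enum


class InstanceEvent(Enum):
    """EC2 lifecycle events relevant to instance store data (illustrative only)."""
    REBOOT = "reboot"
    STOP = "stop"
    TERMINATE = "terminate"


def instance_store_data_survives(event: InstanceEvent) -> bool:
    """Instance store data is kept across a reboot but lost on stop or terminate."""
    return event is InstanceEvent.REBOOT


if __name__ == "__main__":
    for event in InstanceEvent:
        outcome = "kept" if instance_store_data_survives(event) else "lost"
        print(f"{event.value:>9}: instance store data {outcome}")
```

This is why instance store suits caches and buffers that can be rebuilt. Anything that must survive a stop belongs on persistent storage such as EBS.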
Following instance store volumes, I looked at block storage that offers persistence, in the form of Elastic Block Store (EBS) volumes. EBS also provides block-level storage to your EC2 instances, but unlike instance store volumes, EBS offers persistent and durable data storage. EBS volumes can be attached to and detached from your instances and are primarily used for data that is rapidly changing. A single EBS volume can only ever be attached to a single EC2 instance; however, multiple EBS volumes can be attached to a single instance. EBS snapshots provide an incremental, point-in-time backup of the entire volume and are stored on Amazon S3. It's also possible to create a new volume from an existing snapshot. All writes are replicated multiple times within a single availability zone, and EBS volumes are only available in a single availability zone. There are four types of EBS volume available: two SSD-backed and two HDD-backed. The cost of a volume depends on its type. You are charged for the storage provisioned per month, billed on a per-second basis, and EBS snapshots stored on S3 also incur S3 storage costs. EBS can encrypt data both at rest and in transit if required, and encrypted volumes also produce encrypted snapshots.
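The word "incremental" in EBS snapshots is the key idea. The first snapshot captures every block, and each later snapshot only stores the blocks that have changed since the previous one. The following Python sketch is a conceptual model of that idea. It is not how EBS is implemented internally. The volume is modeled as a hypothetical dictionary of block index to block contents, and block deletion is not modeled.

```python
from typing import Dict, List, Optional

# Hypothetical model: block index -> block contents.
Blocks = Dict[int, bytes]


def incremental_snapshot(volume: Blocks, last_view: Optional[Blocks]) -> Blocks:
    """Return only the blocks that are new or changed since the last snapshot."""
    if last_view is None:
        return dict(volume)  # the first snapshot copies every block
    return {i: data for i, data in volume.items() if last_view.get(i) != data}


def restore(snapshots: List[Blocks]) -> Blocks:
    """Rebuild a full volume by applying snapshots oldest-first."""
    volume: Blocks = {}
    for delta in snapshots:
        volume.update(delta)  # later snapshots overwrite earlier block versions
    return volume


if __name__ == "__main__":
    vol = {0: b"boot", 1: b"data-v1", 2: b"logs"}
    snap1 = incremental_snapshot(vol, None)            # all 3 blocks
    vol[1] = b"data-v2"                                # change one block
    snap2 = incremental_snapshot(vol, restore([snap1]))
    print(len(snap1), len(snap2))                      # second snapshot holds 1 block
    print(restore([snap1, snap2]) == vol)              # full point-in-time view
```

Restoring replays the chain, so each snapshot still represents the entire volume at its point in time even though it stores only changes. This is also why the storage cost of frequent snapshots grows with the amount of changed data rather than with volume size.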
Next up was the Elastic File System, EFS. In this lecture, we learned that EFS is a file-level storage service: a fully managed, highly available and durable service that allows you to create shared file systems. It's highly scalable, capable of meeting demand from thousands of EC2 instances concurrently, and it has a seemingly limitless capacity, similar to S3. There is no need to provision a set size of data storage as you must with EBS. This makes EFS an ideal storage option for applications that scale across multiple instances and need parallel access to data. EFS is a regional service, and it has been designed to maintain a high level of throughput with very low-latency access. Mount targets allow you to connect to the EFS file system from your EC2 instances using the mount target's IP address, but this is only compatible with NFS versions 4.0 and 4.1. EFS does not currently support the Windows operating system. You must ensure that your Linux EC2 instance has the NFS client installed for the mounting process, and NFS client version 4.1 is recommended for this procedure. EFS can be configured to run in two different performance modes: General Purpose, which is the default, and Max I/O. Encryption at rest can be enabled with the use of KMS, but encryption in transit is not currently supported by the service. The File Sync feature can be used to migrate data to EFS via an agent. Pricing is charged per gigabyte-month, and there is no charge for data transfer or requests.
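Mounting EFS from a Linux instance comes down to an NFSv4.1 mount against a mount target's IP address. The Python sketch below builds that `mount` command as an argument list. The option string reflects the NFS mount options AWS documentation has recommended for EFS, as far as I'm aware; treat it as an assumption and check current AWS docs. The IP address `10.0.1.25` and the mount point `/mnt/efs` are hypothetical.

```python
from typing import List

# Assumption: NFS options as recommended for EFS in AWS documentation
# (nfsvers=4.1 matches the NFS 4.1 client recommended in the summary).
EFS_NFS_OPTIONS = (
    "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
)


def efs_mount_command(mount_target_ip: str, mount_point: str) -> List[str]:
    """Build the argv for mounting an EFS file system via a mount target IP."""
    return [
        "sudo", "mount",
        "-t", "nfs4",               # NFS version 4.x client
        "-o", EFS_NFS_OPTIONS,
        f"{mount_target_ip}:/",     # root of the file system exposed by the mount target
        mount_point,                # local directory, which must already exist
    ]


if __name__ == "__main__":
    # Hypothetical mount target IP and local mount point.
    cmd = efs_mount_command("10.0.1.25", "/mnt/efs")
    print(" ".join(cmd))
    # On an EC2 instance with the NFS client installed, this could be run with
    # subprocess.run(cmd, check=True).
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.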
Following EFS, I introduced Amazon CloudFront, a content delivery network service. In this lecture, I explained that Amazon CloudFront distributes the source data of your web traffic closer to the end users requesting it, by caching that data at AWS edge locations. As a result, it doesn't provide durability for your data. AWS edge locations are sites deployed in highly populated areas across the globe to cache data and reduce latency for end-user access. CloudFront uses distributions to control which source data it distributes and to where, and there are two delivery methods: a web distribution or an RTMP distribution. A CloudFront distribution requires an origin containing your source data, such as S3. Data can be distributed using the following edge location options: US, Canada and Europe; US, Canada, Europe and Asia; or all edge locations. CloudFront can integrate with the AWS Web Application Firewall (WAF) service for additional security and web application protection. Additional encryption security can also be added by specifying an SSL certificate that must be used within the distribution. Pricing is primarily based on data transfer costs and HTTP requests.
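The point that CloudFront holds cached copies rather than durable data can be illustrated with a toy model of a single edge location. The first request for a path fetches it from the origin, and repeat requests are served from the edge until a time-to-live (TTL) expires. The `EdgeCache` class and the TTL-based expiry are simplifying assumptions for illustration, not CloudFront's actual caching algorithm.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one edge location caching origin objects for a fixed TTL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, cached_at)
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]             # cache hit: served from the edge
        body = self._fetch(path)         # miss or expired: go back to the origin
        self.origin_fetches += 1
        self._entries[path] = (body, now)
        return body


if __name__ == "__main__":
    fake_now = [0.0]                                   # controllable clock for the demo
    origin = {"/index.html": b"<h1>hello</h1>"}        # stands in for an S3 origin
    edge = EdgeCache(origin.__getitem__, ttl_seconds=60, clock=lambda: fake_now[0])
    edge.get("/index.html")
    edge.get("/index.html")
    print(edge.origin_fetches)                         # 1: second request was a hit
    fake_now[0] = 120.0
    edge.get("/index.html")
    print(edge.origin_fetches)                         # 2: entry expired, refetched
```

The origin always holds the authoritative copy. The edge only reduces how often, and from how far away, it has to be read.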
Next up was the first lecture covering how you can migrate data into and out of AWS storage services, and here I looked at AWS Storage Gateway. Storage Gateway provides a gateway between your own on-premises data center storage systems, such as your SAN, NAS or DAS, and Amazon S3 or Glacier on AWS. The storage gateway is a software appliance, downloaded as a VM and installed within your own data center. Storage Gateway offers file, volume and tape gateway configurations. File gateways allow you to securely store your files as objects within S3. You can then mount or map drives to an S3 bucket as if the share were held locally on your own corporate network, and a local on-premises cache is also provisioned for your most recently accessed files. Volume gateways are configured as either stored volume gateways or cached volume gateways. Stored volume gateways are used to back up your local storage volumes to Amazon S3 while ensuring your entire data library also remains locally on premises for very low-latency data access; these are presented as iSCSI volumes. With cached volume gateways, the primary data storage is actually Amazon S3 rather than your own local storage solution, and your local data storage is used as a buffer and a cache for recently accessed data; these are also presented as iSCSI volumes. Lastly, tape gateways, or virtual tape libraries, allow you to back up data to S3 from your own corporate data center and leverage Amazon Glacier for data archiving. The virtual tape library is essentially a cloud-based tape backup solution. Pricing for the service is based upon storage usage, requests and data transfer.
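The difference between cached and stored volume gateways comes down to where the primary copy lives. In a cached volume gateway, S3 holds everything and local disk only keeps recently accessed data. The Python sketch below models that read path with a least-recently-used (LRU) cache. The eviction policy is my assumption for illustration, not the gateway's documented behavior, and `CachedVolumeModel` and `read_from_s3` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class CachedVolumeModel:
    """Toy cached volume gateway: S3 holds all blocks, local disk caches recent ones."""

    def __init__(self, read_from_s3: Callable[[int], bytes], cache_blocks: int):
        self._read_from_s3 = read_from_s3
        self._capacity = cache_blocks
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.s3_reads = 0

    def read(self, block: int) -> bytes:
        if block in self._cache:
            self._cache.move_to_end(block)   # mark as most recently used
            return self._cache[block]        # low-latency local read
        data = self._read_from_s3(block)     # primary copy lives in S3
        self.s3_reads += 1
        self._cache[block] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    s3_blocks = {i: f"block-{i}".encode() for i in range(10)}  # stand-in for S3
    volume = CachedVolumeModel(s3_blocks.__getitem__, cache_blocks=2)
    for b in [1, 2, 1, 3, 2]:
        volume.read(b)
    print(volume.s3_reads)  # 4: only the second read of block 1 was a local hit
```

A stored volume gateway would invert this picture: every block is read locally, and S3 receives backups.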
Our final lecture looking at storage was based on AWS Snowball, and it explained the following points. The service is used to securely transfer large amounts of data in and out of AWS using a physical appliance known as a Snowball. A Snowball appliance comes as either a 50 terabyte or 80 terabyte storage device and is dust-, water- and tamper-resistant. It has been designed to allow for high-speed data transfers, and by default all data transferred to the Snowball appliance is automatically encrypted. It also features end-to-end tracking using an E Ink shipping label. AWS Snowball is HIPAA compliant, allowing you to transfer protected health information, and it's the responsibility of AWS to ensure that data held on the Snowball appliance is deleted and removed once you have finished with it. Snowball appliances can be aggregated together. As a general rule, if your data transfer would take longer than a week using your existing connection method, then you should consider using AWS Snowball. Pricing is based on normal Amazon S3 data charges, plus additional costs for the data transfer job and shipping.
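The "longer than a week" rule of thumb is simple arithmetic on data size and bandwidth. The Python sketch below estimates transfer time over an existing link and applies the one-week threshold from the summary. It uses decimal terabytes and megabits per second. The `utilization` parameter, meaning the fraction of the link you can actually dedicate to the transfer, is an assumption you would set for your own network. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
ONE_WEEK_DAYS = 7  # rule-of-thumb threshold from the summary


def transfer_days(data_terabytes: float, link_mbps: float,
                  utilization: float = 1.0) -> float:
    """Days needed to move data_terabytes over a link_mbps link at a given utilization."""
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be > 0 and utilization in (0, 1]")
    bits = data_terabytes * 1e12 * 8               # decimal terabytes -> bits
    bits_per_second = link_mbps * 1e6 * utilization
    return bits / bits_per_second / SECONDS_PER_DAY


def should_consider_snowball(data_terabytes: float, link_mbps: float,
                             utilization: float = 1.0) -> bool:
    """Apply the 'longer than a week over the existing connection' rule of thumb."""
    return transfer_days(data_terabytes, link_mbps, utilization) > ONE_WEEK_DAYS


if __name__ == "__main__":
    # 100 TB over a 1 Gbps link used at 80% takes about 11.6 days.
    print(f"{transfer_days(100, 1000, 0.8):.1f} days")
    print(should_consider_snowball(100, 1000, 0.8))  # True
    print(should_consider_snowball(1, 1000, 0.8))    # False
```

In practice, you would also weigh shipping time and job costs, which this sketch does not model.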
That now brings me to the end of this lecture and to the end of this course. You should now have a greater understanding of the range of storage services offered by AWS, the differences between them, and when to use them depending on your use case. If you have any feedback on this course, positive or negative, please do contact us at firstname.lastname@example.org. Your feedback is greatly appreciated. Thank you for your time, and good luck with your continued learning of cloud computing.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 50+ courses relating to Cloud, most within the AWS category, with a heavy focus on security and compliance.
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.