EFS in Practice
This course dives into the AWS Elastic File Service - commonly known as EFS - and explains the service, its components, when it should be used, and how to configure it. EFS is considered file-level storage, supporting access by multiple EC2 instances at once, and is also optimized for low latency access. It appears to users like a file manager interface and uses standard file system semantics, such as locking files, renaming files, updating them, and using a hierarchical structure. This is just like what we're used to on standard premises-based systems.
The course kicks off with a high-level overview of EFS including its features, benefits, and use cases. This is followed by a review of the different storage class options it provides, namely Standard and Infrequent Access. A demonstration then provides a walk-through on how to configure an EFS file system within your VPC. The course covers how to secure your elastic file system, touching upon access control, permissions, and encryption as methods for protecting your data effectively. Finally, it moves on to importing existing on-premises data into EFS. If you want to cement your knowledge of this topic with hands-on experience, you can then try out the Introduction to Elastic File System lab.
- Understand the AWS Elastic File System along with its benefits and use cases
- Understand which performance and storage class to configure based upon your workloads
- Configure and create an elastic file system
- Mount EFS to your existing Linux instances
- Understand some security features and requirements of EFS
- Import existing data into your elastic file system
This course has been created for:
- Storage engineers responsible for maintaining, managing and administering file-level storage
- Security engineers who secure and safeguard data within AWS
- IT professionals preparing for either the AWS Cloud Practitioner exam or one of the three Associate-level certifications
- Those who are starting their AWS journey and want to understand the various services that exist and their use cases
To get the most from this course, you should be familiar with the basic concepts of AWS as well as with some of its core components, such as EC2 connectivity and configuration, in addition to VPC. You should also have an understanding of IAM permissions and how access is granted to resources.
Hello and welcome to this lecture, where I will be discussing the different storage class options that EFS provides in addition to how you can alter and configure certain performance factors depending on your use case of EFS. Now, Amazon EFS offers two different storage classes to you, which each offer different levels of performance and cost, these being Standard and Infrequent Access, known as IA. The Standard storage class is the default storage used when using EFS. However, Infrequent Access is generally used if you're storing data on EFS that is rarely accessed. By selecting this class, it offers a cost reduction on your storage.
The result of the cheaper storage means that there is an increased first-byte latency impact when both reading and writing data in this class when compared to that of Standard. The costs are also managed slightly differently. When using IA, you are charged for the amount of storage space used, which is cheaper than that compared to Standard. However, with IA, you are also charged for each read and write you make to the storage class. This helps to ensure that you only use this storage class for data that is not accessed very frequently, for example, data that might be required for auditing purposes or historical analysis.
With the Standard storage class, you are only charged on the amount of storage space used per month. Both storage classes are available in all regions where EFS is supported. And importantly, they both provide the same level of availability and durability. If you are familiar with S3, then you may also be familiar with S3 lifecycle policies for data management. Within EFS, a similar feature exists known as EFS lifecycle management. When enabled, EFS will automatically move data between the Standard storage class and the IA storage class. This process occurs when a file has not been read or written to for a set period of days, which is configurable, and your options for this period range include 14, 30, 60, or 90 days.
Depending on your selection, EFS will move the data to the IA storage class to save on cost once that period has been met. However, as soon as that same file is accessed again, the timer is reset, and it is moved back to the Standard storage class. Again, if it has not been accessed for a further period, it will then be moved back to IA. Every time a file is accessed, its lifecycle management timer is reset. The only exceptions to data not being moved to the IA storage class is for any files that are below 128K in size and any metadata of your files, which will all remain in the Standard storage class.
If your EFS file system was created after February 13th, 2019, then the life cycle management feature can be switched on or off. Let me now take a look at the different performance modes that EFS offers. AWS are where the EFS can be used for a number of different use cases and workloads, and as such, each use case might require a change of performance from a throughput, IOPS, and latency point of view. As a result, AWS has introduced two different performance modes that can be defined during the creation of your EFS file system. These being General Purpose, and Max I/O.
Now, General Purpose is a default performance mode and is typically used for most use cases. For example, home directories and general file-sharing environments. It offers an all-round performance and low latency file operation, and there is a limitation of this mode allowing only up to 7,000 file system operations per second to your EFS file system. If, however, you have a huge scale architecture, where your EFS file system is likely to be used by many thousands of EC2 instances concurrently, and will exceed 7,000 operations per second, then you'll need to consider Max I/O. Now, this mode offers virtually unlimited amounts of throughput and IOPS. The downside is, however, that your file operation latency will take a negative hit over that of General Purpose.
The best way to determine which performance option that you need is to run tests alongside your application. If your application sits comfortably within the limit of 7,000 operations per second, then General Purpose will be best suited, with the added plus point of lower latency. However, if your testing confirms 7,000 operations per second may be reached or exceeded, then select Max I/O.
When using the General Purpose mode of operations, EFS provides a CloudWatch metric percent I/O limit, which will allow you to view operations per second as a percentage of the top 7,000 limit. This allows you to make the decision to migrate and move to the Max I/O file system, should your operations be reaching that limit.
In addition to the two performance modes, EFS also provides two different throughput modes, and throughput is measured by the rate of mebibytes. The two modes offered are Bursting Throughput and Provisioned Throughput. Data throughput patterns on file systems generally go through periods of relatively low activity with occasional spikes in burst usage, and EFS provisions throughput capacity to help manage this random activity of high peaks.
With the Bursting Throughput mode, which is the default mode, the amount of throughput scales as your file system grows. So the more you store, the more throughput is available to you. The default throughput available is capable of bursting to 100 mebibytes per second, however, with the standard storage class, this can burst to 100 mebibytes per second per tebibyte of storage used within the file system.
So, for example, presume you have five tebibytes of storage within your EFS file system. Your burst capacity could reach 500 mebibytes per second. The duration of throughput bursting is reflected by the size of the file system itself. Through the use of credits, which are accumulated during periods of low activity, operating below the baseline rate of throughput set at 50 mebibytes per tebibyte of storage used, which determines how long EFS can burst for. Every file system can reach its baseline throughput 100% of the time. By accumulating, getting credits, your file system can then burst above your baseline limit. The number of credits will dictate how long this throughput can be maintained for, and the number of burst credits for your file system can be viewed by monitoring the CloudWatch metric of BurstCreditBalance.
If you are finding that you're running out of burst credits too often, then you might need to consider using the Provisioned Throughput mode. Provisioned Throughput allows you to burst above your allocated allowance, which is based upon your file system size. So if your file system was relatively small but the use case for your file system required a high throughput rate, then the default bursting throughput options may not be able to process your request quick enough. In this instance, you would need to use provisioned throughput. However, this option does incur additional charges, and you'll pay additional costs for any bursting above the default option of bursting throughput. That brings me to the end of this lecture, now I want to shift my focus on creating and connecting to an EFS file system from a Linux based instance.
About the Author
Stuart has been working within the IT industry for two decades covering a huge range of topic areas and technologies, from data centre and network infrastructure design, to cloud architecture and implementation.
To date, Stuart has created 50+ courses relating to Cloud, most within the AWS category with a heavy focus on security and compliance
He is AWS certified and accredited in addition to being a published author covering topics across the AWS landscape.
In January 2016 Stuart was awarded ‘Expert of the Year Award 2015’ from Experts Exchange for his knowledge share within cloud services to the community.
Stuart enjoys writing about cloud technologies and you will find many of his articles within our blog pages.