Web Distributions


Amazon CloudFront is a content delivery web service which integrates with other Amazon Web Services products to give you an easy way to distribute content to end users with low latency, high data transfer speeds, and no minimum usage commitments.

During this course we will cover a range of topics from an introduction to what CloudFront is, to architectural considerations, to pricing and reports. We will then do a walkthrough of creating a Web Distribution, during which we will consider security and best practices. After the creation of the Web Distribution we will start monitoring CloudFront with CloudWatch to ensure that our setup is suitable for our needs and to gather valuable information about our distribution. At the very end of this course we will provide an overview of general best practices.

In order to keep up with this course, you should be familiar with the core services that AWS provides and with best practices for working with the platform. If you need to get up to speed, you can start with the AWS Fundamentals series. This is an intermediate-level course, and it is recommended that you also have some basic knowledge of CloudFront before starting, although we will present a general overview as we get started.

If you have thoughts or suggestions for this course, please contact Cloud Academy at support@cloudacademy.com.


During this lesson, we're going to jump in and actually configure a web distribution. Before we get started, I'll quickly review the current environment.

I'm running a t2.micro instance in the Singapore region, on which I've installed Apache and WordPress. For the WordPress database, I've created an RDS instance running MySQL. I've also created two security groups. The first is for the web server and allows inbound HTTP traffic on port 80 and SSH on port 22. You'll notice that I have port 22 open to the whole world, which is not a recommended security practice; as this is for demonstration purposes, I've left it this way so as not to expose my private IP address. The second security group is for RDS and opens port 3306 for MySQL. It only allows traffic coming from the web server security group.
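The two security groups described above can be sketched as data. This is a minimal illustration, assuming the rule lists follow the `IpPermissions` shape that boto3's EC2 `authorize_security_group_ingress` call accepts; the group ID `sg-0123456789abcdef0` and the SSH range `203.0.113.10/32` are hypothetical placeholders.

```python
# Hypothetical ID of the web server security group; replace with your own.
WEB_SG_ID = "sg-0123456789abcdef0"


def web_server_ingress(ssh_cidr="0.0.0.0/0"):
    """Ingress rules for the web server: HTTP on port 80 and SSH on port 22."""
    return [
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        # 0.0.0.0/0 matches the demo; outside a demo, pass a narrow
        # range such as a single administrator address (x.x.x.x/32).
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": ssh_cidr}]},
    ]


def rds_ingress(source_sg_id=WEB_SG_ID):
    """MySQL on port 3306, reachable only from members of the web server group."""
    return [
        {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
         "UserIdGroupPairs": [{"GroupId": source_sg_id}]},
    ]
```

Referencing the web server group by ID, rather than by IP range, is what lets the RDS rule keep working as web server instances are replaced.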

Next, I've deployed an Elastic Load Balancer. If we go to Route 53, you'll see that I have created an apex record and a www alias record that point to the Elastic Load Balancer. Finally, if we go to S3, you'll see that I've created two buckets. The first, "calabs-wordpresscode", is where I'm syncing the WordPress code and configuration for my EC2 instance. The other bucket, "calabs-wordpresscdn", stores all the media files, and we will use it as our origin server.

Now that we are familiar with the environment in which we will create our CloudFront web distribution, select CloudFront from the AWS Management Console and click the blue Create Distribution button to start the wizard. On the Select Delivery Method page, you can create either a web distribution, which can distribute static and dynamic content, multimedia content, and even live events, or an RTMP distribution, which streams media files using Adobe Media Server and the Adobe Real-Time Messaging Protocol (RTMP). Click the Get Started button for the web distribution.

The next step is to create the distribution. This is broken down into three sections: Origin Settings, Default Cache Behavior Settings, and Distribution Settings. There are a lot of settings, and we will go through each of them so you are familiar with them. I'll also call out some key points to be aware of.

The first value is Origin Domain Name. This is the DNS name of the S3 bucket or HTTP server. If the origin is an S3 bucket, click in the field and select the bucket from the drop-down list. Note that if you are using an S3 bucket configured as a static website, you should not select the bucket from the drop-down list; instead, you must enter the static website hosting endpoint. If you're using a bucket from a different AWS account, you'll need to type the name using the format "bucket-name.s3.amazonaws.com". If the origin is an HTTP server, enter its domain name and ensure that the files are publicly readable. We will select calabs-wordpresscdn from the drop-down list.
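The two S3 forms of Origin Domain Name mentioned above can be compared side by side. This is a small sketch; the website-endpoint format is an assumption that holds for many regions (including ap-southeast-1, Singapore), but some regions use `s3-website.<region>` with a dot, so confirm the endpoint shown on the bucket's static website hosting page.

```python
def s3_rest_origin(bucket: str) -> str:
    """Origin domain name for a bucket accessed through the S3 REST endpoint,
    e.g. a bucket in another AWS account that isn't in the drop-down list."""
    return f"{bucket}.s3.amazonaws.com"


def s3_website_origin(bucket: str, region: str) -> str:
    """Static website hosting endpoint for a bucket configured as a website.

    Assumes the dash form (s3-website-<region>); some regions use a dot
    (s3-website.<region>) instead.
    """
    return f"{bucket}.s3-website-{region}.amazonaws.com"
```

For example, `s3_rest_origin("calabs-wordpresscdn")` gives `calabs-wordpresscdn.s3.amazonaws.com`, the form the drop-down fills in for this lesson's bucket.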

Origin Path allows you to request content from a directory in your origin by entering the directory path beginning with a forward slash. CloudFront will append the directory path to the value you specified in the origin domain name. We will leave this blank and move on to the next value. Origin ID is a unique string that distinguishes the origin from other origins in the distribution. We'll leave this with its default value.
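To make the Origin Path behaviour concrete, here is a simplified model of how the origin path is inserted between the origin domain name and the requested path. The `/production` directory is a hypothetical example, and the validation reflects the rule that the path starts with a forward slash and has no trailing slash.

```python
def origin_request_url(origin_domain: str, origin_path: str, request_path: str) -> str:
    """Return the host and path CloudFront requests from the origin (simplified).

    An empty origin_path means requests go to the root of the origin.
    """
    if origin_path and (not origin_path.startswith("/") or origin_path.endswith("/")):
        raise ValueError("Origin Path must start with '/' and must not end with '/'")
    return f"{origin_domain}{origin_path}{request_path}"
```

With `origin_path="/production"`, a viewer request for `/images/logo.jpg` is fetched from `calabs-wordpresscdn.s3.amazonaws.com/production/images/logo.jpg`, while leaving the field blank (as in this lesson) fetches `/images/logo.jpg` from the bucket root.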

If your origin is an S3 bucket, you have the option to restrict bucket access, which prevents users from accessing the content directly through S3 URLs. This is a key setting to enable if you're restricting access to the content by using signed URLs or signed cookies. We'll select the Yes radio button. You'll notice that selecting Yes reveals additional values that we need to define.

An origin access identity is a special CloudFront user that allows CloudFront to access your restricted Amazon S3 objects. As we don't have any existing identities, we'll select the Create New Identity radio button, and in the Comment field we'll enter "access-identity-wordpress". For Grant Read Permissions on Bucket, we will select Yes, Update Bucket Policy. An important point to note here is that because we specified Yes, CloudFront will update the bucket policy to grant the origin access identity permission to read objects in the bucket. However, it will not remove any existing permissions in the bucket policy or on individual objects. Regardless of the value you select, review the bucket policy to ensure the permissions are appropriate.
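The kind of statement CloudFront adds, and the kind of leftover permission worth looking for during that review, can be sketched as follows. The policy is similar in shape to what the console writes for an origin access identity; the identity ID `E2EXAMPLE1234` is hypothetical, and `public_statements` is an illustrative helper that only catches the simplest form of public grant (principal `*`).

```python
def oai_bucket_policy(bucket: str, oai_id: str) -> dict:
    """Bucket policy letting one origin access identity read every object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontOAIRead",
            "Effect": "Allow",
            "Principal": {
                "AWS": f"arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity {oai_id}"
            },
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }


def public_statements(policy: dict) -> list:
    """Return Allow statements granted to everyone; these would still let
    users bypass CloudFront, because CloudFront never removes them."""
    return [
        s for s in policy.get("Statement", [])
        if s.get("Effect") == "Allow" and s.get("Principal") in ("*", {"AWS": "*"})
    ]
```

An empty result from `public_statements` is a quick sanity check, not a full audit: object ACLs and other principals still need a manual look.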

Under Default Cache Behavior Settings, you'll notice that the Path Pattern is fixed at the default of *, which matches all requests, so this default behavior applies to every request that isn't matched by another behavior. We will cover path patterns later in the lesson.

The Viewer Protocol Policy specifies the protocol that viewers must use to connect to your edge locations. For the purpose of this demonstration, we will allow users to connect using either HTTP or HTTPS. Allowed HTTP Methods lets you specify which HTTP methods you want CloudFront to process and forward to your origin.

One point to note is that CloudFront caches responses to GET and HEAD requests, and optionally OPTIONS requests, but doesn't cache responses to any other methods. If you choose an option other than GET, HEAD and you are using an S3 bucket as the origin, you need to ensure that POST requests are supported in the bucket's region.

Some PUT requests may also require an additional header. From a security perspective, you should restrict access to your S3 bucket or custom origin so that users can't perform operations you don't want them to perform.

Cached HTTP Methods always includes GET and HEAD. As mentioned earlier, you can also cache OPTIONS if you allowed it under Allowed HTTP Methods; enabling it is up to you.
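The relationship between the allowed and cached method settings can be captured in a small validator. This sketch assumes the three method sets the console offers and builds a dictionary in the shape of the `AllowedMethods` field of CloudFront's `DistributionConfig` as exposed by boto3; `method_settings` itself is a hypothetical helper.

```python
# The three Allowed HTTP Methods choices offered by the console.
ALLOWED_METHOD_OPTIONS = [
    ("GET", "HEAD"),
    ("GET", "HEAD", "OPTIONS"),
    ("GET", "HEAD", "OPTIONS", "PUT", "POST", "PATCH", "DELETE"),
]


def method_settings(allowed, cache_options=False):
    """Build an AllowedMethods block; GET and HEAD are always cached."""
    allowed = tuple(allowed)
    if allowed not in ALLOWED_METHOD_OPTIONS:
        raise ValueError(f"unsupported method set: {allowed}")
    cached = ["GET", "HEAD"]
    if cache_options:
        # OPTIONS can only be cached if it is also an allowed method.
        if "OPTIONS" not in allowed:
            raise ValueError("OPTIONS must be allowed before it can be cached")
        cached.append("OPTIONS")
    return {
        "Quantity": len(allowed),
        "Items": list(allowed),
        "CachedMethods": {"Quantity": len(cached), "Items": cached},
    }
```

Responses to PUT, POST, PATCH, and DELETE are never cached in any of these combinations; those methods are only forwarded to the origin.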

Forward Headers defines whether CloudFront forwards request headers to the origin and caches objects based on the header values. With the default of None, CloudFront doesn't consider headers when caching your objects in edge locations, so if your origin returns two objects that differ only by the values of their request headers, CloudFront caches only one version of the object. If you select Whitelist, then because we're using S3 as our origin, we can only choose from Access-Control-Request-Headers, Access-Control-Request-Method, and Origin. With a custom origin, we could whitelist any header except Accept-Encoding, Connection, Cookie, Proxy-Authorization, TE, and Upgrade. We'll leave the default setting of None.
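Why forwarding a header changes caching becomes clearer with a toy cache key. This is a deliberately simplified model, not CloudFront's internal key format: the key is the request path plus the values of whitelisted headers only, so headers that aren't forwarded cannot produce separate cached copies.

```python
def cache_key(path, headers, forwarded_headers=()):
    """Toy cache key: the path plus values of forwarded (whitelisted) headers."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    parts = [path]
    for name in sorted(h.lower() for h in forwarded_headers):
        parts.append(f"{name}={lowered.get(name, '')}")
    return "|".join(parts)
```

With Forward Headers set to None, requests for `/logo.png` carrying different `Origin` headers map to the same key, so one cached copy serves both; whitelisting `Origin` splits them into separate cache entries, which is useful for CORS but lowers the cache hit ratio.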

For Object Caching, select Use Origin Cache Headers if your origin server adds a Cache-Control header to your objects. If you want to control how long objects stay in the cache regardless of any Cache-Control header, select the Customize radio button and specify the minimum TTL, default TTL, and maximum TTL. These values specify, in seconds, how long you want objects to stay in the cache before CloudFront forwards another request to the origin to check whether the object has been updated. The defaults are a minimum TTL of 0 seconds, a default TTL of 86,400 seconds (one day), and a maximum TTL of 31,536,000 seconds (one year).
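The interaction between the three TTLs and an origin's `Cache-Control: max-age` value can be approximated as a clamp. This is a simplified sketch: it ignores `Expires`, `s-maxage`, `no-cache`, and similar directives, and treats "no header" as falling back to the default TTL.

```python
MIN_TTL, DEFAULT_TTL, MAX_TTL = 0, 86_400, 31_536_000  # 0 s, one day, one year


def effective_ttl(max_age=None, min_ttl=MIN_TTL, default_ttl=DEFAULT_TTL, max_ttl=MAX_TTL):
    """Approximate seconds an object stays cached under the Customize setting.

    max_age is the origin's Cache-Control max-age, or None if it sent none.
    """
    if max_age is None:
        return default_ttl
    # A max-age outside [min_ttl, max_ttl] is pulled back to the nearest bound.
    return max(min_ttl, min(max_age, max_ttl))
```

For example, an object without `Cache-Control` stays cached for the default 86,400 seconds (24 × 60 × 60), while `max-age=60` with a minimum TTL of 300 is held for 300 seconds.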

Forward Cookies defines what CloudFront does with cookies. You can specify None, All, or Whitelist; for Whitelist, you enter the names of the cookies to forward, and you can use the asterisk (*) and question mark (?) wildcards. As we're using S3 as our origin, which doesn't process cookies, we will leave this set to None. For Forward Query Strings, select Yes if the origin server returns a different version of an object based on the query string in the URL; as we are not using this, we will leave the default value of No. Smooth Streaming lets you distribute media files in the Microsoft Smooth Streaming format; we will leave it at the default value of No.
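A cookie whitelist with `*` and `?` wildcards behaves much like shell-style matching. This sketch uses Python's `fnmatch.fnmatchcase` to model it, assuming case-sensitive cookie names and patterns that use only `*` and `?` (`fnmatch` also understands `[...]` classes, which a CloudFront whitelist doesn't). The WordPress cookie names are illustrative.

```python
from fnmatch import fnmatchcase


def forwarded_cookies(cookies, whitelist):
    """Keep only the cookies whose names match at least one whitelist pattern."""
    return {
        name: value
        for name, value in cookies.items()
        if any(fnmatchcase(name, pattern) for pattern in whitelist)
    }
```

With the whitelist `["wordpress_*", "wp-settings-?"]`, cookies such as `wordpress_logged_in_abc` and `wp-settings-1` would be forwarded, while an analytics cookie like `_ga` would be dropped and so would not fragment the cache.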

Finally, in this section we can use Restrict Viewer Access to define whether CloudFront requires users to access our content using a signed URL or a signed cookie.

Under Distribution Settings, the first value that we need to specify is the Price Class. You'll recall from the introduction that there are three price classes, which determine the regions whose edge locations serve your content. For this demonstration, we will use all edge locations, as that gives the best performance.

Next, if you have a custom domain that you want to use in object URLs instead of the cloudfront.net domain name, enter it in the Alternate Domain Names (CNAMEs) field. If you specify a CNAME, you'll also need to update your DNS service to route queries for that name to the distribution's cloudfront.net domain name. As this is a demo and I want to show you that the content is being delivered by CloudFront, we will leave this blank.
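If you do use an alternate domain name and your DNS is in Route 53, the DNS update could look like the change batch below. This is a sketch, assuming the dictionary is passed as `ChangeBatch` to boto3's Route 53 `change_resource_record_sets`; `cdn.example.com` and `d111111abcdef8.cloudfront.net` are hypothetical names, and the alias must also be listed in the distribution's Alternate Domain Names field.

```python
def cname_change_batch(alias, distribution_domain, ttl=300):
    """Route 53 change batch pointing a CNAME at a CloudFront domain name."""
    return {
        "Comment": f"Route {alias} to CloudFront",
        "Changes": [{
            # UPSERT creates the record, or updates it if it already exists.
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": alias,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": distribution_domain}],
            },
        }],
    }
```

A CNAME works for subdomains such as `cdn.example.com`; an apex name like the one used for this WordPress site can't be a CNAME, which is where a Route 53 alias record comes in instead.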

Next, for users to connect over HTTPS, we need to choose an SSL certificate. Because we are supporting both HTTP and HTTPS and aren't using a custom domain, we will accept the default and use the CloudFront certificate. The default root object is the object you want CloudFront to return when a user requests the root URL of your distribution instead of a specific object. Setting a default root object avoids exposing the contents of your distribution. When you enter the value, don't add a leading forward slash. As we're using a WordPress site, we will enter "index.php".
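The default root object rule can be modelled in a few lines. This is a simplified sketch: only a request for the root URL is rewritten, and subdirectory requests such as `/blog/` are passed through unchanged, because the default root object doesn't apply to subdirectories. `resolve_object` is a hypothetical helper.

```python
def resolve_object(request_path, default_root_object="index.php"):
    """Return the object path CloudFront would request for a viewer path."""
    if default_root_object.startswith("/"):
        raise ValueError("enter the default root object without a leading '/'")
    if request_path in ("", "/"):
        return "/" + default_root_object
    # Any other path, including subdirectories, is requested as-is.
    return request_path
```

Here a request for the distribution's root resolves to `/index.php`, while `/wp-login.php` is requested unchanged.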

Next, we have the option of enabling logging. To do so, we specify the S3 bucket for the logs and an optional log prefix, and choose whether to enable cookie logging. There is no charge to enable logging, but you will be billed the usual S3 charges for storing and accessing the log files. Enabling logging is a best practice, but for the purposes of this demo I will leave it disabled.

There is a Comment field where you can enter up to 128 characters of free text describing this distribution. Finally, we need to specify the Distribution State, Enabled or Disabled, which applies once the distribution is deployed. The key difference between the two states is whether or not the distribution responds to client requests, and you can toggle this setting as required. We will leave it as Enabled and click the blue Create Distribution button. The distribution can take several minutes to deploy, so I will pause the video until it has been fully deployed, at which point we'll come back and verify that it's working correctly.
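The wizard choices made so far can be summarised as a single configuration. This is a sketch, assuming the dictionary shape of `DistributionConfig` used by boto3's CloudFront `create_distribution`, with the legacy `ForwardedValues` and TTL fields that correspond to this wizard (newer API versions favour cache policies). The identity ID and comment are hypothetical, and the function only builds the dictionary; it makes no AWS calls.

```python
import time


def build_distribution_config(bucket, oai_id, comment="WordPress media (demo)",
                              caller_reference=None):
    """DistributionConfig mirroring the choices made in this lesson."""
    if len(comment) > 128:
        raise ValueError("Comment is limited to 128 characters")
    origin_id = f"S3-{bucket}"  # the console's default Origin ID format
    return {
        # CallerReference must be unique for each create request.
        "CallerReference": caller_reference or str(time.time()),
        "Comment": comment,
        "Enabled": True,                      # Distribution State
        "PriceClass": "PriceClass_All",       # use all edge locations
        "DefaultRootObject": "index.php",     # no leading slash
        "Aliases": {"Quantity": 0},           # no CNAMEs in this demo
        "Origins": {"Quantity": 1, "Items": [{
            "Id": origin_id,
            "DomainName": f"{bucket}.s3.amazonaws.com",
            "OriginPath": "",
            "S3OriginConfig": {
                "OriginAccessIdentity": f"origin-access-identity/cloudfront/{oai_id}",
            },
        }]},
        "DefaultCacheBehavior": {
            "TargetOriginId": origin_id,
            "ViewerProtocolPolicy": "allow-all",  # HTTP and HTTPS
            "AllowedMethods": {
                "Quantity": 2, "Items": ["GET", "HEAD"],
                "CachedMethods": {"Quantity": 2, "Items": ["GET", "HEAD"]},
            },
            "ForwardedValues": {
                "QueryString": False,             # Forward Query Strings: No
                "Cookies": {"Forward": "none"},   # Forward Cookies: None
                "Headers": {"Quantity": 0},       # Forward Headers: None
            },
            "TrustedSigners": {"Enabled": False, "Quantity": 0},
            "SmoothStreaming": False,
            "MinTTL": 0, "DefaultTTL": 86400, "MaxTTL": 31536000,
        },
        "ViewerCertificate": {"CloudFrontDefaultCertificate": True},
        # Logging disabled, as in the demo: Bucket and Prefix stay empty.
        "Logging": {"Enabled": False, "IncludeCookies": False, "Bucket": "", "Prefix": ""},
    }
```

Under these assumptions, the result could be passed as `boto3.client("cloudfront").create_distribution(DistributionConfig=config)`; keeping the configuration in code makes it easy to review and to recreate the distribution consistently.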

It's been several minutes. If we click on Distributions, you can see that the status of our web distribution is Deployed. If we look at the distribution settings, there are a number of tabs that we will quickly explore.

The first is General, which provides an overview of our distribution. What's important here is the domain name. I will use this to set the redirection rules for the WordPress site so that all media will be distributed from CloudFront, something that we will test later.

The next tab is Origins, which lists the origin we previously created. Here we can add additional origins or modify the current one.

The next tab is Behaviors. You will recall that when we set up the distribution, we couldn't specify a custom path pattern. Now that the distribution has been created, we can add behaviors with additional path patterns, for example "images/*.jpeg", which applies that behavior to all requests for JPEG files under the images directory. Note that CloudFront evaluates behaviors in order and uses the first one that matches, so make sure you order them in the sequence you want CloudFront to evaluate them.
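First-match evaluation explains why ordering matters. This sketch models it with `fnmatch.fnmatchcase`, assuming case-sensitive patterns that use only `*` (any characters, including `/`) and `?`; leading slashes are stripped so that `images/*.jpeg` and `/images/*.jpeg` behave the same. The behavior names are hypothetical labels.

```python
from fnmatch import fnmatchcase


def select_behavior(path, behaviors):
    """Return the name of the first behavior whose pattern matches the path.

    behaviors is an ordered list of (path_pattern, name) pairs; the default
    behavior's pattern, '*', should come last.
    """
    candidate = path.lstrip("/")
    for pattern, name in behaviors:
        if fnmatchcase(candidate, pattern.lstrip("/")):
            return name
    raise LookupError(f"no behavior matches {path!r}")
```

With `[("images/*.jpeg", "jpeg-images"), ("*", "default")]`, a request for `/images/cat.jpeg` uses the JPEG behavior and `/images/cat.png` falls through to the default; if `*` were placed first, it would match everything and the JPEG behavior would never be used.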

The next tab is Error Pages. Here you can set up custom error pages for when the origin returns either a 400 or 500 series status code.

The next tab is Restrictions. You can prevent users from specific countries from accessing the content either via a whitelist which specifies countries that can access, or via a blacklist which specifies countries that cannot access content.
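The whitelist and blacklist options reduce to a simple membership check on the viewer's country. This is a minimal sketch, assuming two-letter ISO 3166-1 country codes and the restriction type values `none`, `whitelist`, and `blacklist`; `is_allowed` is a hypothetical helper, and it is CloudFront, not your code, that determines the viewer's country.

```python
def is_allowed(country, restriction_type, countries):
    """Return True if a viewer from `country` may access the content."""
    country = country.upper()
    if restriction_type == "none":
        return True
    if restriction_type == "whitelist":   # only listed countries may access
        return country in countries
    if restriction_type == "blacklist":   # listed countries are blocked
        return country not in countries
    raise ValueError(f"unknown restriction type: {restriction_type}")
```

A whitelist of `{"SG", "AU"}` admits viewers from Singapore and Australia only, while a blacklist of `{"US"}` blocks viewers from the United States and admits everyone else.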

The final tab is Invalidations, which removes objects from the CloudFront edge cache before they expire. Later, we'll discuss some best practices for caching.
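An invalidation request lists the paths to remove from the edge caches. This sketch builds the `InvalidationBatch` dictionary that boto3's CloudFront `create_invalidation` call accepts alongside a `DistributionId`; the example paths are hypothetical, and the helper adds the leading slash that invalidation paths need.

```python
import uuid


def invalidation_batch(paths):
    """Build an InvalidationBatch for a list of object paths or wildcards."""
    normalised = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(normalised), "Items": normalised},
        # A fresh CallerReference keeps repeated submissions distinct.
        "CallerReference": str(uuid.uuid4()),
    }
```

For example, `invalidation_batch(["wp-content/uploads/logo.png", "/images/*"])` yields the paths `/wp-content/uploads/logo.png` and `/images/*`; a trailing `*` invalidates everything under that prefix.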

Now that our distribution is live and I have modified the .htaccess file to redirect media requests to CloudFront, let's add an image to the blog and check that it's working correctly. I'll navigate to the WordPress site at http://cloudacademymylabs1.com, add a new page, and then edit the default post and insert a media file. If we view the page and then open the image in a new tab, you'll notice that the image URL uses the domain name of the CloudFront distribution we just created.

About the Author

David's acknowledged hands-on experience in the IT industry has seen him speak at international conferences, operate in presales environments, and conduct actual design and delivery services.

David also has extensive experience in delivery operations. David has worked in the financial, mining, state government, federal government, and public sectors across Asia Pacific and the US.