
Using Amazon CloudFront to Improve Global Web Site Performance

Update: On Oct. 16, 2013, Amazon announced POST support for CloudFront. See https://forums.aws.amazon.com/ann.jspa?annID=2179
Update: On June 12, 2013, Amazon announced "CloudFront Custom SSL Certificates and Zone Apex". You no longer have to change the domain of your SSL site (point 2 in the checklist below), and you no longer have to treat canonical domains in a special way (point 9 in the checklist below). This is a significant improvement since I wrote this post in March. Rather than change the original post, I've marked the affected points below with asterisks.
Neptune recently migrated a large, multilingual, international web site to Amazon CloudFront.

It's not perfect. But Amazon CloudFront is a service we'd definitely recommend to our clients. Here's why:
  1. Improved website response time - giving a faster, slicker web experience for all users.
  2. Improved website response time specifically for international web users.
  3. Low cost to configure compared with adding more infrastructure.
  4. Allows you to keep your existing hosting infrastructure, via the "Custom Origin" option.
  5. No need to change URLs (assuming a "custom domain" is configured).
  6. Nearly unlimited "burst" traffic capacity without having to set up new infrastructure.
  7. You retain complete control over your DNS. (A competing service, CloudFlare, requires control of your domain's DNS.)
  8. A better performing site may increase search traffic and sales.
  9. Minimal commitment.
Amazon CloudFront is a fairly new cloud-based service; competitors include CloudFlare and Akamai. The service places geographically distributed HTTP caches "in front" of your existing site, caching or proxying both static and dynamic content. CloudFront is like having an HTTP cache (such as Squid or Varnish) in most major cities of the world. It also includes a dynamic DNS system that routes users to the nearest "edge" location based on the IP address of their DNS resolver. When a request should not be served from the cache, it is passed (proxied) back to the origin server.
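To make the hit/miss/proxy behavior concrete, here is a toy model of a single edge location in Python. It is a sketch under simplifying assumptions, not how CloudFront is actually implemented: the `EdgeCache` class and the `origin` callable are hypothetical, and the origin is assumed to return a body plus a max-age in seconds (0 meaning "do not cache").

```python
from typing import Callable, Dict, Tuple

# Hypothetical origin: maps a path to (body, max_age_seconds); 0 means "do not cache".
Origin = Callable[[str], Tuple[str, int]]


class EdgeCache:
    """Toy model of one edge location: serve from cache while fresh, else proxy to origin."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        # path -> (body, time at which this copy stops being fresh)
        self.store: Dict[str, Tuple[str, float]] = {}

    def get(self, path: str, now: float) -> Tuple[str, str]:
        cached = self.store.get(path)
        if cached is not None and now < cached[1]:
            return cached[0], "Hit"  # fresh copy at the edge; origin not contacted
        body, max_age = self.origin(path)  # proxy the request back to the origin
        if max_age > 0:
            self.store[path] = (body, now + max_age)
        else:
            self.store.pop(path, None)  # uncacheable: never keep a copy
        return body, "Miss"
```

With an origin that returns a 120-second max-age for "/", requests at t=0, t=60 and t=121 come back as Miss, Hit, Miss, while a path with max-age 0 is a Miss every time. The Hit/Miss labels mirror the X-Cache header discussed under Troubleshooting.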

Since CloudFront serves both static and dynamic content, the changes to your site are, in theory, quite minimal. In practice, this depends on how dynamic and complex your site is. Here is a checklist of things that may need to change:

Amazon CloudFront Migration Checklist


1. POST data cannot be passed through CloudFront.

This is probably the most difficult one. If you have forms that require POST (and can't easily be converted to GET), I recommend setting up a new domain name for your site where forms can be posted. If your original site was "www.acme.com", you'll need to set up a new domain such as "origin.acme.com" (any name will do, such as "secure.acme.com" or "post.acme.com"). You'll need to change all form actions to POST to this domain, or update links to forms so they go to the origin site. Once the form is complete, I recommend redirecting the user back to the www site to make use of CloudFront.
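The change above boils down to swapping the host name in two places: form actions point at the origin host, and the post-submit redirect points back at www. A minimal Python sketch, assuming the example host names www.acme.com and origin.acme.com; the helper names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Example host names from this post; substitute your own.
WWW_HOST = "www.acme.com"
ORIGIN_HOST = "origin.acme.com"


def with_host(url: str, host: str) -> str:
    """Return url with its host replaced; scheme, path, query and fragment are kept."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def form_action(url: str) -> str:
    """Point a POST form at the origin host, which is not behind CloudFront."""
    return with_host(url, ORIGIN_HOST)


def after_submit(url: str) -> str:
    """Where to redirect once the form is processed: back to the CloudFront-served www host."""
    return with_host(url, WWW_HOST)
```

Note that this replaces the whole network location, so any explicit port in the URL is dropped as well.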

2. You'll need to change the domain name of your SSL site. * (this no longer applies as of 6/12/2013, see update at top of page)

Another tough one. If you were hosting https://www.acme.com, you will need to purchase a new certificate for https://origin.acme.com. You can host your SSL content at https://d1dkq6joi5aul.cloudfront.net (the distribution URL), but you probably don't want a URL that looks like that. Don't forget that links to https://www.acme.com may be scattered all over the Internet in external blog posts and comments.

DNS for http:// and https:// can only point to one location. Once you make the switch, https://www.acme.com will show all users a nasty "This Connection is Untrusted" warning unless you have completely phased it out.

3. You'll need to carefully modify the HTTP Cache-Control, Expires and Last-Modified headers on your existing pages.

When I first started researching CloudFront, I was under the impression that setting the TTL within the "Behaviors" would mean I didn't have to modify headers on my site. This is not the case. You need to become an expert on these three headers and gain complete control over what your existing pages send. I found the TTLs that Amazon provides to be fairly useless. I was a bit disappointed that I can't use the simple web interface to adjust caching reactively during high-traffic periods; I have to make a programming change to do this.

First, give all static content a far-future Expires header. I typically do this with a global Apache rule:

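# force caching for more speed of static content (requires mod_headers)
# note: a hard-coded date stops being "far future" once it passes, so move it forward periodically
<FilesMatch "\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Thu, 15 Apr 2020 20:00:00 GMT"
</FilesMatch>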
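One drawback of the rule above is the fixed date: once it passes, the header no longer means "far future". A sketch of computing the header relative to the current time instead, in Python, for use wherever your application sets headers. The `static_headers` helper and the one-year default are assumptions; it also sends Cache-Control max-age, which HTTP/1.1 caches honor in preference to Expires. (Within Apache itself, mod_expires with ExpiresDefault "access plus 1 year" achieves a similar effect.)

```python
import re
from datetime import datetime, timedelta
from email.utils import format_datetime
from typing import Dict, Optional

# Same extension list as the FilesMatch rule above (case-sensitive, like FilesMatch).
STATIC_RE = re.compile(r"\.(ico|pdf|flv|jpg|jpeg|png|gif|js|css|swf)$")


def static_headers(path: str, now: datetime, days: int = 365) -> Optional[Dict[str, str]]:
    """Far-future caching headers for static files, relative to `now` (a UTC-aware datetime)."""
    if not STATIC_RE.search(path):
        return None  # not a static asset: leave the page's own headers alone
    expires = now + timedelta(days=days)
    return {
        # usegmt=True produces the "..., DD Mon YYYY HH:MM:SS GMT" form HTTP expects
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={days * 86400}",
    }
```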

That's easy. Pages are more difficult. (This example uses PHP. Other platforms will be similar.)

First, I added an include file at the top of every page.

[Image: PHP include at the top of each page]

For pages I wanted to cache for only a few minutes, I include cache-control.php as below. Notice that I only modify the caching if the User-Agent is "Amazon CloudFront". This ensures that my existing site doesn't break.

[Image: cache-control.php code]

For pages that I never want to cache, I call session_start() in my PHP. Most of my dynamic pages happen to do this anyway, and it gives me PHP's default Expires and Cache-Control headers, which prevent all caching. Of course, these headers can also be set with header() if you don't need the overhead of session_start().
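Since cache-control.php appears only as an image, here is a hedged Python sketch of the same decision logic rather than the author's actual code. The function name and the `max_age` parameter are hypothetical; it assumes CloudFront identifies itself with the User-Agent "Amazon CloudFront", as described in this post, and it reproduces the header values shown below for the never-cache and 120-second cases.

```python
from datetime import datetime, timedelta
from email.utils import format_datetime
from typing import Dict

CLOUDFRONT_UA = "Amazon CloudFront"

# The never-cache values this post shows PHP sending after session_start().
NO_CACHE_HEADERS = {
    "Expires": "Thu, 19 Nov 1981 08:52:00 GMT",
    "Cache-Control": "no-store, no-cache, must-revalidate, post-check=0, pre-check=0",
}


def cache_headers(user_agent: str, max_age: int, now: datetime) -> Dict[str, str]:
    """Headers for a page cacheable for max_age seconds (0 = never cache); now is UTC-aware."""
    if max_age <= 0:
        return dict(NO_CACHE_HEADERS)  # sent to everyone, as session_start() would
    if user_agent != CLOUDFRONT_UA:
        return {}  # ordinary visitors keep the site's existing behavior
    return {
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
        "Cache-Control": f"public, max-age={max_age}",
        "Last-Modified": format_datetime(now, usegmt=True),
    }
```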

On a page which should never cache, CloudFront gets the following headers:

Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0

You'll always see "Miss from cloudfront" in the X-Cache header of these requests.

On a page which should cache for a maximum of 120 seconds, CloudFront receives these headers:

Expires: Sun, 24 Mar 2013 02:48:30 GMT
Cache-Control: public, max-age=120
Last-Modified: Sun, 24 Mar 2013 02:46:30 GMT

I can then test my headers with wget, using the options -S (print the server's response headers) and --header (to set the User-Agent). For example:

wget -S http://www.acme.com --header="User-Agent: Amazon CloudFront" 
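If you'd rather script this check, the same request can be made with Python's standard library. A minimal sketch with hypothetical helper names; note that it sends a HEAD request (an assumption for speed: wget -S performs a GET, and some applications answer HEAD differently), and that urlopen raises an HTTPError for 4xx/5xx responses.

```python
import urllib.request
from typing import Dict

CLOUDFRONT_UA = "Amazon CloudFront"


def cloudfront_request(url: str) -> urllib.request.Request:
    """A HEAD request that identifies itself with CloudFront's User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": CLOUDFRONT_UA}, method="HEAD")


def fetch_headers(url: str, timeout: float = 10.0) -> Dict[str, str]:
    """Return the response headers your origin would send to CloudFront for url."""
    with urllib.request.urlopen(cloudfront_request(url), timeout=timeout) as resp:
        return dict(resp.getheaders())
```

Calling fetch_headers("http://www.acme.com/") and inspecting Expires, Cache-Control and Last-Modified gives the same information as the wget command.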

I highly recommend starting out with very limited caching of your pages. You don't want to deploy this, feel like a hero because your site is faster, and then watch the bug reports trickle in as you desperately roll back caching. These bugs are particularly hard to track down because no one notices them at first.

4. Visitors' IP addresses and User-Agent strings will no longer reach your site and will not appear in your logs.

All User-Agent strings will arrive as "Amazon CloudFront", and the IP addresses you see will be CloudFront's, not your users'. This may require programming changes if your content is location-based.

Log-based web statistics will no longer be accurate. Expect these statistics to change dramatically in any case, since much of your traffic no longer reaches your server.
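If your content is location-based, you need the visitor's address from somewhere other than the TCP connection. CloudFront adds an X-Forwarded-For header containing the viewer's IP address; treat that as an assumption to verify against requests reaching your own origin. A sketch with hypothetical names, using a plain (case-sensitive) header mapping:

```python
from typing import Mapping

CLOUDFRONT_UA = "Amazon CloudFront"


def client_ip(remote_addr: str, headers: Mapping[str, str]) -> str:
    """Best guess at the visitor's IP address when requests may arrive via CloudFront."""
    if headers.get("User-Agent") != CLOUDFRONT_UA:
        return remote_addr  # direct hit on the origin: the connecting address is the visitor
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    # CloudFront appends the address it saw; earlier entries come from the client and can be forged.
    return hops[-1] if hops else remote_addr
```

Taking the last entry rather than the first is a deliberate, conservative choice: it is the only address added by CloudFront itself. Also keep in mind that the User-Agent check, like the one used for caching above, can be spoofed by anyone who calls the origin directly.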

5. If you set up an additional domain, such as origin.acme.com, you'll need to track sessions across sub-domains.

This is easy in PHP. With mod_php, for example, you can set it in your Apache configuration or .htaccess:

# for cloudfront integration and use of origin.acme.com
php_value session.cookie_domain acme.com
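A cookie domain of acme.com works for both www.acme.com and origin.acme.com because of the cookie domain-matching rule. A simplified Python sketch of RFC 6265 domain matching (it ignores the special cases for IP-address hosts and public suffixes, and the function name is hypothetical):

```python
def domain_match(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 check: is a cookie scoped to cookie_domain sent to host?"""
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")  # RFC 6265 ignores a leading dot
    # Exact match, or host is a subdomain of domain (the dot stops evilacme.com from matching).
    return host == domain or host.endswith("." + domain)
```

So a session cookie scoped to acme.com is sent to both sub-domains, while one left scoped to www.acme.com is not sent to origin.acme.com.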

6. If you set up an additional domain, you'll need to make sure Google Analytics treats both sub-domains as one site.

With the asynchronous ga.js tracking code, just add this before the _trackPageview call:

_gaq.push(['_setDomainName', 'acme.com']);

7. Social media sharing links may need to be configured differently if a second domain is used, so that shared links point to the www site.

8. You will need to make DNS changes.

Simply create a CNAME record pointing your domain (e.g. www.acme.com) at the distribution domain provided by Amazon (e.g. d1dkq6joi5aul.cloudfront.net).
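To confirm the CNAME has taken effect, resolve the host and look for a cloudfront.net name in the result. A sketch using Python's socket.gethostbyname_ex; how much of the CNAME chain it reports depends on the system resolver, so treat a False result as a prompt to check with dig or nslookup rather than as proof. The function names are hypothetical.

```python
import socket
from typing import List


def is_cloudfront_name(name: str) -> bool:
    """True if name is a CloudFront distribution host such as d1dkq6joi5aul.cloudfront.net."""
    return name.lower().rstrip(".").endswith(".cloudfront.net")


def resolves_via_cloudfront(host: str) -> bool:
    """Resolve host (network lookup) and report whether any returned name is a CloudFront host."""
    canonical, aliases, _addresses = socket.gethostbyname_ex(host)
    names: List[str] = [canonical, *aliases]
    return any(is_cloudfront_name(n) for n in names)
```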

9. Canonical (www.) redirects will not work through CloudFront. * (this no longer applies as of 6/12/2013, see update at top of page)

For SEO purposes, the base domain (e.g. "acme.com") is usually 301-redirected to the www. domain. My approach was to use the www. site for CloudFront but leave "acme.com" pointing at the origin IP. When users access "acme.com", the origin server redirects them to www.acme.com.
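The origin-side redirect described above is a single rule. Here it is as a hedged Python sketch, assuming the example names acme.com and www.acme.com; the function name is hypothetical, and on an Apache origin you would more likely express the same rule with mod_rewrite.

```python
from typing import Optional, Tuple

# Example names from this post; substitute your own.
BARE_HOST = "acme.com"
CANONICAL_HOST = "www.acme.com"


def canonical_redirect(host: str, path_and_query: str,
                       scheme: str = "http") -> Optional[Tuple[int, str]]:
    """Origin-side rule: send the bare domain to the www. site with a permanent (301) redirect."""
    if host.lower().split(":")[0] == BARE_HOST:  # ignore any :port suffix in the Host header
        return 301, f"{scheme}://{CANONICAL_HOST}{path_and_query}"
    return None  # already canonical: serve the page normally
```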
 


Troubleshooting

You will inevitably find yourself troubleshooting to figure out why something did or did not cache. Here are some tips.

1. Age header.

Watch the response headers in Firefox's Firebug or in the Network panel of Chrome's developer tools. The Age header tells you how long, in seconds, the content has been in the cache.

2. X-Cache header.

This tells you whether the request was a CloudFront cache hit or miss.
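When checking many URLs, it helps to reduce those two headers to a single verdict. A sketch with a hypothetical function name, assuming X-Cache values of the form "Hit from cloudfront" and "Miss from cloudfront" as seen in this post; anything else is reported as "other".

```python
from typing import Mapping, Optional, Tuple


def cache_status(headers: Mapping[str, str]) -> Tuple[str, Optional[int]]:
    """Summarize a response: ('hit' | 'miss' | 'other', Age in seconds or None)."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    x_cache = lowered.get("x-cache", "").lower()
    if x_cache.startswith("hit"):
        status = "hit"
    elif x_cache.startswith("miss"):
        status = "miss"
    else:
        status = "other"  # e.g. a refresh hit, an error, or no X-Cache header at all
    age = lowered.get("age")
    return status, int(age) if age is not None and age.strip().isdigit() else None
```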

3. Traceroute the CloudFront domain first (a tip from Amazon support).

traceroute d1dkq56j3333dl.cloudfront.net 

This allows you to identify the major geographic location your content will be fetched from. For example, this traceroute shows the request going to the Frankfurt (fra6) edge location (d1dkq56joi5aul.fra6.cloudfront.net).

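Next, use curl with a Host header to see what is returned from that edge location, replacing EDGE-ADDRESS with the edge server's IP address or hostname from the traceroute output:

curl -I -H "Host: d1dkq56j3333dl.cloudfront.net" http://EDGE-ADDRESS/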
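The same edge check can be scripted, which is also the core of the multi-location script described next. A minimal Python sketch using http.client; `head_via_edge` is a hypothetical name, the edge address comes from your own traceroute output, and the Host header names your distribution.

```python
import http.client
from typing import Dict, Tuple


def head_via_edge(edge: str, host: str, path: str = "/", port: int = 80,
                  timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """HEAD request sent to one specific edge server, naming the distribution in Host.

    Similar in spirit to: curl -I -H "Host: <host>" http://<edge>/<path>
    """
    conn = http.client.HTTPConnection(edge, port, timeout=timeout)
    try:
        # Supplying Host ourselves stops http.client from deriving it from `edge`.
        conn.request("HEAD", path, headers={"Host": host})
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()
```

Comparing the X-Cache and Age headers returned by several edge addresses shows whether each location has a fresh copy.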

4. Develop a script that you can host in multiple geographic locations and that fetches URLs from the edge locations it finds.

I've included the version we used: remote_test_cloudfront.php (zip, GNU-licensed). This script can be invaluable when testing a site as seen from multiple locations.
 


Peeves

Here are a few issues I found with CloudFront.

  1. When invalidating content, "/" is not the same as "/index.html", even if you set the "Default Root Object" to "index.html" in the distribution settings.
  2. Documentation is thin and not that clearly written.
  3. Configuration options are very basic.
  4. No way to precisely change the caching duration via the CloudFront interface; it requires technical changes to the site's headers.
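Given the first peeve, it is worth always submitting the root path under both names. A small Python sketch of that expansion; the helper is hypothetical and only builds the path list, which you would still submit through the CloudFront console or API.

```python
from typing import Iterable, List


def invalidation_paths(paths: Iterable[str], default_root: str = "index.html") -> List[str]:
    """Expand an invalidation list so the root is cleared under both of its names.

    Assumes, per the peeve above, that "/" and "/<default_root>" are cached as
    separate objects even when default_root is the Default Root Object.
    """
    root_file = "/" + default_root
    aliases = {"/": [root_file], root_file: ["/"]}
    result: List[str] = []
    for path in paths:
        for candidate in [path, *aliases.get(path, [])]:
            if candidate not in result:  # keep the original order, drop duplicates
                result.append(candidate)
    return result
```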

Yet, despite the "peeves", this is a really useful service.

Best of luck with your CloudFront migration. As always, let us know if you'd like us to assist.

Neptune Web is a full-service Boston-area interactive web and digital marketing agency with expertise in Website Design, Web Development, Digital Marketing Strategy and Execution.

We look forward to your comments and would be most happy to address and help solve any Digital Marketing or Website Design & Development challenges you may have.
