
Nginx as proxy for Amazon S3 public/private files


Everybody who uses S3 as file storage knows it as a scalable, production-ready solution.

Pros:

  • High SLA, 3 copies of each object
  • Public HTTP/HTTPS URLs
  • Private HTTPS URLs with a short time-to-live (pre-signed URLs)
  • Versioning and other useful features

Cons:

  • Narrow TCP congestion window
  • Long SSL handshake, no SPDY support
  • Wide IP range, hard to fit into firewall rules
  • Difficult to use direct URLs with custom access control

I'd like to share a solution, Nginx S3 proxy, with the following features:

  • URL masking (stealth mode)
  • SSL termination (makes sense in conjunction with EC2 in the same zone)
  • Ability to use the Nginx proxy cache
  • SPDY support and your own TCP congestion window

PUBLIC files:

location ~* ^/proxy_public_file/(.*) {
  set $s3_bucket        'your_bucket.s3.amazonaws.com';
  set $url_full         '$1';

  proxy_http_version     1.1;
  proxy_set_header       Host $s3_bucket;
  proxy_set_header       Authorization '';

  # hide Amazon-specific response headers from clients
  proxy_hide_header      x-amz-id-2;
  proxy_hide_header      x-amz-request-id;
  proxy_hide_header      Set-Cookie;
  proxy_ignore_headers   "Set-Cookie";
  proxy_buffering        off;
  proxy_intercept_errors on;

  # resolve the bucket hostname at runtime (Google public DNS here)
  resolver               8.8.4.4 8.8.8.8 valid=300s;
  resolver_timeout       10s;

  proxy_pass             http://$s3_bucket/$url_full;
}

EXAMPLE:

https://your_server/proxy_public_file/readme.txt

To proxy S3 pre-signed URLs, you should use the AWS SDK or similar to generate the URL first. If you have a server with Nginx in the same zone as the S3 bucket, you can use Nginx as an SSL terminator: generate the pre-signed URL over plain HTTP, and it will then be served over HTTPS via Nginx.

Once you have the URL, the bucket name and AWSAccessKeyId can be omitted, because they are already present as variables in the Nginx configuration. The goal is to mask the URL as much as possible. Don't worry about exposing the AWSAccessKeyId; it is useless without the Secret Key. Let's mask the URL by passing the Expires timestamp as e and the Signature as st. For instance, the original URL is https://yourbucket.s3.amazonaws.com/readme.txt?AWSAccessKeyId=YOURONLYACCESSKEY&Signature=sagw4gsafdhsd&Expires=3453445231
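As a minimal sketch of the URL-generation step, here is how a pre-signed URL of that format (the legacy query-string authentication with AWSAccessKeyId, Signature and Expires) can be produced with the Python standard library, plus a variant that emits the masked proxy URL. The function names, the proxy_host parameter and the /proxy_private_file/ path are assumptions matching this post's configuration; newer buckets may require Signature Version 4, where the AWS SDK is the better choice.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote_plus

def presign_s3_url_v2(bucket, key, access_key, secret_key, expires_in=3600):
    """Build a query-string-authenticated S3 URL (legacy Signature V2)."""
    expires = int(time.time()) + expires_in
    # V2 string-to-sign for a plain GET: verb, empty MD5/type, expiry, resource
    string_to_sign = "GET\n\n\n{}\n/{}/{}".format(expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return ("https://{}.s3.amazonaws.com/{}"
            "?AWSAccessKeyId={}&Signature={}&Expires={}"
            .format(bucket, key, access_key, quote_plus(signature), expires))

def mask_for_proxy(bucket, key, access_key, secret_key, proxy_host,
                   expires_in=3600):
    """Same signature, but emitted as the shortened proxy URL:
    Expires becomes e, Signature becomes st; the bucket name and
    AWSAccessKeyId stay hidden in the Nginx configuration."""
    expires = int(time.time()) + expires_in
    string_to_sign = "GET\n\n\n{}\n/{}/{}".format(expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return "https://{}/proxy_private_file/{}?st={}&e={}".format(
        proxy_host, key, quote_plus(signature), expires)
```

Nginx then reconstructs the full query string from $arg_st and $arg_e before passing the request to S3, so the client never sees the bucket or the access key.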

PRIVATE files:

location ~* ^/proxy_private_file/(.*) {
  set $s3_bucket        'your_bucket.s3.amazonaws.com';
  set $aws_access_key   'AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY';
  set $url_expires      'Expires=$arg_e';
  set $url_signature    'Signature=$arg_st';

  # rebuild the original pre-signed query string from the masked parameters
  set $url_full         '$1?$aws_access_key&$url_expires&$url_signature';

  proxy_http_version     1.1;
  proxy_set_header       Host $s3_bucket;
  proxy_set_header       Authorization '';
  proxy_hide_header      x-amz-id-2;
  proxy_hide_header      x-amz-request-id;
  proxy_hide_header      Set-Cookie;
  proxy_ignore_headers   "Set-Cookie";
  proxy_buffering        off;
  proxy_intercept_errors on;

  resolver               8.8.4.4 8.8.8.8 valid=300s;
  resolver_timeout       10s;

  proxy_pass             http://$s3_bucket/$url_full;
}

EXAMPLE:

https://your_server/proxy_private_file/readme.txt?st=sagw4gsafdhsd&e=3453445231

UPDATE 1:

When your clients and servers are both in the same geolocation, delivery speed is not limited much. But what about accessing S3 file storage from another region because of specific business requirements? That is when an Nginx S3 proxy really makes sense.

The common bottlenecks for file access from another location:

  • too many hops at the transport layer
  • a high number of round trips caused by TCP congestion
  • a fresh SSL session setup for each file

Let's take a single file and try the S3 proxy. A simple download with wget shows the effect. Using the S3 proxy in the same region mainly helps to mask the URL, but across regions (file storage in one, user in another) it makes a huge difference. Here the file is located in the US (both the S3 bucket and the Nginx proxy), while the wget client is in the EU.

wget -v --no-check-certificate 'https://your_bucket.s3.amazonaws.com/file.txt?e=1374604566&st=G2lLQgG1d8L5irmYbkJi2hABT3D' => 1.55M/s   in 6.1s

wget -v --no-check-certificate 'https://us.ec2.instance/proxy_private_file/file.txt?e=1374604566&st=G2lLQgG1d8L5irmYbkJi2hABT3D' => 2.40M/s   in 3.8s

Nice results: 6.1s -> 3.8s. With optimized TCP settings and SSL offloading, the download time for the same file was cut almost in half. SSL session caching helps reduce download time further for subsequent files.

UPDATE 2:

To set up the Nginx S3 proxy properly you need to use the correct AWS S3 endpoints; they can be found here: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

UPDATE 3:

Caching is easy to set up: just enable proxy_buffering and adjust the cache location. See all the details here: https://gist.github.com/mikhailov/9639593
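As a hedged sketch of what that involves (the zone name, path, sizes and validity below are illustrative, not the gist's exact values): declare a cache zone at the http level, then reference it in the proxy location with buffering turned back on, since the proxy cache only works with buffered responses.

```nginx
# http-level: where cached objects live and how the shared zone is sized
proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m
                 max_size=1g inactive=60m;

location ~* ^/proxy_public_file/(.*) {
  set $s3_bucket        'your_bucket.s3.amazonaws.com';
  set $url_full         '$1';

  proxy_http_version     1.1;
  proxy_set_header       Host $s3_bucket;
  proxy_set_header       Authorization '';
  proxy_hide_header      x-amz-id-2;
  proxy_hide_header      x-amz-request-id;
  proxy_hide_header      Set-Cookie;
  proxy_ignore_headers   "Set-Cookie";
  proxy_intercept_errors on;

  proxy_buffering        on;            # caching requires buffering
  proxy_cache            s3_cache;
  proxy_cache_valid      200 24h;       # keep successful responses for a day
  proxy_cache_key        "$host$uri";

  resolver               8.8.4.4 8.8.8.8 valid=300s;
  resolver_timeout       10s;

  proxy_pass             http://$s3_bucket/$url_full;
}
```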

Comments

  • cbess

    @mikhailov Correct, S3 is not a CDN. However, my point was that you reduce the scalability by having all traffic diverge to your server. Meaning, you take on the entire load of S3 traffic, thereby sidestepping its ability to scale requests. Aren't you taking the hit for bandwidth and CPU load to serve through nginx? Nginx has to serve and buffer the request to the client. Thanks for the clarification.

  • mikhailov

    @cbess this solution is not for everybody; it is for specific requirements. Please re-read the post. Sure, the proxy doubles the traffic and uses CPU; in terms of scalability, this is solved by using an array of Nginx servers.

  • cbess

    Doesn't this mean that you have reduced/removed the scalability or CDN features of S3? Since all requests now go through the nginx server. You are effectively using S3 to store the files, then piping them through this nginx server.

  • mikhailov

    @cbess CloudFront CDN is not a feature of S3! CloudFront can merely use S3 as an origin. For some business purposes a CDN is not an option, such as data privacy or a specific geo-location requirement (data that must not be replicated to other regions).
