June 2013

Nginx as proxy for Amazon S3 public/private files

Everyone who uses S3 for file storage knows it as a scalable, production-ready solution.

Pros:

  • High SLA, three copies of each object
  • Public HTTP/HTTPS URLs
  • Private HTTPS URLs with a short time to live
  • Versioning and other cool features

Cons:

  • Narrow TCP congestion window
  • Long SSL handshake, no SPDY support
  • Wide IP range, hard to fit into firewall rules
  • Difficult to use direct URLs with custom access control

I'd like to share an Nginx S3 proxy solution with the following features:

  • URL masking (stealth mode)
  • SSL termination (makes sense in conjunction with EC2 in the same region)
  • Ability to use the Nginx proxy cache
  • SPDY support and your own TCP congestion window

PUBLIC files:

location ~* ^/proxy_public_file/(.*) {
  set $s3_bucket        'your_bucket.s3.amazonaws.com';
  set $url_full         '$1';                  # everything after /proxy_public_file/

  proxy_http_version     1.1;
  proxy_set_header       Host $s3_bucket;      # S3 routes requests by Host header
  proxy_set_header       Authorization '';     # strip any client credentials
  proxy_hide_header      x-amz-id-2;           # hide the AWS-specific response headers
  proxy_hide_header      x-amz-request-id;
  proxy_hide_header      Set-Cookie;
  proxy_ignore_headers   "Set-Cookie";
  proxy_buffering        off;
  proxy_intercept_errors on;

  resolver               172.16.0.23 valid=300s;   # AWS resolver, see UPDATE 4
  resolver_timeout       10s;

  proxy_pass             http://$s3_bucket/$url_full;
}

EXAMPLE:

https://your_server/proxy_public_file/readme.txt

To proxy S3 pre-signed URLs you should use the AWS SDK or similar to generate the URL first.
If your Nginx server is in the same region as the S3 bucket, you can use Nginx as the SSL terminator: generate the pre-signed URL with plain HTTP, and it will then be served over HTTPS via Nginx.
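
For example, with boto3 (the AWS SDK for Python) the generation looks roughly like this; a minimal sketch assuming your_bucket and readme.txt as placeholders and credentials configured in the environment:

import boto3
from botocore.client import Config

# signature_version='s3' yields the classic query-string auth format used in
# this post (AWSAccessKeyId / Signature / Expires) rather than SigV4's X-Amz-*
# parameters; use_ssl=False produces a plain-HTTP URL so Nginx can terminate
# SSL itself, as described above
s3 = boto3.client("s3", use_ssl=False, config=Config(signature_version="s3"))

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your_bucket", "Key": "readme.txt"},
    ExpiresIn=3600,  # lifetime in seconds; becomes the Expires timestamp
)
# -> http://your_bucket.s3.amazonaws.com/readme.txt?AWSAccessKeyId=...&Expires=...&Signature=...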

Once you have the URL, the bucket name and AWSAccessKeyId can be omitted, because they are already set in the Nginx configuration variables. The goal is to mask the URL as much as possible. Don't worry about exposing the AWSAccessKeyId; it is useless without the Secret Key. Let's mask the URL by passing the Expires timestamp as e and the Signature as st. For instance, the base URL is https://your_bucket.s3.amazonaws.com/readme.txt?AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY&Signature=sagw4gsafdhsd&Expires=3453445231
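
For illustration, here is a small helper (hypothetical, not part of the original setup) that rewrites such a pre-signed URL into the masked form handled by the location block below:

from urllib.parse import urlparse, parse_qs, quote

def mask_presigned_url(presigned_url, proxy_host="your_server"):
    """Rewrite an S3 pre-signed URL into the masked Nginx proxy form."""
    parts = urlparse(presigned_url)
    args = parse_qs(parts.query)
    key = parts.path.lstrip("/")  # e.g. "readme.txt"
    # only Signature (st) and Expires (e) travel with the request; the bucket
    # and AWSAccessKeyId are already hardcoded in the Nginx configuration
    return "https://{}/proxy_private_file/{}?st={}&e={}".format(
        proxy_host, key, quote(args["Signature"][0], safe=""), args["Expires"][0]
    )

# mask_presigned_url("http://your_bucket.s3.amazonaws.com/readme.txt"
#                    "?AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY&Signature=sagw4gsafdhsd&Expires=3453445231")
# => "https://your_server/proxy_private_file/readme.txt?st=sagw4gsafdhsd&e=3453445231"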

PRIVATE files:

location ~* ^/proxy_private_file/(.*) {
  set $s3_bucket        'your_bucket.s3.amazonaws.com';
  set $aws_access_key   'AWSAccessKeyId=YOUR_ONLY_ACCESS_KEY';
  set $url_expires      'Expires=$arg_e';      # "e" query argument
  set $url_signature    'Signature=$arg_st';   # "st" query argument
  set $url_full         '$1?$aws_access_key&$url_expires&$url_signature';

  proxy_http_version     1.1;
  proxy_set_header       Host $s3_bucket;      # S3 routes requests by Host header
  proxy_set_header       Authorization '';     # strip any client credentials
  proxy_hide_header      x-amz-id-2;           # hide the AWS-specific response headers
  proxy_hide_header      x-amz-request-id;
  proxy_hide_header      Set-Cookie;
  proxy_ignore_headers   "Set-Cookie";
  proxy_buffering        off;
  proxy_intercept_errors on;

  resolver               172.16.0.23 valid=300s;   # AWS resolver, see UPDATE 4
  resolver_timeout       10s;

  proxy_pass             http://$s3_bucket/$url_full;
}

EXAMPLE:

https://your_server/proxy_private_file/readme.txt?st=sagw4gsafdhsd&e=3453445231

UPDATE 1:

When your clients and servers are in the same geographic location, delivery speed is not limited much. But what about accessing S3 file storage from another region because of specific business requirements? That is when the Nginx S3 proxy really makes sense.

The common bottlenecks for file access speed from another location: 1) too many hops at the transport layer, 2) a high number of round trips caused by TCP congestion control, 3) a full SSL session setup for each file.

Let's take a single file and try the S3 proxy. A simple download with wget shows the effect. Using the S3 proxy within the same region mainly helps with URL masking, but across regions (between the file's storage and the user's location) it makes a huge difference. The file is located in the US (both the S3 bucket and the Nginx proxy), while the wget client is in the EU.

wget -v --no-check-certificate 'https://your_bucket.s3.amazonaws.com/file.txt?e=1374604566&st=G2lLQgG1d8L5irmYbkJi2hABT3D' => 1.55M/s   in 6.1s

wget -v --no-check-certificate 'https://us.ec2.instance/proxy_private_file/file.txt?e=1374604566&st=G2lLQgG1d8L5irmYbkJi2hABT3D' => 2.40M/s   in 3.8s

Nice results: 6.1s -> 3.8s. With optimized TCP settings and SSL offloading, the download time for the same file was cut nearly in half. SSL session caching helps reduce the download time for subsequent files.
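
For reference, the server-level part behind the SSL offload and session caching looks roughly like this; a sketch with illustrative certificate paths and sizes, not the exact production config:

server {
  listen 443 ssl spdy;                  # SPDY requires nginx built with --with-http_spdy_module
  ssl_certificate     /etc/nginx/ssl/server.crt;
  ssl_certificate_key /etc/nginx/ssl/server.key;

  # reuse SSL sessions so subsequent downloads skip the full handshake
  ssl_session_cache   shared:SSL:10m;
  ssl_session_timeout 10m;
}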

UPDATE 2:

To set up the Nginx S3 proxy properly you need to use the correct AWS S3 regional endpoint; the list can be found here: http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region

UPDATE 3:

Caching is easy to set up: just enable proxy_buffering and add a location for the cache; see all the details here: https://gist.github.com/mikhailov/9639593
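
A minimal sketch of that setup (paths, zone name, and sizes are illustrative; the gist has the full details):

# at the http{} level:
proxy_cache_path  /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m
                  max_size=1g inactive=60m;

# inside the location block:
proxy_buffering   on;               # caching requires buffering
proxy_cache       s3_cache;
proxy_cache_valid 200 24h;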

UPDATE 4:

Use the AWS NS resolver 172.16.0.23 instead of Google's 8.8.8.8.


10 Responses


Doesn't this mean that you have reduced/removed the scalability or CDN features of S3, since all requests now go through the nginx server? You are effectively using S3 to store the files, then piping them through this nginx server.

over 1 year ago

@cbess The CloudFront CDN is not a feature of S3! CloudFront can use S3 just as an origin.
For some business purposes a CDN is not an option, such as data privacy or a specific geo-location requirement (when data must not be replicated to other regions).

over 1 year ago

@mikhailov Correct, S3 is not a CDN. However, my point was that you reduce scalability by funneling all traffic through your server. Meaning, you take on the entire S3 traffic load, sidestepping its ability to scale requests. Aren't you taking the hit in bandwidth and CPU load to serve through nginx? Nginx has to buffer and serve the response to the client. Thanks for the clarification.

over 1 year ago

@cbess this solution is not for everybody, but for specific requirements. Please re-read the post. Sure, the proxy doubles the traffic and uses CPU; in terms of scalability this is solved by using an array of Nginx servers.

over 1 year ago

@mikhailov Thanks for the post. Can you also please guide us on how to write a custom proxy module for nginx instead of using the configuration? We need to check a database before proxying to S3 for security reasons. We do not want to serve all requests, and the rules are stored in a database, so if no rule matches the request we throw a 404.

over 1 year ago

@debjitk my advice is not to write a custom authentication mechanism, but to use the built-in functionality: http://nginx.org/en/docs/http/ngx_http_auth_request_module.html

It lets you make a backend subrequest (which can apply your database rules) easily.
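
A minimal sketch of that approach (the /auth endpoint and backend address are made up):

location /proxy_private_file/ {
  auth_request /auth;                      # subrequest: 2xx allows, 401/403 denies
  # ... S3 proxy settings as above ...
}

location = /auth {
  internal;
  proxy_pass              http://127.0.0.1:8080/check;  # your backend applies the database rules
  proxy_pass_request_body off;
  proxy_set_header        Content-Length "";
  proxy_set_header        X-Original-URI $request_uri;
}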

over 1 year ago

Thanks for this, I used it with XSendfile, which makes this even more awesome! If you have never used XSendfile (or X-Accel-Redirect as nginx calls it), it is worth a look. You can use it with proxy_pass, which makes it ideal for a setup like this.
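
A rough sketch of that combination (illustrative only):

location /proxy_private_file/ {
  internal;               # reachable only via X-Accel-Redirect, not directly
  # ... S3 proxy settings as above ...
}

# the application authenticates the user, then responds with just a header:
#   X-Accel-Redirect: /proxy_private_file/readme.txt?st=SIGNATURE&e=EXPIRES
# and nginx performs the actual delivery through the proxy location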

over 1 year ago

Hi, I have tried the public config and it works on my local ec2 <--> s3 setup; however, when I try caching I don't see the files being saved locally on the EC2 instance. Any idea?

over 1 year ago

Great article. One minor point: when you say "High SLA" I think you mean "High Redundancy" (S3 actually has NO SLA (Service Level Agreement); it could go down for a week and you'd have no legal recourse).

6 months ago

Fix your code for future readers please

proxy_pass http://$s3_bucket/$url_full;

is actually

proxy_pass http://$s3_bucket$url_full;

or

proxy_pass http://$s3_bucket$uri;

Nice article

5 months ago