Nginx direct file upload without passing the file through the back-end

It's pretty straightforward to manage file uploads. Anybody can do it using multipart/form-data encoding (RFC 1867). Let's see what happens:

  • the client sends a POST request with the file content in the BODY
  • the web server accepts the request and starts the data transfer (or returns error 413 if the file size exceeds the limit)
  • the web server fills its buffers (depending on the file and buffer sizes), stores the body on disk, and sends it over a socket/network to the back-end
  • the back-end verifies the authentication (note: only after the file has been uploaded)
  • the back-end reads the file, strips a few headers (Content-Disposition, Content-Type), and stores it on disk again
  • the back-end does whatever you need to do with the file

Too much overhead? It happens every time you upload something. The problems are obvious:

  • authentication happens on the back-end only after the web server has saved the file on disk
  • the request BODY is saved on disk twice (once by the web server and once by the back-end)
  • the back-end blocks while consuming your file
  • the resulting binary data is rarely needed by the back-end itself, because images are usually processed by ImageMagick, documents are uploaded to S3, and so on

To be honest, I see no problem with small file uploads. But what if you handle big file uploads all the time? Let's assume you use the Nginx web server, so you have several options:

The best and production-ready solution is the last one, client_body_in_file_only. Due to the lack of documentation few people use it, so let me share my experience of how to set it up. First of all, you need authentication that runs before the file upload starts: HTTP Basic Authentication (shared password) or the http_auth_request module (for back-end authentication through headers). Then update the nginx configuration as follows:

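location /upload {
  # Authenticate before any request body is accepted
  auth_basic                 "Restricted Upload";
  auth_basic_user_file       basic.htpasswd;
  limit_except POST          { deny all; }

  # Always write the request body to a file under /tmp/
  client_body_temp_path      /tmp/;
  client_body_in_file_only   on;
  client_body_buffer_size    128K;
  client_max_body_size       1000M;

  # Pass the temp file path to the back-end instead of the body.
  # (proxy_set_body takes a literal value, so "off" would be sent as the body;
  # proxy_pass_request_body off is the directive that drops it.)
  proxy_pass_request_headers on;
  proxy_set_header           X-FILE $request_body_file;
  proxy_pass_request_body    off;
  proxy_set_header           Content-Length "";
  proxy_redirect             off;
  proxy_pass                 http://backend/file;
}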

Once you reload nginx, the new URL /upload is ready to accept file uploads without any back-end interaction during the transfer: everything goes through nginx, which then sends a callback to http://backend/file with the temp file name in the X-FILE header. That's all. Easy?
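
On the back-end side, the callback could be handled roughly like this. It is a minimal Python sketch, not part of the original setup: the storage directory /var/uploads, the port 8080, and the names claim_upload, FileCallback and serve are assumptions made for illustration. It reads the path from X-FILE, checks that the file really lives under the directory set in client_body_temp_path, and moves it to its final location.

```python
import os
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer

UPLOAD_TMP = "/tmp"            # assumed to match client_body_temp_path
STORAGE_DIR = "/var/uploads"   # hypothetical final location


def claim_upload(tmp_path, tmp_root, dest_dir):
    """Move the file named in X-FILE into dest_dir and return the new path.

    Raises ValueError if tmp_path is not a regular file inside tmp_root.
    """
    real = os.path.realpath(tmp_path)
    root = os.path.realpath(tmp_root)
    if os.path.commonpath([real, root]) != root or not os.path.isfile(real):
        raise ValueError("unexpected X-FILE value")
    os.makedirs(dest_dir, exist_ok=True)
    # NOTE: this keeps nginx's temp name; a real service would likely
    # choose its own unique name to avoid overwriting earlier uploads.
    dest = os.path.join(dest_dir, os.path.basename(real))
    shutil.move(real, dest)
    return dest


class FileCallback(BaseHTTPRequestHandler):
    """Handles the body-less request nginx sends to http://backend/file."""

    def do_POST(self):
        try:
            stored = claim_upload(self.headers.get("X-FILE", ""),
                                  UPLOAD_TMP, STORAGE_DIR)
        except ValueError:
            self.send_error(400, "bad X-FILE header")
            return
        body = stored.encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the callback listener (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), FileCallback).serve_forever()
```

Calling serve() starts the listener. The path check is a cheap safeguard: X-FILE is an ordinary header, so the back-end should not act on a value that points outside the temp directory.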

You already know the original file name before you make the POST request, so you should preserve it until the back-end receives it. We use extra headers on the POST that pass through the Nginx proxy and reach the back-end unmodified. For instance, an X-NAME header on the initial request lets you pick the name up on the back-end.
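
Because X-NAME comes straight from the client, the back-end should probably treat it as untrusted input. The Python sketch below shows one way to combine the temp path from X-FILE with the name from X-NAME. The helpers safe_name and destination, the character whitelist, and the /var/uploads directory are all illustrative assumptions.

```python
import os
import re


def safe_name(x_name, fallback):
    """Reduce a client-supplied X-NAME value to a harmless base name."""
    # Drop any directory part, whichever slash style the client used.
    base = os.path.basename(x_name.replace("\\", "/")).strip()
    # Keep a conservative character set; everything else becomes "_".
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Reject empty names and names made only of dots/underscores.
    return base if base.strip("._") else fallback


def destination(headers, dest_dir):
    """Pick the final path for an upload from the callback's headers."""
    tmp_path = headers["X-FILE"]
    name = safe_name(headers.get("X-NAME", ""), os.path.basename(tmp_path))
    return os.path.join(dest_dir, name)
```

For example, destination({"X-FILE": "/tmp/0000000001", "X-NAME": "report.pdf"}, "/var/uploads") yields a path ending in report.pdf, while a missing or unusable X-NAME falls back to the nginx temp name.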

If you need back-end authentication, the only way to handle it is auth_request, for instance:

location = /upload {
  auth_request               /upload/authenticate;
  ...
}

location = /upload/authenticate {
  internal;
  # Send the auth subrequest without the upload body
  proxy_pass_request_body    off;
  proxy_set_header           Content-Length "";
  proxy_pass                 http://backend;
}

The upload request should carry the headers to be validated, for instance X-API-KEY. Once authentication has finished, Nginx starts the file upload and passes the file name to the back-end afterwards. It is an internal cascade of requests, so you only have to make one request with the file BODY and the authentication headers. The good news is that the auth_request module will be incorporated into the Nginx core soon, so we will be able to use it without ./configure ... --add-module=/tmp/ngx_http_auth_request
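
The back-end endpoint behind /upload/authenticate only has to answer with a status code: auth_request allows the request on a 2xx response and denies it on 401 or 403. Here is a minimal Python sketch of such an endpoint. The key store API_KEYS and the names auth_status and AuthCallback are hypothetical, and the handler accepts both GET and POST because this sketch makes no assumption about which method the subrequest arrives with.

```python
import hmac
from http.server import BaseHTTPRequestHandler

API_KEYS = {"example-key-1"}  # hypothetical key store, for illustration only


def auth_status(api_key, valid_keys):
    """Map an X-API-KEY value to the status code returned to nginx."""
    if not api_key:
        return 401  # no credentials supplied
    supplied = api_key.encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences
    ok = any(hmac.compare_digest(supplied, key.encode("utf-8"))
             for key in valid_keys)
    return 204 if ok else 403


class AuthCallback(BaseHTTPRequestHandler):
    """Answers the auth subrequest with a bare status code."""

    def do_POST(self):
        status = auth_status(self.headers.get("X-API-KEY"), API_KEYS)
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_POST  # same check regardless of the subrequest method
```

Keeping the decision in a plain function such as auth_status makes it easy to test without starting a server.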

P.S. client_body_in_file_only is incompatible with multipart data upload, so you can only use it with XMLHttpRequest2 (without multipart) or a plain binary data upload:

curl --data-binary '@file' http://localhost/upload
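
The same raw-body upload can be done from a script. The Python sketch below uses only the standard library; the function names build_upload_request and upload are illustrative, and the X-NAME and X-API-KEY headers follow the conventions described earlier in this post. Content-Length is set explicitly so that the open file is streamed with a known length rather than sent with chunked encoding.

```python
import os
import urllib.request


def build_upload_request(url, fileobj, size, name=None, api_key=None):
    """Build a POST whose body is the raw file bytes (no multipart)."""
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(size),
    }
    if name:
        # Header values must be latin-1 encodable; non-ASCII names
        # would need to be encoded first.
        headers["X-NAME"] = name
    if api_key:
        headers["X-API-KEY"] = api_key
    return urllib.request.Request(url, data=fileobj, headers=headers,
                                  method="POST")


def upload(url, path, api_key=None):
    """Stream the file at path to url and return the HTTP status code."""
    with open(path, "rb") as f:
        req = build_upload_request(url, f, os.path.getsize(path),
                                   name=os.path.basename(path),
                                   api_key=api_key)
        with urllib.request.urlopen(req) as resp:
            return resp.status
```

A call such as upload("http://localhost/upload", "file") would then mirror the curl command above, with the original file name carried in X-NAME.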

This method is best suited to native mobile applications that handle big file uploads all the time.

Comments

  • mikhailov

    @dawncold because client_body_in_file_only doesn't support RFC 2388

  • craigloftus

    I found this very useful but hit a couple of problems with it.

    First, by default nginx was configured to store the files to /tmp/ as a different user to that I had running the proxy processes. Editing the user directive in /etc/nginx/nginx.conf was my solution to this.

    Second, without adding "proxy_set_header Content-Length 0;" the response from my proxy just 'hung' open and had to be manually closed. This is a little confusing, but it seems to work for me :)

  • dawncold

    Why does the upload only succeed when using curl --data-binary?

    I tried a normal form upload, but the temp file then contains the form field name and value.

  • niko

    Nginx will only start passing the file to the backend when the upload is complete. So at least there won't be a backend worker blocked while the client is still sending the data. And your setup will only work if Nginx and the backend have access to the same filesystem. Other than that: nice trick! Thanks!

  • mikhailov

    @niko,

    1) that's not right: Nginx passes only the file name to the back-end, not the full file body, so the back-end does not need to parse and strip the Content-Disposition headers.

    2) in any case, the back-end is not occupied while Nginx is receiving the file

    3) Nginx and the back-end must share the same filesystem, that's right.

  • eppo

    I tried your solution. The upload works great, but the tmp file gets stored as /tmp/00000x (x is a digit), and since the upload is async compared to the rest of the form, which has already been saved as a "resource", how do you know which uploaded file belongs to which resource? How can you know in advance where it's going to be stored?

  • mikhailov

    @eppo the back-end receives a request to the URL http://backend/file with an empty body(!) and the file name in the X-FILE header. The storage location is set by client_body_temp_path

  • eppo

    Ok I didn't notice that the upload request from the client is passed to the backend when the upload finishes within the same http request. So you are sure you're processing this client request. That's what I was wondering. Thanks for your tip, really handy

  • mikhailov

    @eppo yes, this is the callback; it fires only if the file has been uploaded and saved on disk successfully.

  • dawncold

    How can I pass the file path as a GET or POST variable (instead of the X-FILE header) in the request to the back-end?

  • mikhailov

    @dawncold you can send the desired file name in an extra header and reuse it on the back-end afterwards

  • dawncold

    @mikhailov, I tried building a URL like http://back-end/file?name=xxx&path=$request_body_file, but I can't get the value of $request_body_file that way; it seems the variable's value can only be set in a header. I don't know why.

  • mikhailov

    @dawncold We use extra headers on the POST that pass through the Nginx proxy and reach the back-end unmodified. So try using an X-NAME header on the initial request and you will be able to pick it up on the back-end.

  • goace

    Hi mikhailov, here's my problem: I'm using Nginx as a reverse proxy. When I POST a file to Nginx, it seems to store the whole file locally and forward it to the backend server only after receiving the whole file. I want a solution that makes Nginx receive and forward the data at the same time.

    Can client_body_in_file_only do this?

    PS: proxy and backend are different servers.

  • mikhailov

    @goace, the file is stored on the local file system, that's right. Once it has been uploaded, the back-end gets a synchronous callback (with the X-FILE header and an empty BODY) at whatever URL you specified (http://backend/file in my case)

  • xfrf

    Saved files also contain the headers. Is this the intended behavior, or am I doing something wrong? For example:

    Content-Disposition: form-data; name="liteUploader_id"
    fileUpload1
    
    -----------------------------14064867571470422370146962914
    Content-Disposition: form-data; name="custom"
    tester
    
    -----------------------------14064867571470422370146962914
    Content-Disposition: form-data; name="fileUpload1[]"; filename="Gibson SG.jpg"
    
    Content-Type: image/jpeg
    

    ...and then the binary image data follows.

  • mikhailov

    @xfrf how do you upload a file?

  • xfrf

    @mikhailov our fault, already fixed it all. Thank you for the method description!

  • alekseyp

    @xfrf, what was the problem?

  • meson10

    I have a similar problem with the file output containing

    ------WebKitFormBoundaryvG6nluJ9VUrYg1BK Content-Disposition: form-data; name="fileToUpload"; filename="Screen Shot 2013-09-16 at 8.48.45 AM.png" Content-Type: image/png

    How do I trim this data? More importantly, how do I capture Content-Type as a request header, perhaps something like X-Content-Type?

  • mikhailov

    @meson10 change the way you upload the file: get rid of multipart/form-data

  • meson10

    How do I access the request parameters then?

  • mikhailov

    through custom headers, which Nginx preserves; they come with the original request

  • meson10

    My bad. It's not just the content type; I am sending a bunch of request parameters too, like:

    ------WebKitFormBoundaryXvqKozZ4exuycMpX Content-Disposition: form-data; name="maxlength"

    102400

    These would obviously not be part of the headers, but of the request body, which would require some pruning of the saved file. Is that correct, or am I missing a trick here?

    (I am relatively new to DevOps and advanced Nginx tricks, so pardon my naivety with the concepts.)

  • kliuchnikau

    Thanks for sharing this approach! I have one small question: When nginx stores file in /tmp/ directory it sets file access to 'rw' for file owner only (and owner is nginx user, say 'www-data'). Then I need to use this file from backend process which runs under a separate user (say, 'deployer'). I wonder, how do you deal with this case?

  • mikhailov

    @kliuchnikau if you have a separate deployment user, the web server needs access to at least the app's tmp directory. So the approach is to run the web server as the deployer user; that should be fine as long as you control the application itself.

  • 2naive

    Anatoly, good evening.

    Thanks for the detailed how-to.

    Could you tell me whether I understand correctly that there is no way to influence the naming of the files written to the upload directory, apart from the level 1-3 option?

    That is, I cannot name the files as, say, [\w]{6} instead of [\d]{10}?

    Thanks.

  • mikhailov

    @2naive, there were questions about this on the mailing list, but as far as I am aware of the changes, this module has not been touched (only auth_request was added on request), so for now the file name cannot be influenced.

  • 2naive

    Thanks for the reply.

    One more silly question: can I be sure that when the directory is restored from a backup at a new site with the same settings, nginx will not overwrite the restored files (or fail to write new ones because a file with the same identifier already exists)?

    Thanks.

  • sammaye

    For some reason on nginx 1.4.4 on Ubuntu, using the default built-in uploader with nothing more than the max client body size set to 1GB, I can get both PHP upload progress working and multipart with the $_FILES array being populated :S.

    Not sure how that is but it is.

  • pointblank

    Hello Mikhailov,

    is it possible to get the original uploaded file name? The X-File header only has the temp file name "/tmp/uploads/0000000001"

  • mikhailov

    @pointblank, I don't think it's possible, because Nginx has its own internal naming convention for request body content

  • mandrei99

    Hi,

    Nginx upload module is supported in 1.4.4.

  • mikhailov

    @mandrei99 this plugin author's attitude is pretty clear https://github.com/vkholodkov/nginx-upload-module/issues/41

    It's a bit dangerous to rely on patches that can break and make core functionality of Nginx and upload-module unstable

  • mikhailov

    @mandrei99 once you start using unsupported plugins, you can no longer stay on the latest stable Nginx version: each time a new release is out, you have to wait for a new patch. For example, SPDY/2 support will be discontinued soon, so Nginx 1.5.10 is a must.

    We decided not to do that. We just use the built-in functionality and develop a service on top of it. Core functionality is enough for our tasks.

  • ronnyf

    files created by nginx: 2014/02/26 21:47:23 [notice] 4533#0: *1 a client request body is buffered to a temporary file /var/www/staging/0000000001, client: 127.0.0.1, server:.com, request: "POST /upload HTTP/1.1", host: "localhost"

    created file has owner nobody and very restrictive permissions -rw------- 1 nobody admin 140257 26 Feb 21:47 0000000001

    I'd like to read the file and process its contents in the backend, but I can't seem to figure out how to tell nginx to use a different umask (022) for the files it creates. Can anybody help me please?

    The exception I get in the backend: java.nio.file.AccessDeniedException: /tmp/0055119830 at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)

  • mikhailov

    @ronnyf what is the user and group you have in Nginx configuration file? I've checked the files created with the user you specified. If that does not help, please take a look at http://en.wikipedia.org/wiki/Setuid#setgid_on_directories

  • ronnyf

    @mikhailov thanks, I synchronized the nginx user and the backend user to be the same - works alright.

  • vaggos2002

    @mikhailov many thanks, but how about this part '- back-end reads the file and cuts few headers Content-Disposition, Content-Type, stores it on disk again' ??

    How can you cut the headers? Can you use the uploaded file (e.g. 00000002) as a string?

  • mikhailov

    @vaggos2002 please look at RFC 1867. You can upload files in one of two ways: as multipart form data or as raw binary.

  • vaggos2002

    @mikhailov thanks a lot
