Optimized File Uploading With PHP & Nginx


Performance is often important to people using nginx – and for good reason, of course.

Sadly, while many people will optimize their software stack, they rarely work on optimizing the back-end code, and even more rarely do they eliminate single points of failure.

Such was also the case when SitePoint recently published an article about uploading large files with PHP. This post will discuss a method to accept uploads that will scale far better and not offer malicious users an easy DoS vector.

The Problem

File uploads are used in many places; depending on your site, people might be adding avatars, personal pictures, music or any other type of file. The size of uploads can vary a lot, but in the end it doesn’t matter much: you’re still offering a malicious user a single point of failure at which to direct a denial-of-service attack.

Allow me to illustrate. Let’s say you have an upload form for people to upload pictures, and you run Apache in pre-fork mode with mod_php and 50 max children, otherwise known as the standard Apache setup.

Each time Apache accepts an upload, one of these processes is going to be busy for the duration of the upload. Do you see the problem here? File uploads to PHP are essentially really long-running scripts, and you’re going to run out of Apache processes quickly. It might not even be a malicious user; you could be a victim of your own popularity.

If you’re using nginx then you’re already better off, as nginx will buffer the file upload to disk and only pass it to your fastcgi back-end once the file upload is complete. If you’re uploading a 1 GB file, though, nginx is still going to send 1 GB of data over fastcgi – but we can do something about that.

The Solution

Thankfully we are not without options and developing a scalable system for uploading files is not too difficult. To help us out we’ll use two third-party nginx modules – namely the upload module and upload progress module.

The upload module will handle the actual upload for us in nginx and, once it completes, will pass the path of the file to PHP so we know where the file is. This way PHP is not waiting on the data to be sent; it only decides what to do with the data once it’s on the server. PHP-wise, your file upload can thus have a sub-second execution time, and at that point your bottleneck is going to be either your network or disk IO!

The upload progress module is fairly self-descriptive in that it will monitor and report the progress of uploads. It accepts a unique identifier with the form submission and when given this identifier will report the status of the upload. Simple.
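To make this concrete: a GET request to /progress carrying the same X-Progress-ID as the upload returns a small JSON object describing the upload’s state. The shapes below are a sketch based on the module’s documented states; depending on version and configuration the module may wrap the object in a JavaScript new Object(...) expression instead of emitting plain JSON.

{ "state" : "starting" }
{ "state" : "uploading", "received" : 1234567, "size" : 1073741824 }
{ "state" : "done" }
{ "state" : "error", "status" : 413 }

The JavaScript further down in this post only needs the uploading case, which carries the bytes received so far and the total size.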

The Execution

If you’ve never compiled nginx with a third-party module then you’ll be happy to know that it’s fairly simple. Download the source code, extract it and add the following configure options.

--add-module=/path/to/nginx-upload-module
--add-module=/path/to/nginx-upload-progress-module

make, make install and you’re ready to configure it.
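Putting those steps together, the build might look like the sketch below; the module paths are placeholders for wherever you extracted the two modules, and any configure options you normally build with should be added as well.

./configure \
  --add-module=/path/to/nginx-upload-module \
  --add-module=/path/to/nginx-upload-progress-module
make
make install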

The configuration is a bit more complex and might seem overwhelming at first, but it is fairly easy to comprehend given a few seconds’ thought.

http {
  upload_progress uploads 5m;

  server {
    # This just rewrites all requests to a front-controller for SEF URLs.
    location @frontcontroller {
      rewrite ^ /index.php last;
    }

    location = /progress {
      report_uploads uploads;
    }

    location /upload {
      # Pass altered request body to this location
      upload_pass   @frontcontroller;

      # Store files to this directory
      # The directory is hashed, subdirectories 0 1 2 3 4 5 6 7 8 9 should exist
      upload_store /var/tmp/fuploads 1;

      # Set the desired user permissions
      upload_store_access user:r group:r all:r;

      # Set specified fields in request body
      upload_set_form_field $upload_field_name.name "$upload_file_name";
      upload_set_form_field $upload_field_name.path "$upload_tmp_path";

      # Inform backend about hash and size of a file
      upload_aggregate_form_field $upload_field_name.sha1 "$upload_file_sha1";
      upload_aggregate_form_field $upload_field_name.size "$upload_file_size";

      # This directive specifies any extra POST fields which should be passed along.
      #upload_pass_form_field "^usession$";

      upload_cleanup 400 404 499 500-505;

      track_uploads uploads 5s;
    }
  }
}

All the directives are documented in the nginx wiki or on the module download pages, so I’m not going to go into too much detail about the configuration. The configuration here is stripped down and missing essential but unrelated directives, but let’s have a look at the important parts.

In the http block we allocate a memory buffer for the upload progress module to use for tracking. It does not need to be very large, as the module doesn’t store much info per upload; the 5 MB I have assigned is probably overkill even though it’s used in a system handling many uploads.

The /progress location is defined as the URI we’ll use for reporting uploads tracked in the uploads buffer we defined earlier. At the very bottom of the configuration you can see that we have set the location /upload as the location for tracking uploads.

Conveniently, this is also the location that will handle the upload! In short, uploads are stored in /var/tmp/fuploads/x where x is between 0 and 9. Once done, the module will pass the request to the @frontcontroller named location, which basically just rewrites it to a PHP file. This will be the file that handles the PHP end of the upload. In this example my index.php file would see /upload/ as the request URI and route the request to the proper controller, but how you handle it doesn’t really matter.
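To make the PHP side concrete, here is a minimal sketch of a handler. It assumes the upload form field is named "file" (as in the HTML example below) and that /var/www/uploads is a writable directory; both are assumptions of this example, not requirements of the module. One gotcha worth knowing: PHP rewrites dots in incoming variable names to underscores, so the file.path field set by the module arrives as $_POST['file_path'].

<?php
// Minimal sketch of the back-end handler. The "file_*" keys assume the
// upload form field is named "file"; PHP converts the dots in "file.path"
// and friends to underscores when it populates $_POST.
$tmpPath  = isset($_POST['file_path']) ? $_POST['file_path'] : null;
$origName = isset($_POST['file_name']) ? basename($_POST['file_name']) : '';
$size     = isset($_POST['file_size']) ? (int) $_POST['file_size'] : 0;

if ($tmpPath === null || $origName === '' || !is_file($tmpPath)) {
    header('HTTP/1.1 400 Bad Request');
    exit('Invalid upload');
}

// The data never went through PHP's own upload mechanism, so
// move_uploaded_file() would refuse it; a plain rename() does the job.
if (!rename($tmpPath, '/var/www/uploads/' . $origName)) {
    header('HTTP/1.1 500 Internal Server Error');
    exit('Could not store upload');
}

echo 'Stored ' . $origName . ' (' . $size . ' bytes)';

The basename() call is there because the client controls the original file name; never trust it as a path.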

Putting It All Together

Right now you actually have a working setup. File uploads POSTed to /upload will be handled and tracked by nginx, so now it’s time to put this data to use by displaying a nice progress bar to the user. For this we will create a JavaScript-based uploader; it will degrade gracefully in case JavaScript isn’t enabled, but in that case it won’t support displaying a progress bar.

<form id="javascript-upload" action="/upload/" enctype="multipart/form-data" method="post">
  <label for="jfile">File Upload:
    <input id="jfile" name="file" type="file" />
  </label>
  <input type="submit" value="Upload File" />
</form>
<div style="border: 1px solid black; width: 300px;">
  <div id="status" style="background-color: #D3DCE3; width: 0px; height: 12px; margin: 1px;"></div>
</div>
<div>
  <span id="received"> </span>
  <span id="speed"> </span>
</div>

This is a fairly standard upload form. In addition we’ve defined a div for a progress bar and a few spans for information about received data and upload speed. Now let’s have a look at the JavaScript required. This example uses MooTools, but it’s much the same concept in native JavaScript, jQuery or whatever you prefer.

$('javascript-upload').addEvent('submit', function(e) { // On submit of upload form.
  var received = 0;
  var percent  = 0.0;
  var perform;
  var periodical;
  var uuid = ''; // Unique uploader ID, 32 hex characters.
  for (var i = 0; i < 32; i++) {
    uuid += Math.floor(Math.random() * 16).toString(16);
  }
  var check = 2000; // Milliseconds between each XHR request.

  $('javascript-upload').action += '?X-Progress-ID=' + uuid; // Assign ID to upload.

  var request = new Request({ // Define the XHR request.
    url: '/progress?X-Progress-ID=' + uuid, // Using same identifier!
    method: 'get',
    link: 'cancel',
    onComplete: function(response) {
      var json = JSON.decode(response);
      if (json.state == 'uploading') {
        var delta = json.received - received;
        var bytes = delta / (check / 1000);
        received  = json.received;
        percent   = (json.received / json.size) * 100;

        $('status').tween('width', 298 * percent / 100);
        $('received').innerHTML = 'Received ' + Math.round(json.received / 1024) + '/' + Math.round(json.size / 1024) + ' KB';
        $('speed').innerHTML    = 'Speed ' + Math.round(bytes / 1024) + ' KB/s';

        if (percent >= 100) {
          $clear(periodical); // Upload done, stop polling Nginx.
        }
      } else if (json.state == 'done') {
        // The final poll may land after the upload completes; stop then too.
        $clear(periodical);
      }
    }
  });

  perform = function () {
    request.send();
  }

  periodical = perform.periodical(check);
});

I did my best to put in the required comments to make it understandable, but in short, what we do is capture the submit event of the upload form and inject our own JavaScript code. It’s important to note that we do not return false or otherwise prevent the upload from taking place. We then define an XHR request to the /progress URI we configured earlier and provide it with the unique upload identifier. It returns data in JSON format, which we can then parse and use to calculate progress and upload speed. The 298 in the tween method call is the width of the progress bar (300) minus the margins (1 on each side).

So there you have it, scalable file uploading that won’t kill your back-end.

Drawbacks

Sadly, nothing is ever completely perfect. While the upload module’s approach of passing PHP the path to the uploaded file instead of the actual file data is much faster and a smarter concept, it does mean that we cannot use the standard back-end code. There will be no $_FILES array for us to use; rather, we’ll get the data in $_POST. Using nginx by itself will make things scalable enough while still providing the $_FILES array, but if you’re writing a custom application then the upload module can come in handy.
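If you have existing code built around $_FILES, one option is a small compatibility shim along the lines of the sketch below. This is a hypothetical helper, not part of either module, and it again relies on PHP's dot-to-underscore rewriting of the field names.

<?php
// Hypothetical shim: rebuild a $_FILES-style entry from the fields the
// upload module injects into $_POST (dots in names become underscores).
function upload_module_file($field)
{
    if (!isset($_POST[$field . '_path'])) {
        return null; // nothing uploaded through the module for this field
    }
    return array(
        'name'     => $_POST[$field . '_name'],
        'tmp_name' => $_POST[$field . '_path'],
        'size'     => (int) $_POST[$field . '_size'],
        'error'    => UPLOAD_ERR_OK,
    );
}

$_FILES['file'] = upload_module_file('file');

Bear in mind that is_uploaded_file() and move_uploaded_file() will still reject such entries, since the data never passed through PHP's upload mechanism; any code calling them needs a rename() fallback.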


33 Comments

  • This would make an awesome customization for SMF.

    • I consider it pretty much essential for any application. PHP is slow enough and memory hungry enough without sending it huge files over the FastCGI protocol. The progress monitoring isn’t too shabby either.

  • I am getting

'ngx_garbage_collector_temp_handler' undeclared (first use in this function)

    when I try to make

Source code mismatch maybe? Any help on which versions work together?

    • I run the latest version of the upload module with Nginx 0.9.5 but I’m 100% sure it also works with 0.8.54 at least.

I gave the Nginx upload module a try because I’m facing this problem: when I upload a large (2 GB, for example) file to my Web site (Nginx, PHP/PHP-FPM), Nginx buffers the whole file in memory, and this will become a huge problem as we’re going to have a lot of users uploading large files in the near future. So, I thought the Nginx upload module would fix this by writing the input directly to its upload directory – but no, Nginx still buffers everything in memory.

I’m currently looking at solutions like Plupload that can chunk an upload into many small files – but the perfect solution would be to be able to tell Nginx to write the client body directly to a file and never keep it in memory. Does anyone have an idea how to do that?

The Nginx upload module does not write to memory; it writes to the path you specify with the upload_store directive. For example, if you have upload_store /tmp/uploads 1; then you will have a directory /tmp/uploads with 10 directories in it, and the files will be written there.

Well, when I upload a file, using the Nginx upload module or not, I can see the system memory (in top) fill up. It’s only cache, it’s not locking the memory, but it would still be nice if it wouldn’t use memory at all and wrote directly (and only) to disk. I think this might be because of the way the core of Nginx/FastCGI works and not because of the upload itself.

I’m fairly sure that has to do with Linux caching IO. This is non-reserved memory, meaning that it will be freed up if it’s needed.

          You actually want your memory to be fully utilized.

    • vetriselvan
      June 27, 2011 12:09

Mr. Mike,

This has nothing to do with nginx. I am uploading files via nginx and it’s working well; it didn’t take much memory. You have to throttle the speed at which data is written to the server if you want to make a scalable server. The nginx upload module will write data as it receives it; it won’t hold it in memory until it has the full file.

Is it possible for nginx to pass the data chunk by chunk to the upstream as it receives it, without caching it to disk? As if nginx were working like a tunnel between client and upstream.

No. Nginx will always buffer the request; you can only chunk the response. You should not load balance uploads with nginx but rather contact the individual upload servers directly – or use something other than nginx to load balance.

Thank you very much, sir. Your article helped me to understand the concept and I successfully implemented upload with a progress bar. What I can’t understand now is how to go to the next page after the upload is complete: with action="/upload/" as in your example it doesn’t redirect me to /upload/frontcontroller.php but to /upload/?X-Progress-ID=6b4dd4f9254a23bd8305be958f0c2612, and I am not sure how to catch this. Second, I am trying to upload an image file but it is saved under a hash-like filename such as 0000000001; how can I convert it back?

Thanks for the help.

Never mind, for the submit action it was returning to index.php:

location @frontcontroller {
  rewrite ^ /index.php last;
}

For the second question, how would I go about returning the 0000000001 file to its original name or something?

Thanks

Never mind, solved it again; I can work with the [file_path] it returns. Any solution for multiple file uploads? Thanks.

Sorry, no advice on multiple file uploads. I always just use Flash or JavaScript to queue the uploads and send them one after the other, so the back-end actually only ever sees a single file upload at a time.

  • Mobin Hosseini
    February 24, 2012 15:37

Well, I was modifying someone’s code and I found this article’s URL in his nginx config. The problem is that many people just copy-paste code without analyzing it well. For example, I found exactly the above code on a production server without any filtering or data handling; you could easily retrieve sensitive info from the server. The author explains very well how people can implement this module, but he can’t anticipate how each user is going to implement it, so it would be a good idea for users to check and validate pieces of code found on the internet. That’s all from me 😀

Thanks for the excellent article. I’ve been looking into migrating a site from Apache 2.2 to Nginx. I had planned on using Nginx as both load balancer and upstream web server/application server (PHP). This article highlights that I hadn’t considered how file uploads would work in a load-balanced setup. You mentioned one solution above, i.e. sending upload requests directly to upstream web servers. Is this something achievable through Nginx config, or do you simply designate one or more web servers for uploads and give them public-facing IPs?

You cannot do this through Nginx, as by definition that would require the Nginx server to handle all the data going to the back-ends. This works great for small uploads on a limited scale, but if you have to handle lots of uploads then you really want upload servers with public-facing IPs. Whether or not these are your back-end PHP servers is ultimately up to you. I prefer to keep upload servers under their own hostname, as that makes it easier to move from uploading directly to a PHP server to uploading to a dedicated upload server which then distributes the files from there.

I know this article is a couple of years old, but I just wanted to give it a +1. This article rocks. There are volumes that cover this problem ineffectually at best, and you’ve basically solved it in 1,000 or so words.
I run virtual servers on the AWS cloud and I’ve been considering switching over to nginx recently. I’ve just been pushed over the edge. My lazy Sunday just turned into server-software-switching-Sunday-funday.

  • Jean-Nicolas
    August 25, 2012 03:18

    I got the same problem…

    Just add this in the JavaScript:

    e.preventDefault();

    Just before:

    var received = 0

For the last part of your question I can’t help you, sorry.

  • Jean-Nicolas
    August 25, 2012 08:52

I am having some issues with your code. Here is my problem in more detail:

    http://stackoverflow.com/questions/12114577/nginx-file-upload-progress-module-with-php-and-kohana

    Can you help? It would be truly appreciated. Thanks!

[…] Martin Fjordvald’s post discusses this issue in “File Uploading With PHP & Nginx”, but it did not immediately solve the problem I am currently facing. My server currently […]

  • Jonas Bülow
    January 24, 2013 07:54

    You should update the comment below to reflect the change you’ve made:

    # Allow uploaded files to be read only by user
    upload_store_access user:r group:r all:r;

Hello, I’ve followed your article but it does not work as expected.
I am able to upload a file, but the progress bar does not work; Nginx 1.2.4 [extras] with the upload module and upload progress module.

In my nginx.conf I’ve added upload_progress uploads 5m; and the rest is also the same. Still no success. Any advice?

What’s your experience with PHP’s memory consumption using this method? Standard uploading requires quite some memory to process file uploads. Does this method also lower memory consumption in the PHP processes themselves?

Memory usage is almost nothing. All you’re passing to your FastCGI process is the metadata for the file and the location of the file on disk. There’s no actual file data being processed by your back-end.

Though it’s worth noting that since this blog post went live the module has developed an issue: it’s not compatible with nginx version 1.3.9 and above without a patch. See: https://github.com/vkholodkov/nginx-upload-module/issues/41

Thanks for the clarification. I was already hoping the PHP interpreter would only deal with the metadata, making uploads far less memory-consuming than the standard approach.

I already knew about the limitations in nginx, but our servers run on Ubuntu 12.04. That version ships with nginx 1.2.7 and the upload module is available via the package manager 🙂 Hopefully the issue will be resolved by the time we need to upgrade; for now, using 1.2.7 is not an issue for us.

I am struggling to get your example to work. I am using nginx in front of a nodejs/express server (no body-parsing middleware). Basically I don’t see how the nginx-buffered filename is passed on to the upload_pass location. I’m inspecting the headers and data and this is what I see.

A straight upload with curl:
curl -i -F filedata=@Jimi.mp3 http://localhost:9096/upload/

upload post headers { 'user-agent': 'curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5',
host: 'localhost:9096',
accept: '*/*',
'content-length': '3347485',
expect: '100-continue',
'content-type': 'multipart/form-data; boundary=----------------------------7e2c15173779' }

    Then after that comes

Content-Disposition: form-data; name="filedata"; filename="Jimi.mp3"
Content-Type: application/octet-stream

    and the buffers of data.

    The same upload via nginx with the upload module and your config I get these results:

uploadpass post headers { 'x-real-ip': '127.0.0.1',
'x-forwarded-for': '127.0.0.1',
'x-forwarded-proto': 'http',
host: 'localhost:9080',
'x-nginx-proxy': 'true',
connection: 'Upgrade',
'content-length': '769',
'user-agent': 'curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8x zlib/1.2.5',
accept: '*/*',
'content-type': 'multipart/form-data; boundary=----------------------------c1b5d9d29263' }

    Then
Content-Disposition: form-data; name="filedata.name"

    And no data buffers.

It is expected that there are no data buffers, since only the path should be passed along. But the filename is missing from the Content-Disposition, and I can’t find any other location where this filename is.

Am I missing something, or is my module not working correctly? I find it weird that none of the extra headers from the upload module are visible.

I’m using nginx version 1.5.4 with this gist for the upload module to fix its compatibility:
    https://gist.github.com/adamchal/6457039

    Any clues?

When nginx redirects to your PHP file after a successful upload, what is your PHP file doing? Is it saving/renaming/moving the tmp file?

  • Hello, Martin!

    Not long after you wrote https://blog.martinfjordvald.com/2010/08/file-uploading-with-php-and-nginx/ , I stumbled upon it and found it to be the best write-up on the subject available anywhere on the internet. (Thank you for documenting the subject so well!)

    All these years later, I am still using both of these extensions:

    https://github.com/masterzen/nginx-upload-progress-module
    https://github.com/vkholodkov/nginx-upload-module/tree/2.255

    But, I fear they are growing stale. It feels as though every time a new NGINX version is released, I have to fiddle and tinker to get it to compile with these extensions.

    I’m wondering if you are still using this “recipe” for handling large file uploads in NGINX, or if you have moved on to “something better”.

    My two requirements are:

    1.) The ability to resume failed uploads
    2.) The ability to track upload progress (while modern browsers can do this on the client-side, I see value in doing it server-side, too)

    I am most grateful for any information you are willing and able to share.

    Thanks again for the excellent article and for sharing your vast knowledge on the subject with the rest of us!

