
Remember register globals? Remember how you had to code as if it were off, because it might be? Remember how you had to consider the security implications of it being on, because it might be? This “might be, might not be” problem has plagued a lot of early PHP features. Register globals is by no means alone: in the effort to make things versatile, the PHP developers managed to deliver the worst of both worlds and the best of neither, at least for code that can’t count on full control over the environment it runs in.

We still see this today with things like short open tags. Ever been told you shouldn’t use them? That’s primarily because they can be turned off, in which case your source code leaks straight into the output. (To a lesser degree it’s also about incompatibility with XML.)

Today I want to cover a well-known feature that many people don’t think of as belonging in the same group as register globals and short open tags: path info. The idea behind path info is sound enough. It is most often used to provide SEO-“friendly” URIs in cases where URL rewriting isn’t available. In short, you can have a URI like this:

/index.php/user/dashboard

In standard Unix terms this would read as “file dashboard in directory /index.php/user/”. Of course, /index.php/user/ is not a directory and there’s no file called dashboard. Instead, PHP translates this into the script /index.php with /user/dashboard as the path info. In case you’re already shaking your head: this behaviour is literally specified in an RFC, the CGI/1.1 specification (RFC 3875), so it’s not really PHP’s fault.
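To make the idea concrete, here’s a minimal sketch of how a front controller such as /index.php might consume the path info. It assumes the web server passes PATH_INFO through (with Nginx that means setting the fastcgi_param explicitly), and routeSegments() is a hypothetical helper, not part of PHP or any framework.

```php
<?php
// Hypothetical front-controller helper (not part of PHP or any framework):
// split the path info into route segments.
function routeSegments(?string $pathInfo): array
{
    if ($pathInfo === null) {
        return [];
    }
    // "/user/dashboard" -> ["user", "dashboard"]; empty segments are dropped.
    $parts = array_filter(explode('/', $pathInfo), 'strlen');
    return array_values($parts);
}

// For a request to /index.php/user/dashboard, PATH_INFO is "/user/dashboard".
// The ?? guard covers requests (and the CLI) where PATH_INFO isn't set.
$segments = routeSegments($_SERVER['PATH_INFO'] ?? null);
```

The router then only has to look at $segments, so the same code works whether or not URL rewriting is available.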

And for the longest time this was perfectly fine. Apache’s server model made it a non-issue: PHP was embedded in Apache (mod_php) and would only be invoked for actual files configured to be handled by it. These days, though, people aren’t just using Apache any more, yet they still assume most things work the way they do in Apache. Since I’m an Nginx person and this is primarily an Nginx and PHP blog, let’s look at how path info works with Nginx.

The issue is that where Apache sees files, Nginx sees URIs. At heart, Nginx is a reverse proxy: it does not embed scripting languages and it does not execute code. Instead, it looks at a URI and either tries to serve a static file or passes the request on to a backend. This means that when using PHP we see location blocks like the following:

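location ~ \.php$ {
	include fastcgi_params;
	fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
	# "upstream" stands in for your PHP-FPM address or upstream group
	fastcgi_pass upstream;
}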

This location doesn’t actually allow normal path info to work, because its regex requires the URI to end in .php. However, let’s look at what happens when we turn the path info trick around and use a request URI like this:

/uploads/avatar32.jpg/index.php

In this case PHP, with its default settings, sees that there is no index.php file in /uploads/avatar32.jpg/ and instead executes /uploads/avatar32.jpg with /index.php as the path info. We are essentially allowing PHP to execute any file under our Nginx root just by appending /index.php to the URI!
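The walk-back PHP performs can be illustrated with a rough simulation. This is not PHP’s actual C implementation, only a sketch under the assumption that PHP strips trailing path segments until it reaches an existing regular file; simulateFixPathinfo() is a hypothetical name.

```php
<?php
// Rough simulation of cgi.fix_pathinfo=1, for illustration only; this is NOT
// PHP's actual implementation. Starting from the SCRIPT_FILENAME the web
// server sent, strip trailing segments until an existing regular file is found.
function simulateFixPathinfo(string $scriptFilename): ?array
{
    $path = $scriptFilename;
    $pathInfo = '';
    while ($path !== '') {
        if (is_file($path)) {
            return ['script' => $path, 'path_info' => $pathInfo];
        }
        $pos = strrpos($path, '/');
        if ($pos === false) {
            break;
        }
        // Move the last segment from the script path into the path info.
        $pathInfo = substr($path, $pos) . $pathInfo;
        $path = substr($path, 0, $pos);
    }
    return null; // nothing to run: "No input file specified."
}
```

Fed /var/www/uploads/avatar32.jpg/index.php where only avatar32.jpg exists, this returns the image as the script and /index.php as the path info, which is exactly the dangerous case.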

What makes this scary is that there are countless ways to hide PHP code in file uploads. For instance, if you run forum software like vBulletin, you can embed PHP code inside an EXIF tag and upload it as an avatar without vBulletin ever batting a virtual eyelash. I trust I don’t need to tell you how bad it is to let attackers execute arbitrary PHP code on your server.

And the best part is that this isn’t even a security vulnerability in either Nginx or PHP. Nginx is doing exactly what a reverse proxy should do, and PHP is simply following the CGI specification. As such there won’t be a “fix” for this; it’s solely up to developers and server admins to educate themselves and understand the tools they’re actually using.

With all that dire info out of the way, the good news is that you can protect yourself very easily. The simplest way is to tell PHP not to translate the path info by setting the php.ini variable cgi.fix_pathinfo to 0. PHP will then try to execute /uploads/avatar32.jpg/index.php exactly as given; since that file doesn’t exist, it returns a 404 with “No input file specified.”
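In php.ini that is a single line. This assumes you edit the php.ini your FastCGI process manager actually loads (php --ini shows the CLI’s, which may be a different file), and that you restart or reload PHP-FPM afterwards so the change is picked up.

```ini
; Don't walk back to an existing file when SCRIPT_FILENAME doesn't exist.
cgi.fix_pathinfo = 0
```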

The best way, in my opinion, is to use Nginx’s fastcgi_split_path_info directive, so that Nginx handles the path info translation instead of PHP. Combining the two is also possible, though it doesn’t provide any more security than either one alone.
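A sketch of the Nginx side, loosely modelled on the commonly circulated PHP-FPM location block; upstream is the same placeholder as before, and the regexes are assumptions you may need to adapt. One caveat worth spelling out: with cgi.fix_pathinfo still at 1, fastcgi_split_path_info on its own doesn’t stop the attack, because a URI like /uploads/avatar32.jpg/index.php doesn’t split and its full path is still handed to PHP to walk back. The file-existence check is what closes that gap.

```nginx
location ~ [^/]\.php(/|$) {
    # "/index.php/user/dashboard" -> script "/index.php", path info "/user/dashboard"
    fastcgi_split_path_info ^(.+?\.php)(/.*)$;

    # Only talk to PHP if the script part really exists.
    # "/uploads/avatar32.jpg/index.php" doesn't split, so this returns 404.
    if (!-f $document_root$fastcgi_script_name) {
        return 404;
    }

    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO       $fastcgi_path_info;
    fastcgi_pass  upstream;   # placeholder for your PHP-FPM backend
}
```

The if block only contains a return, which is one of the few things that is safe to do inside if in a location.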

So why is this like register globals and short open tags? Because it’s a php.ini setting. You can turn the behaviour on and off. Your code has to consider the security implications in case it’s on, but it can’t take advantage of it in case it’s off. You get the worst of both worlds. In practice this is a dangerous feature that should be deprecated and turned off by default.

Lately I’ve been working with a friend on a daily-deal aggregator. Groupon-like sites are popping up everywhere, and the market for aggregators is still fairly open. My project, Alladeals, targets the Swedish daily-deal market, so it needs to support Swedish characters. In the future it might have to support other languages as well, so I decided that UTF-8 was the way to go. Since most web pages are encoded in UTF-8 these days, working with UTF-8 in PHP has been fairly painless. That is, until yesterday.

PHP does not natively support UTF-8. This is important to keep in mind when dealing with UTF-8 encoded data in PHP. Usually I’m pretty good at remembering that, but yesterday I ran into a bug that could easily have gone unnoticed for months if not for some good luck.

The bug manifested itself in the deal titles. The design isn’t well suited to really long titles, so we decided to make sure that titles never exceed 140 characters. To cut the title, the following code was used:

$title = substr($deal['title'], 0, 140);

Catch the error? Remember that PHP does not natively support UTF-8? This means that functions like substr don’t count characters, despite what the PHP manual says:

“the string returned will contain at most length characters beginning from start.”

Rather, substr actually counts bytes. That works fine for single-byte character encodings, but UTF-8 is a multi-byte encoding: some characters take more than one byte (the Swedish å, ä and ö take two each). If the 140th byte of a string falls in the middle of a multi-byte character, you cut that character in half, leaving an invalid byte sequence that browsers render as one of those lovely question-mark-on-a-black-background characters.
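A small demonstration of the difference, assuming the mbstring extension is loaded and the file itself is saved as UTF-8. The sample string is a Swedish word chosen because it contains three two-byte characters.

```php
<?php
// Byte-based substr() vs character-based mb_substr() on UTF-8 text.
// Assumes mbstring is loaded and this file is saved as UTF-8.
$title = 'Räksmörgås'; // 10 characters; ä, ö and å are 2 bytes each in UTF-8

$bytes = substr($title, 0, 2);             // "R" plus only the first byte of "ä"
$chars = mb_substr($title, 0, 2, 'UTF-8'); // "Rä"

var_dump(strlen($title));                     // int(13)
var_dump(mb_strlen($title, 'UTF-8'));         // int(10)
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false)
var_dump($chars);                             // string(3) "Rä"
```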

Luckily, PHP has the multibyte string extension (mbstring), which implements multi-byte safe versions of many standard string functions. Fixing our bug is as easy as changing the code to the following:

$title = mb_substr($deal['title'], 0, 140, 'UTF-8');

To be honest, this is a stupid bug. One really should keep the mb_ functions in mind, but it happens, and I was lucky it showed up early, before it could affect too many visitors.
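If you can’t guarantee that mbstring is installed on every server your code runs on, one option is a helper that falls back to PCRE’s UTF-8 mode. utf8_truncate() is a hypothetical name; this is a sketch, not a drop-in library function.

```php
<?php
// Hypothetical helper: cut a UTF-8 string to at most $max characters.
// Uses mbstring when it's loaded and falls back to PCRE's /u modifier,
// under which "." matches one whole UTF-8 character instead of one byte.
function utf8_truncate(string $s, int $max): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($s, 0, $max, 'UTF-8');
    }
    // /s makes "." match newlines too. PCRE caps {n,m} at 65535, and
    // preg_match() fails on invalid UTF-8, in which case '' is returned.
    if (preg_match('/^.{0,' . $max . '}/us', $s, $m) === 1) {
        return $m[0];
    }
    return '';
}

$title = utf8_truncate('Räksmörgås', 3); // "Räk"
```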

This is part two of my caching series. Part one covered the concept behind full-page caching as well as potential problems to keep in mind. This part focuses on implementing the concept in actual PHP code. By the end of it you’ll have a working implementation that can cache full pages and invalidate them intelligently when an update happens.

Requirements

I’ll provide a fully functional framework with the simple application I used to get my benchmark figures. You’ll need the following software to be able to run it.

  • Nginx. I’m not sure which minimum version is required, but I generally use and recommend the latest development version.
  • PHP 5.3.0 or later. I recommend at least 5.3.3, since that’s the first release with PHP-FPM bundled for FastCGI process management.
  • MySQL
  • Memcached

The Framework

You can download the framework here: Evil Genius Framework. I’ll be referencing code in the files instead of pasting it in this post to keep the size down, so you will probably want to download it.


Recently a post of mine was linked on Hacker News (Y Combinator), and for some reason a lot of the comments talked about the efficiency of WordPress. While it’s not really related to the subject of the linked post, I just want to point out that WordPress performance is pretty horrible regardless of whether you use Apache or Nginx.

My friend Karl Blessing and I recently talked about WordPress caching plugins. He uses WP Super Cache and I use W3 Total Cache, so he decided to run some WordPress caching benchmarks comparing the different methods. He’s done an awesome job and generated some pretty graphs for you to look at.

What I took away from the whole thing is that W3 Total Cache and WP Super Cache can offer similar performance if you’re willing to do static file caching. However, W3 Total Cache offers a cleaner solution, caching in Memcached, if you’re willing to sacrifice a bit of performance. The benefit of this, and the reason I use this method, is that you don’t need complicated rules in your Nginx (or Apache) configuration files.

“No input file specified” is one of the most frequently encountered issues when running PHP behind Nginx. People on Server Fault and in the #nginx IRC channel ask for help with it so often that this post is mostly to let me be lazy and not type up the same answer every time.

This is actually an error from PHP, and because display_errors is 0, people often just get a blank page with no output. In a typical setup PHP sends the error to stderr or stdout, and Nginx picks it up and writes it to the Nginx error log. As a result, people spend a ton of time trying to figure out why Nginx isn’t working.

The root cause of the error is that PHP cannot find the file Nginx is telling it to look for, and there are two common cases that cause this: either you’re not giving PHP the right path to the file, or your file permissions are incorrect.

Wrong Path Sent to PHP

At the time of writing, the most common cause is a user following a horrible tutorial found via Google instead of actually understanding Nginx. Reading my primer will equip you to solve this on your own, but since this post is dedicated to the error, I’ll cheat this once and let you be lazy by just giving you the full solution.

Nginx tells PHP which file to execute via the SCRIPT_FILENAME fastcgi_param. Most examples in the Nginx wiki define it as $document_root$fastcgi_script_name. The horrible tutorials often hardcode the path instead, which is undesirable because it duplicates information and invites future screw-ups. So you’ve gone with the $document_root$fastcgi_script_name option, and suddenly it no longer works.

This happens because of how Nginx configuration is nested. There are three levels, commonly referred to as blocks: http, server and location, each nested inside the previous one. Directives in Nginx inherit downwards but never up or across, so if you define something in one location block it will never be applied to any other location block under any circumstance.

Typically users define their index and root directives in location / because a tutorial told them to. When they then build SCRIPT_FILENAME from $document_root in the PHP location block, no root directive applies to that block, so $document_root falls back to Nginx’s compiled-in default (html under the install prefix). PHP is told to look for the script in that default directory rather than in your site root, and can’t find it.
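Here’s what that anti-pattern typically looks like. The paths are illustrative and upstream is a placeholder for your PHP backend.

```nginx
# Anti-pattern (illustrative): root only exists inside location /.
server {
    listen 80;

    location / {
        root  /var/www;
        index index.php;
    }

    location ~ \.php$ {
        include fastcgi_params;
        # No root applies here, so $document_root is Nginx's default
        # ("html" under the install prefix), not /var/www.
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass  upstream;
    }
}
```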

The simple solution is to define the root directive in your server block (or even your http block!). Generally, the higher up you can define a directive, the fewer duplicate directives you’ll need.
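The same configuration with root hoisted into the server block, as a sketch under the same placeholder assumptions. The try_files line in the PHP location is an optional extra safeguard, not part of the fix: it makes Nginx return a 404 itself instead of handing PHP a script that doesn’t exist.

```nginx
# Sketch: root and index defined once at server level, so every location,
# including the PHP one, inherits them. Paths and "upstream" are placeholders.
server {
    listen 80;
    root  /var/www;
    index index.php index.html;

    location / {
        try_files $uri $uri/ =404;
    }

    location ~ \.php$ {
        try_files $uri =404;   # optional: 404 in Nginx if the script is missing
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass  upstream;
    }
}
```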

Incorrect File Permissions

Most people don’t really believe me when I tell them their file permissions are incorrect. They’re looking at the damn permissions, and the PHP user can read the file just fine! Sadly, this shows a lack of understanding of Unix permissions. Being able to read a file is not enough; a user must also be able to traverse the path to it.

This means that not only must the file have read permission, but every directory along the path must have execute permission so that the PHP user can traverse it. For example:

Say you have an index.php file in /var/www. The PHP user needs read permission on /var/www/index.php and execute permission on both /var and /var/www (and on / itself, though that is virtually always already the case).
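If you want to check a path quickly, here’s a rough diagnostic sketch. missingOtherPermissions() is hypothetical, and it assumes the PHP user owns none of the paths and isn’t in their groups, so only the “other” permission bits matter; if that doesn’t hold for your setup, look at the owner or group bits instead. On Linux, namei -l /var/www/index.php gives a similar per-directory listing from the shell.

```php
<?php
// Hypothetical diagnostic (not part of PHP or Nginx). Assumption: the PHP
// user owns none of these paths and isn't in their groups, so only the
// "other" permission bits apply. $file must be an absolute path.
function missingOtherPermissions(string $file): array
{
    clearstatcache(); // make sure fileperms() sees the current modes

    // Every directory from "/" down to the file's parent must be traversable.
    $dirs = ['/'];
    $current = '';
    foreach (array_filter(explode('/', dirname($file)), 'strlen') as $part) {
        $current .= '/' . $part;
        $dirs[] = $current;
    }

    $problems = [];
    foreach ($dirs as $dir) {
        $perms = @fileperms($dir);
        if ($perms === false || ($perms & 0001) === 0) { // other-execute bit
            $problems[] = "$dir needs o+x";
        }
    }

    $perms = @fileperms($file);
    if ($perms === false || ($perms & 0004) === 0) { // other-read bit
        $problems[] = "$file needs o+r";
    }
    return $problems;
}

// Usage: print_r(missingOtherPermissions('/var/www/index.php'));
```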

If you’ve corrected both things and still have this issue, please leave a comment so I can look into it. As far as I know, there are no other causes of this error.

Edit: Part 2 is now available.

This is the first entry in a short series I’ll do on caching in PHP. During this series I’ll explore some of the options for caching in PHP and present a (I think) unique solution that I feel achieves high performance without sacrificing real-time data.

Caching in PHP is usually done on a per-object basis: people cache the result of a query or a CPU-intensive calculation so it doesn’t have to be redone on every request. This can get you a long way. I have an old site that uses this method and serves 105 requests per second on really old hardware.
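Per-object caching usually follows the cache-aside pattern: look the value up, and on a miss compute it and store it. A minimal sketch, where the Cache interface, ArrayCache and remember() are hypothetical stand-ins; in production you would implement Cache on top of Memcached or similar rather than a PHP array that only lives for one request.

```php
<?php
// Hypothetical cache abstraction; get() returns null on a miss.
interface Cache
{
    public function get(string $key);
    public function set(string $key, $value, int $ttl): void;
}

// In-process stand-in for a real cache such as Memcached.
final class ArrayCache implements Cache
{
    private array $items = []; // key => [value, expiry timestamp]

    public function get(string $key)
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item[1] < time()) {
            return null; // missing or expired
        }
        return $item[0];
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Cache-aside: return the cached value, or compute it once and store it.
function remember(Cache $cache, string $key, int $ttl, callable $compute)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $compute();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}
```

One limitation of this sketch: because get() signals a miss with null, a computed value of null is never served from the cache.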

An alternative, used for example by the WP Super Cache WordPress plugin, is to cache the full-page output. This essentially means that you generate a page only once. It introduces the problem of stale data, which people usually solve either by checking whether the data is still valid or by using a TTL-based cache and accepting some stale data.

The method I propose is a spin on full-page caching. I’m a big fan of Nginx and tend to use it to solve a lot of my problems, and this case is no exception. Nginx has a built-in Memcached module, which lets us store a page in Memcached and have Nginx serve it directly, never touching PHP at all. This essentially turns this:
