Category: Technology

Yesterday Nginx Inc announced that it had taken $3 million USD in funding. No one deserves this more than Igor Sysoev and it’s hard to believe that Nginx wasn’t commercialized sooner. Well deserved or not, though, whether this funding is good for Nginx is up for debate.

To understand every aspect of the deal I’ll first cover the worst-case scenario that people might fear. Later on I’ll cover why this scenario is unlikely, so please do finish reading before considering me a moron.

The FUD Aspect

Getting funded means a business person has seen potential and decided to invest money to get a return. There’s really no way to deny this; philanthropy simply does not happen in the start-up world unless you’re being funded by your rich but slightly senile aunt. Eventually this investor will want a return on the investment, and this means Nginx Inc will have to become profitable. How does an open source project become profitable, though?

  • Going closed source and commercializing the product.
  • Creating a closed source enterprise version to develop alongside the open source version.
  • Keeping the core product open and developing commercial extensions of that product.
  • Keeping the product open source and selling support, training and resources.

There might be a few more options that I haven’t thought of, but these are the most commonly seen ones. Based on the press release and statements made to the press we know that Nginx Inc plans to release a commercial version of Nginx for paying customers. To quote Andrew Alexeev:

“we think that it’s the most valuable approach for open source projects to be open core, in order to provide the commercial features that are really needed”

So that leaves us with an open core and most likely commercial modules for enterprise customers. Modules, perhaps such as high availability, proper load balancing or actual backend monitoring. Things normal people obviously do not need.

I’ll be the first to admit that the slippery slope argument is not a proper argument; it cannot be used as evidence of Nginx going in the wrong direction. Nevertheless, it is still a fun thought experiment. For Nginx Inc to be profitable it’s in their interest to get as many people as possible on their paid plans. As such, it is in their interest to limit the functionality of the free version to just enough that they can keep attracting new users.

They might promise not to upsell users; however, we all know how much a promise is worth when it comes to making money. If the commercial modules fail then a commercial version is introduced, then the free version is scrapped and eventually you’ve got a new Oracle on your hands. Business people are running Nginx Inc now and the death of Nginx as open source might be coming.

The Rational Aspect

The above is, of course, pure FUD. There’s no evidence that actually points to such a scenario happening and it is merely the worst-case scenario I could think up. So what do we know? What are the actual facts about this move?

  • Nginx Inc is getting new offices in San Francisco.
  • Nginx Inc will release a commercial arm based on the open source Nginx core. Whether a full version or just modules is not known.

We can infer another fact based on this, namely that Nginx Inc will hire new people. Before Nginx Inc formed as a company back in July, Nginx was largely a one-man project. If you followed the development it was Igor writing code with a few rare patches from third parties. Mostly, other developers were told to develop modules.

Today Nginx has 3 full-time developers working on the code instead of just Igor working after hours. This alone is a win for everyone who uses Nginx. I think it’s safe to say that the pace of development on the Nginx core should increase even if they only dedicate a single person to working on it.

Having a resourceful company behind Nginx is also a plus as it allows enterprise customers to be confident in using Nginx to power their infrastructure. They’ll be able to get support and know that the product isn’t a fly-by-night operation. More companies using Nginx means an increased need for people familiar with Nginx and that might increase the value of people with Nginx as a skill set.

The Rational Worst-Case Scenario

Let’s assume for a second that the FUD aspect holds true and Nginx becomes a closed source project, or even that the open source version is crippled to where it’s just a bare-bones httpd which even lighttpd outshines.

Nginx Inc actually has very little control over the entire infrastructure that is Nginx. In fact, the only two things controlled by Nginx Inc are the Nginx domain (and product) and the mailing list. For the longest time Nginx support has been handled by Igor on the mailing list and by the community everywhere else. The IRC channel, which these days has 300+ people idling, is controlled by community volunteers, the Nginx wiki is controlled by the same community volunteers and the Nginx forum is controlled by Jim Ohlstein, who has no connection to the Nginx company.

All this means that should the worst case scenario happen with Nginx Inc blinking and suddenly having dollar signs appear in their eyes, then the community can pull an OpenSSH and fork Nginx due to it using the BSD 2-clause license. If the community so desires the documentation and support structure can follow along.

Of course, it’s important to note that this scenario is far-fetched and that forking software is a last resort. I don’t see it happening.

My Personal Thoughts

I’m personally not too concerned at this point. Nginx has a long history of being open source and while it’s going open-core now I still feel confident that the core will not be neglected or crippled in favour of making money. On the other hand, I don’t know how much ownership Igor had to give up, nor do I know how strong of a leader/owner he’s going to be. At this point I’m positive about the funding: extra developers mean good things, and until I see signs otherwise I really have no reason to panic. Should my FUD scenario ever come true I’m also pretty confident we’ll see an Nginx fork with a lot of the support structure migrating over. This of course makes it in the best interest of Nginx Inc to continue working closely with the community which has supported Nginx for so long.

I would like to see a more open development approach, though. A road map of planned features and more details on what exactly they plan to offer in their commercial version would be very welcome and would allow people to know how to react.

Remember register globals? Remember how you had to code as if it was off, because it might be? Remember how you had to consider the security implications of it being on, because it might be? The “might be, might not be” problem has plagued a lot of early PHP features, and register globals is in no way alone in this. In the effort to make things versatile the PHP developers managed to introduce the worst of both worlds and the best of none, at least for code where you can’t guarantee 100% control over the environment it will be running in.

We see this even today with things such as short open tags. Ever been told you shouldn’t use them? Yeah, that’s primarily because they could be turned off, thus leaking your source code into the document. (To a lesser degree it’s also about XML incompatibility.)

Today I want to cover a well-known feature which many people don’t think of as being in the same group as register globals and short open tags, namely path info. The idea of path info is brilliant enough: it is most often used as a way to have SEO “friendly” URIs in cases where one might not be able to rewrite the URL. In short, you can have a URI like so:

/index.php/user/dashboard

In standard Unix this would read as “file dashboard in directory /index.php/user/”. Of course, /index.php/user/ is not a directory and there’s no file called dashboard. Instead, PHP sees this and translates it into /index.php with /user/dashboard as the path info. In case you’re shaking your head already, this is actually in the CGI spec so it’s not really a fault of PHP; there is literally an RFC (RFC 3875) specifying this behaviour.

And for the longest time this was perfectly fine. The web server model used with Apache made this a non-issue. PHP was embedded in Apache and as such would only be called for actual files configured as such. These days people aren’t just using Apache any more, yet they still think most things function like they do in Apache. Since I’m an Nginx person and this is primarily an Nginx and PHP blog, let’s look at how path info works with Nginx.

The issue is that where Apache sees files, Nginx sees URIs. Nginx is at heart a reverse proxy; it does not embed scripting languages and it does not execute code. Instead, it sees a URI and either tries to serve a static file or passes the request on to a backend. What this means is that when using PHP we see locations like the following:

location ~ \.php$ {
	fastcgi_pass upstream;
}

This location actually does not allow for normal path info to work, as the location defines the URI as having to end in .php. However, let’s look at what happens when we reverse the path info request URI like so:

/uploads/avatar32.jpg/index.php

In this case PHP will see that there is no index.php file in /uploads/avatar32.jpg/ and as such will instead execute /uploads/avatar32.jpg with /index.php as the path info. We are essentially allowing PHP to execute any arbitrary file in our defined nginx root by just appending /index.php to the URI!

What makes this scary is that there’s a ton of ways to hide PHP code in file uploads. For instance if you run forum software like VB you can embed PHP code inside an EXIF tag and upload it as an avatar without VB ever batting a virtual eyelash. I trust I don’t need to tell you how bad it is to allow attackers to execute arbitrary PHP code on your server.

And the best thing is that this is not even a security vulnerability in either Nginx or PHP. Nginx is doing exactly what a reverse proxy should be doing and PHP is simply following the CGI specification. As such there won’t be a “fix” for this; it’s solely up to the developers and server admins to educate themselves and understand the tools they’re actually using.

With all that dire info out of the way, the good news is that you can secure yourself very easily. The simplest way is to tell PHP not to translate the path info by setting the php.ini variable cgi.fix_pathinfo to 0. This means that PHP will instead try to execute the requested /uploads/avatar32.jpg/index.php file, which doesn’t exist, and thus return a 404 and “No input file specified.”

The best way, in my opinion, is to use the fastcgi_split_path_info directive in Nginx to handle the path info translation in Nginx. This means that Nginx will handle the path translation instead of PHP. Combining the two is also possible, though doesn’t provide any more security than just one of them.
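
As a rough sketch of what that can look like, here is one commonly used shape for such a location. It assumes the same upstream name as the location shown earlier and the stock fastcgi_params file shipped with Nginx. The file-existence check is an extra safeguard added for illustration, so treat the whole thing as a starting point rather than a drop-in config:

```nginx
# Match URIs that contain ".php" followed by "/" or the end of the URI.
location ~ [^/]\.php(/|$) {
	# First capture becomes $fastcgi_script_name, second becomes $fastcgi_path_info.
	fastcgi_split_path_info ^(.+?\.php)(/.*)$;

	# Assumption: refuse requests whose script part is not a real file on disk.
	if (!-f $document_root$fastcgi_script_name) {
		return 404;
	}

	include fastcgi_params;
	fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
	fastcgi_param PATH_INFO $fastcgi_path_info;
	fastcgi_pass upstream;
}
```

With a location like this, /index.php/user/dashboard is split into /index.php and /user/dashboard before PHP ever sees it, while /uploads/avatar32.jpg/index.php has no real script behind it and gets a 404 from Nginx.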

So why is this like register globals and short open tags? Because it’s a php.ini setting. You can turn the behaviour on and off. Your code has to consider the security implications in case it might be on, but it cannot take advantage of it in case it’s off, so you’re getting the worst of both worlds. In practice this is a dangerous feature that should be deprecated and set to off in PHP by default.

Lately I’ve been working with a friend on a daily-deal aggregator. The Groupon-like sites are popping up everywhere and the market for aggregators is still fairly unfilled. My project, Alladeals, targets the Swedish daily deals market and as such it needs to support Swedish characters. In the future it might have to support other languages as well, so I decided that UTF-8 was the way to go. Since most webpages are encoded in UTF-8 these days it has been fairly painless to actually work with UTF-8 in PHP, that is, until yesterday.

PHP does not natively support UTF-8. This is fairly important to keep in mind when dealing with UTF-8 encoded data in PHP. Usually I’m pretty good at remembering that; however, yesterday I happened upon a bug which could easily have gone unnoticed for months if not for some good luck.

The bug manifested itself in the deal titles. The design is not well suited for really long titles, so it was decided that it would be best to make sure that the titles did not exceed a length of 140 characters. To cut the title the following code was used:

$title = substr($deal['title'], 0, 140);

Catch the error? Remember that PHP does not natively support UTF-8? This means that functions like substr don’t count characters like the PHP manual says:

“the string returned will contain at most length characters beginning from start.”

Rather, it actually counts bytes. This works fine for single-byte character encodings, but UTF-8 is multi-byte, meaning that some characters can be more than 1 byte in length. This means that if the 140th byte of a string falls in the middle of a multi-byte character you effectively cut that character in half, resulting in one of those lovely question-mark-on-a-black-background characters.

Luckily PHP has the multi-byte extension which implements a lot of the standard functions in a multi-byte safe way. This means that fixing our bug is as easy as converting our code to the following:

$title = mb_substr($deal['title'], 0, 140, 'UTF-8');
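
To see the difference between the two calls, here is a small sketch. The title is made up for the demonstration (139 ASCII letters followed by an “å”, written as raw bytes so the file encoding doesn’t matter) and it assumes the mbstring extension is loaded:

```php
<?php
// Hypothetical title: 139 ASCII letters, then "å" (bytes C3 A5 in UTF-8), then more text.
$title = str_repeat('a', 139) . "\xC3\xA5" . ' and some more text';

// substr() counts bytes, so the cut lands between the two bytes of "å".
$byteCut = substr($title, 0, 140);

// mb_substr() counts characters, so "å" is kept whole.
$charCut = mb_substr($title, 0, 140, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)

// Prints: 141 bytes, 140 characters
echo strlen($charCut), ' bytes, ', mb_strlen($charCut, 'UTF-8'), " characters\n";
```

The byte-based cut leaves a dangling half character at the end of the string, which is exactly what shows up as a broken character in the browser.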

To be honest this is a stupid bug, one really should keep the mb_ functions in mind, but it happens and I was lucky it showed up early before it could affect too many visitors.

I have previously talked about some of the most common Nginx questions; not surprisingly, one such question is how to optimize Nginx. This is to be expected, since most new Nginx users are migrating over from Apache and thus are used to having to tweak settings and perform voodoo magic to ensure that their servers perform as well as possible.

Well, I’ve got some bad news for you: you can’t really optimize Nginx very much. There are no magic settings that will reduce your load by half or make PHP run twice as fast. Thankfully, the good news is that Nginx is already optimized out of the box. The biggest optimization happened when you decided to use Nginx and ran that apt-get install, yum install or make install. (Please note that repositories are often out of date. The wiki install page usually has a more up-to-date repository.)

That said, there are a lot of options in Nginx that affect its behaviour and not all of their default values are completely optimized for high traffic situations. We also need to consider the platform that Nginx runs on and optimize our OS, as there are limitations in place there as well.

To summarize, while we cannot optimize the load time of individual connections we can ensure that Nginx has the ideal environment for handling high traffic situations. Of course, by high traffic I mean several hundred requests per second, so the vast majority of people don’t need to mess around with this, but if you are curious or want to be prepared then read on.

First of all we need to consider the platform to use, as Nginx is available on Linux, MacOS, FreeBSD, Solaris, Windows as well as some more esoteric systems. They all implement high performance event based polling methods; sadly, Nginx only supports 4 of them. I tend to favour FreeBSD out of the four but you should not see huge differences, and it’s more important that you are comfortable with your OS of choice than that you get the absolute most optimized OS.

In case you hadn’t guessed it already, the odd one out is Windows. Nginx on Windows is really not an option for anything you’re going to put into production. Windows has a different way of handling event polling and the Nginx author has chosen not to support this; as such it falls back to using select(), which isn’t overly efficient, and your performance will suffer quite quickly as a result.

User Engagement

Usually when you browse the internet you get your information and you’re on your way. Sometimes a website isn’t properly constructed and prevents you from getting your information easily, which causes you to become annoyed. Rarely you find something that is so awesome you just have to admire it and the thoughts that have gone into it. Today I found one of those things.

There are a lot of difficult aspects of starting and running a site. You have to consider how to drive traffic, how to rank on search engines and how to convert traffic to paying customers or participating users. Today I saw an ingenious example of how to engage users and encourage participation.

The image on the right is taken from a news post on the NBC Dallas site. It shows the general reader feedback and allows readers to submit their own feedback. The unique aspect is of course how they group the feedback. It’s not a generic 1 to 10 rating, it’s not a “let your voice be heard” sentence, it’s a colourful, graphic and quantified display. I cannot stress enough how well this is executed. What they manage to do is immediately quantify the response to an article, which encourages users to add their voice.

This is an excellent opportunity to convert a drive-by reader into a repeat reader and perhaps even a community user. When you see the “I AM:” option it’s so easy to voice your opinion in a way that makes sense and feel like you’re contributing. If you’re just rating on a 1-10 scale you’re just giving an arbitrary number with no semantic meaning. Submitting a comment is obviously a way to make your voice heard, but one that requires effort and thus not something you’ll do often. This style gently leads users from one-time reader to engaged user.

A further benefit of the immediate quantification of the public opinion is that you can start doing stuff like they do at the top of their article. Basically they take their data and use it to make it more interesting. “Armed Agent Slips Past DFW Body Scanner” is interesting but “Locals are furious as armed agent slips past DFW body scanner” is emotionally charged and far more likely to generate a click.

News Suggestion

So the concept is awesome, the design is well done and right at eye level. But is there anything we can do to improve it? Well let’s take a look at what happens when we decide to give them our feedback.

We see that our choice is highlighted and we’re given the option of spreading our opinion to Twitter or Facebook. This is okay for getting our message out there, but studies have shown that most Twitter messages aren’t actually read by anyone other than the author. Most of the marketing you get out there is just screamed into a void of boundless other marketing.

The other option is to encourage users to leave a comment and explain their reaction to the news. This allows us to get an email address and give users the option to sign up for notification upon a reply. That way we create a return reader who will become familiar with the site and more likely to return in the future.

In general I feel like there’s too much of a focus on social media. This is a site which puts a lot of emphasis on being a local news site, they even have a section called Around Town. Contrast that to their lack of community stuff, there’s nothing beyond comments, I couldn’t even find an RSS feed.

I’m pretty sure they’d see higher user retention and ultimately more traffic if they encouraged users to subscribe to a news feed and to engage with other users instead of asking users to promote them on social media sites.

This is part two in my caching series. Part one covered the concept behind full page caching as well as potential problems to keep in mind. This part will focus on implementing the concept in actual PHP code. By the end of this you’ll have a working implementation that can cache full pages and invalidate them intelligently when an update happens.

Requirements

I’ll provide a fully functional framework with the simple application I used to get my benchmark figures. You’ll need the following software to be able to run it.

  • Nginx. I’m not sure which exact version but I generally use and recommend the latest development version.
  • PHP 5.3.0. I recommend at least 5.3.3 so you’ll have PHP-FPM for your fastcgi process management.
  • MySQL
  • Memcached

The Framework

You can download the framework here: Evil Genius Framework. I’ll be referencing code in the files instead of pasting it in this post to keep the size down, so you will probably want to download it.
