<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>Martin Fjordvald</title> <atom:link href="http://blog.martinfjordvald.com/feed/" rel="self" type="application/rss+xml" /><link>http://blog.martinfjordvald.com</link> <description>PHP, Nginx and Site Management</description> <lastBuildDate>Thu, 02 Feb 2012 08:38:54 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.2</generator> <item><title>Nginx Taking Funding: A Deal with the Devil?</title><link>http://blog.martinfjordvald.com/2011/10/nginx-taking-funding-a-deal-with-the-devil/</link> <comments>http://blog.martinfjordvald.com/2011/10/nginx-taking-funding-a-deal-with-the-devil/#comments</comments> <pubDate>Wed, 12 Oct 2011 17:57:38 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Nginx]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=284</guid> <description><![CDATA[Yesterday Nginx Inc announced that it had taken $3 million USD in funding. No one deserves this more than Igor Sysoev and it&#8217;s hard to believe that Nginx wasn&#8217;t commercialized sooner. Well deserved or not, though, whether this funding is good for Nginx or not is up for debate. To understand the whole aspect of [...]]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F10%2Fnginx-taking-funding-a-deal-with-the-devil%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F10%2Fnginx-taking-funding-a-deal-with-the-devil%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>Yesterday Nginx Inc announced that it had <a
title="Nginx takes $3M USD in funding" href="http://nginx.com/nginx-venture-funding.html" target="_blank">taken $3 million USD in funding.</a> No one deserves this more than Igor Sysoev and it&#8217;s hard to believe that Nginx wasn&#8217;t commercialized sooner. Well deserved or not, though, whether this funding is good for Nginx is up for debate.</p><p>To understand every aspect of the deal I&#8217;ll first cover the worst-case scenario that people might fear. I&#8217;ll later cover why this scenario is unlikely, so please do finish reading before considering me a moron.</p><p><strong>The FUD Aspect</strong></p><p>Getting funded means a business person has seen potential and decided to invest money to get a return. There&#8217;s really no way to deny this; philanthropy simply does not happen in the start-up world unless you&#8217;re being funded by your rich but slightly senile aunt. Eventually this business person will want a return on the investment, and this means Nginx Inc will have to become profitable. How does an open source project become profitable, though?</p><ul><li>Going closed source and commercializing the product.</li><li>Creating a closed source enterprise version to develop alongside the open source version.</li><li>Keeping the core product open and developing commercial extensions of that product.</li><li>Keeping the product open source and selling support, training and resources.</li></ul><p>There might be a few more options that I haven&#8217;t thought of, but these are the most commonly seen ones. Based on the press release and <a
title="Server Watch: Nginx goes open core" href="http://www.serverwatch.com/server-news/nginx-goes-open-core.html" target="_blank">statements made to the press</a> we know that Nginx Inc plans to release a commercial version of Nginx for paying customers. To quote Andrew Alexeev:</p><blockquote><p><em>&#8220;we think that it&#8217;s the most valuable approach for open source projects to be open core, in order to provide the commercial features that are really needed&#8221;</em></p></blockquote><p>So that leaves us with an open core and most likely commercial modules for enterprise customers. Modules such as, perhaps, high availability, proper load balancing or actual backend monitoring. Things normal people obviously do not need.</p><p>I&#8217;ll be the first to admit that the slippery slope argument is not a proper argument; it cannot be used as evidence of Nginx going in the wrong direction. Nevertheless, it is still a fun thought experiment. For Nginx Inc to be profitable it&#8217;s in their interest to get as many people as possible on their paid plans; as such, it is in their interest to keep the functionality in the free version limited to just enough that they can keep attracting new users.</p><p>They might promise not to upsell users; however, we all know how much a promise is worth when it comes to making money. If the commercial modules fail, then a commercial version is introduced, then the free version is scrapped, and eventually you&#8217;ve got a new Oracle on your hands. Business people are running Nginx Inc now and the death of Nginx as open source might be coming.</p><p><strong>The Rational Aspect</strong></p><p>The above is, of course, pure FUD. There&#8217;s no evidence that actually points to such a scenario happening and it is merely the worst-case scenario I could think up. So what do we know? 
What are the <strong>actual facts</strong> about this move?</p><ul><li>Nginx Inc is getting new offices in San Francisco.</li><li>Nginx Inc <strong>will</strong> release a commercial offering based on the open source Nginx core. Whether a full version or just modules is not known.</li></ul><p>We can infer another fact based on this &#8211; namely that Nginx Inc will hire new people. Before Nginx Inc formed as a company back in July it was largely a one-man project. If you followed the development it was Igor writing code with a few rare patches from third parties. Mostly other developers were told to develop modules.</p><p>Today Nginx has 3 full-time developers working on the code instead of just Igor working after hours; this alone is a win for everyone who uses Nginx. I think it&#8217;s safe to say that development on the Nginx core should increase even if they only dedicate a single person to working on it.</p><p>Having a well-resourced company behind Nginx is also a plus as it allows enterprise customers to be confident in using Nginx to power their infrastructure. They&#8217;ll be able to get support and know that the product isn&#8217;t a fly-by-night operation. More companies using Nginx means an increased need for people familiar with Nginx and that might increase the value of people with Nginx as a skill set.</p><p><strong>The Rational Worst-Case Scenario</strong></p><p>Let&#8217;s assume for a second that the FUD aspect holds true and Nginx becomes a closed source project, or even that the open source version is crippled to where it&#8217;s just a bare bones httpd which even lighttpd outshines.</p><p>Nginx Inc actually has very little control over the entire infrastructure that is Nginx. In fact, the only two things controlled by Nginx Inc are the Nginx domain (and product) and the mailing list. For the longest time Nginx support has been handled by Igor on the mailing list and the community everywhere else. 
The IRC channel, which these days has 300+ people idling, is controlled by community volunteers, the <a
title="The Nginx Wiki" href="http://wiki.nginx.org/Main" target="_blank">Nginx wiki</a> is controlled by the same community volunteers and the Nginx forum is controlled by Jim Ohlstein who has no connection to the Nginx company.</p><p>All this means that should the worst-case scenario happen with Nginx Inc blinking and suddenly having dollar signs appear in their eyes, then the community can pull an OpenSSH and fork Nginx, since it uses the BSD 2-clause license. If the community so desires, the documentation and support structure can follow along.</p><p>Of course, it&#8217;s important to note that this scenario is far-fetched and that forking software is a last resort. I don&#8217;t see it happening.</p><p><strong>My Personal Thoughts</strong></p><p>I&#8217;m personally not too concerned at this point. Nginx has a long history of being open source and while it&#8217;s going open-core now I still feel confident that the core will not be neglected or crippled in favour of making money. On the other hand, I don&#8217;t know how much ownership Igor had to give up, nor do I know how strong a leader/owner he&#8217;s going to be. At this point I&#8217;m positive about the funding; extra developers mean good things, and until I see signs otherwise I really have no reason to panic. Should my FUD scenario ever come true I&#8217;m also pretty confident we&#8217;ll see an Nginx fork with a lot of the support structure migrating over. This of course makes it in the best interest of Nginx Inc to continue working closely with the community which has supported Nginx for so long.</p><p>I <strong>would</strong> like to see a more open development approach, though. A road map of planned features and more details on what exactly they plan to offer in their commercial version would be very welcome and would allow people to know how to react.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=284" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=284&amp;md5=a7cd5c7fd19bcc16e248d84e7adc1ad5" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/10/nginx-taking-funding-a-deal-with-the-devil/feed/</wfw:commentRss> <slash:comments>15</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F10%2Fnginx-taking-funding-a-deal-with-the-devil%2F&amp;language=en_GB&amp;category=text&amp;title=Nginx+Taking+Funding%3A+A+Deal+with+the+Devil%3F&amp;description=Yesterday+Nginx+Inc+announced+that+it+had+taken+%243+million+USD+in+funding.+No+one+deserves+this+more+than+Igor+Sysoev+and+it%26%238217%3Bs+hard+to+believe+that+Nginx+wasn%26%238217%3Bt+commercialized...&amp;tags=blog" type="text/html" /> </item> <item><title>Why Path Info is the Worst PHP Feature Since Register Globals</title><link>http://blog.martinfjordvald.com/2011/06/why-path-info-is-the-worst-php-feature-since-register-globals/</link> <comments>http://blog.martinfjordvald.com/2011/06/why-path-info-is-the-worst-php-feature-since-register-globals/#comments</comments> <pubDate>Tue, 14 Jun 2011 21:43:33 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=249</guid> <description><![CDATA[Remember register globals? Remember how you had to code as if it was off, because it might be? Remember how you had to consider the security implications of it being on, because it might be? The might be and might not be is something which has plagued a lot of early PHP features. Register globals [...]]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F06%2Fwhy-path-info-is-the-worst-php-feature-since-register-globals%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F06%2Fwhy-path-info-is-the-worst-php-feature-since-register-globals%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>Remember register globals? Remember how you had to code as if it was off, because it might be? Remember how you had to consider the security implications of it being on, because it might be? The might be and might not be is something which has plagued a lot of early PHP features. Register globals is in no way alone in this; in the effort of making things versatile the PHP developers managed to introduce the worst of both worlds and the best of none. At least for code where you can&#8217;t guarantee 100% control over the environment your code would be running in.</p><p>We see this even today with things such as short open tags. Ever been told you shouldn&#8217;t use them? Yeah, that&#8217;s primarily because they could be turned off, thus leaking your source code into the document. (To a lesser degree it&#8217;s also about XML incompatibility.)</p><p>Today I want to cover a well-known feature which many people don&#8217;t often think of as being in the same group as register globals and short open tags. Namely path info. The idea of path info is brilliant enough; it is most often used as a way to have SEO &#8220;friendly&#8221; URIs in cases where one might not be able to rewrite the URL. In short, you can have a URI like so:</p><pre>/index.php/user/dashboard</pre><p>In standard Unix this would read as &#8220;file dashboard in directory /index.php/user/&#8221;. Of course, /index.php/user/ is not a directory and there&#8217;s no file called dashboard. Instead, PHP sees this and translates it into /index.php with /user/dashboard as the path info. In case you&#8217;re shaking your head already, this is actually in the CGI spec so it&#8217;s not really a fault of PHP; there is literally an RFC (RFC 3875) specifying this behaviour.</p><p>And for the longest time this was perfectly fine. The web server model used with Apache made this a non-issue. PHP was embedded in Apache and as such would only be called for actual files configured as such. 
These days people aren&#8217;t just using Apache any more, yet they still think most things function like they do in Apache. Since I&#8217;m an Nginx person and this is primarily an Nginx and PHP blog, let&#8217;s look at how path info works with Nginx.</p><p>The issue is that where Apache sees files Nginx sees URIs. Nginx is, at heart, a reverse proxy; it does not embed scripting languages and it does not execute code. Instead, it sees a URI and either tries to serve a static file or passes the request on to a backend. What this means is that when using PHP we see locations like the following:</p><pre>location ~ \.php$ {
	fastcgi_pass upstream;
}</pre><p>This location actually does not allow for normal path info to work as the location defines the URI as having to end in .php. However, let&#8217;s look at what happens when we reverse the path info request URI like so:</p><pre>/uploads/avatar32.jpg/index.php</pre><p>In this case PHP will see that there is no index.php file in /uploads/avatar32.jpg/ and as such will instead execute /uploads/avatar32.jpg with /index.php as the path info. We are essentially allowing PHP to execute any arbitrary file in our defined Nginx root by just appending /index.php to the URI!</p><p>What makes this scary is that there&#8217;s a ton of ways to hide PHP code in file uploads. For instance if you run forum software like vBulletin (VB) you can embed PHP code inside an EXIF tag and upload it as an avatar without VB ever batting a virtual eyelash. I trust I don&#8217;t need to tell you how bad it is to allow attackers to execute arbitrary PHP code on your server.</p><p>And the best thing is that this is not even a security vulnerability in either Nginx or PHP; Nginx is doing exactly what a reverse proxy should be doing and PHP is simply following the CGI specification. As such there won&#8217;t be a &#8220;fix&#8221; for this; it&#8217;s solely up to the developers and server admins to educate themselves and understand the tools they&#8217;re actually using.</p><p>With all that dire info out of the way, the good news is that you can secure yourself very easily. The simplest way is to tell PHP not to translate the path info by setting the php.ini variable cgi.fix_pathinfo to 0. This means that PHP will instead try to execute /uploads/avatar32.jpg/index.php, which doesn&#8217;t exist, and thus return a 404 and &#8220;<a
title="No Input File Specified" href="http://blog.martinfjordvald.com/2011/01/no-input-file-specified-with-php-and-nginx/">no input file specified</a>&#8221;.</p><p>The best way, in my opinion, is to use <a
title="fastcgi split path info" href="http://wiki.nginx.org/HttpFcgiModule#fastcgi_split_path_info">the fastcgi_split_path_info directive</a> to handle the path info translation in Nginx instead of in PHP. Combining the two is also possible, though it doesn&#8217;t provide any more security than just one of them.</p><p>So why is this like register globals and short open tags? Because it&#8217;s a php.ini setting. You can turn the behaviour on and off. Your code has to consider the security implications in case it might be on, but it cannot take advantage of it in case it&#8217;s off; you&#8217;re getting the worst of both worlds. In practice this is a dangerous feature that should be deprecated and set to off in PHP by default.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=249" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=249&amp;md5=a9fd2a1ca8ec4931b4fd47941bd87770" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/06/why-path-info-is-the-worst-php-feature-since-register-globals/feed/</wfw:commentRss> <slash:comments>6</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F06%2Fwhy-path-info-is-the-worst-php-feature-since-register-globals%2F&amp;language=en_GB&amp;category=text&amp;title=Why+Path+Info+is+the+Worst+PHP+Feature+Since+Register+Globals&amp;description=Remember+register+globals%3F+Remember+how+you+had+to+code+as+if+it+was+off%2C+because+it+might+be%3F+Remember+how+you+had+to+consider+the+security+implications+of+it+being...&amp;tags=blog" type="text/html" /> </item> <item><title>The Fun that is UTF-8 Support in PHP</title><link>http://blog.martinfjordvald.com/2011/05/the-fun-that-is-utf-8-support-in-php/</link> <comments>http://blog.martinfjordvald.com/2011/05/the-fun-that-is-utf-8-support-in-php/#comments</comments> <pubDate>Fri, 20 May 2011 15:00:14 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=235</guid> <description><![CDATA[Lately I've been working with a friend on a daily-deal aggregator. The Groupon-like sites are popping up everywhere and the market for aggregators is still fairly unfilled. My project, Alladeals, targets the Swedish daily deals market and as such it needs to support Swedish characters. In future it might have to support other languages as well so I decided that UTF-8 was the way to go. Since most webpages are encoded in UTF-8 these days it has been fairly painless to actually work with UTF-8 in PHP, that is, until yesterday.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F05%2Fthe-fun-that-is-utf-8-support-in-php%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F05%2Fthe-fun-that-is-utf-8-support-in-php%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>Lately I&#8217;ve been working with a friend on a daily-deal aggregator. The Groupon-like sites are popping up everywhere and the market for aggregators is still fairly unfilled. My project, Alladeals, targets the<a
title="AllaDeals - Daily Deals Aggregator" href="http://www.alladeals.com/" target="_blank"> Swedish daily deals market</a> and as such it needs to support Swedish characters. In future it might have to support other languages as well so I decided that UTF-8 was the way to go. Since most webpages are encoded in UTF-8 these days it has been fairly painless to actually work with UTF-8 in PHP, that is, until yesterday.</p><p>PHP does not natively support UTF-8. This is fairly important to keep in mind when dealing with UTF-8 encoded data in PHP. Usually I&#8217;m pretty good at remembering that; however, yesterday I happened upon a bug which could easily have gone unnoticed for months if not for some good luck.</p><p>The bug manifested itself in the deal titles. The design is not well suited for really long titles, so it was decided that the titles should not exceed a length of 140 characters. To cut the title, the following code was used:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$title</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$deal</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'title'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">140</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div><p>Catch the error? Remember that PHP does not natively support UTF-8? This means that functions like substr don&#8217;t count characters as the <a
title="PHP Manual for substr" href="http://php.net/substr" target="_blank">PHP manual says</a>:</p><blockquote><p>&#8220;the string returned will contain at most <em><tt>length</tt></em> characters beginning from <em><tt>start</tt></em>.&#8221;</p></blockquote><p>Rather, it actually counts bytes. This works fine for single byte character encodings, but UTF-8 is multi-byte, meaning that some characters can be more than 1 byte in length. This means that if the 140th byte of a string falls in the middle of a multi-byte character, you effectively cut that character in half, resulting in one of those lovely question-mark-on-a-black-background characters.</p><p>Luckily PHP has the multi-byte string (mbstring) extension, which implements a lot of the standard functions in a multi-byte safe way. This means that fixing our bug is as easy as converting our code to the following:</p><div
class="wp_syntax"><div
class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000088;">$title</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mb_substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$deal</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'title'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">140</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'UTF-8'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div><p>To be honest this is a stupid bug, one really should keep the mb_ functions in mind, but it happens and I was lucky it showed up early before it could affect too many visitors.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=235" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=235&amp;md5=714c695c84282783255370cb438a1f6b" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/05/the-fun-that-is-utf-8-support-in-php/feed/</wfw:commentRss> <slash:comments>8</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F05%2Fthe-fun-that-is-utf-8-support-in-php%2F&amp;language=en_GB&amp;category=text&amp;title=The+Fun+that+is+UTF-8+Support+in+PHP&amp;description=Lately+I%26%238217%3Bve+been+working+with+a+friend+on+a+daily-deal+aggregator.+The+Groupon-like+sites+are+popping+up+everywhere+and+the+market+for+aggregators+is+still+fairly+unfilled.+My+project%2C+Alladeals%2C...&amp;tags=blog" type="text/html" /> </item> <item><title>Optimizing Nginx for High Traffic Loads</title><link>http://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/</link> <comments>http://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/#comments</comments> <pubDate>Wed, 27 Apr 2011 22:45:13 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Nginx]]></category> <category><![CDATA[Performance]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=211</guid> <description><![CDATA[I have previously talked about some of the most common Nginx questions; not surprisingly, one such question is how to optimize Nginx. This is to be expected, since most new Nginx users are migrating over from Apache and thus are used to having to tweak settings and perform voodoo magic to ensure that their servers perform as well as possible.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F04%2Foptimizing-nginx-for-high-traffic-loads%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F04%2Foptimizing-nginx-for-high-traffic-loads%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>I have previously talked about some of the most common Nginx questions; not surprisingly, one such question is how to optimize Nginx. This is to be expected, since most new Nginx users are migrating over from Apache and thus are used to having to tweak settings and perform voodoo magic to ensure that their servers perform as well as possible.</p><p>Well, I&#8217;ve got some bad news for you: you can&#8217;t <em>really</em> optimize Nginx very much. There are no magic settings that will reduce your load by half or make PHP run twice as fast. Thankfully, the good news is that Nginx is already optimized out of the box. The biggest optimization happened when you decided to use Nginx and ran that apt-get install, yum install or make install. (Please note that repositories are often out of date. The <a
title="Up-to-date Nginx repositories" href="http://wiki.nginx.org/Install" target="_blank">wiki install page</a> usually has a more up-to-date repository.)</p><p>That said, there are a lot of options in Nginx that affect its behaviour, and not all of their default values are completely optimized for high traffic situations. We also need to consider the platform that Nginx runs on and optimize our OS as there are limitations in place there as well.</p><p>To summarize, while we cannot optimize the load time of individual connections we can ensure that Nginx has the ideal environment for handling high traffic situations. Of course, by high traffic I mean several hundred requests per second, so the vast majority of people don&#8217;t need to mess around with this, but if you are curious or want to be prepared then read on.</p><p>First of all we need to consider the platform to use as Nginx is available on Linux, MacOS, FreeBSD, Solaris, Windows as well as some more esoteric systems. They all implement high performance event based polling methods; sadly, Nginx only supports 4 of them. I tend to favour FreeBSD out of the four but you should not see huge differences and it&#8217;s more important that you are comfortable with your OS of choice than that you get the absolute most optimized OS.</p><p>In case you hadn&#8217;t guessed already, the odd one out is Windows. Nginx on Windows is really not an option for anything you&#8217;re going to put into production. Windows has a different way of handling event polling and the Nginx author has chosen not to support this; as such it defaults back to using select() which isn&#8217;t overly efficient and your performance will suffer quite quickly as a result.</p><p><span
id="more-211"></span>The second biggest limitation that most people run into is also related to your OS. Open up a shell, su to the user Nginx runs as and then run the command `ulimit -a`. Those values are all limitations Nginx cannot exceed. In many default systems the open files value is rather limited; on a system I just checked it was set to 1024. If Nginx runs into a situation where it hits this limit it will log the error (24: Too many open files) and return an error to the client. Naturally Nginx can handle a lot more than 1024 files and chances are your OS can as well. You can safely increase this value.</p><p>To do this you can either set the limit with ulimit or you can use <a
href="http://wiki.nginx.org/CoreModule#worker_rlimit_nofile" target="_blank">worker_rlimit_nofile</a> to define your desired open file descriptor limit.</p><p><strong>Nginx Limitations</strong></p><p>With the OS taken care of it&#8217;s time to dive into Nginx itself and have a look at some of the directives and methods we can use to tweak things.</p><p><strong>Worker Processes<br
/> </strong>The <a
href="http://wiki.nginx.org/CoreModule#worker_processes" target="_blank">worker process</a> is the backbone of Nginx. Once the master has bound to the required IP/ports it will spawn workers as the specified user and they&#8217;ll then handle all the work. Workers are not multi-threaded so they do not spread the per-connection work across CPU cores. Thus it makes sense for us to run multiple workers, usually 1 worker per CPU core. That said, anything above 4 workers is serious overkill as Nginx will hit other bottlenecks before the CPU becomes an issue.</p><p><strong>Worker Connections<br
/> </strong><a
href="http://wiki.nginx.org/EventsModule#worker_connections" target="_blank">Worker connections</a> are sort of a weird concept. I&#8217;m not entirely sure what the purpose of this directive is, but it effectively limits how many connections each worker can maintain at a time. If I were forced to guess I&#8217;d say it was insurance against keep-alive being incorrectly configured and running connections up into areas where you might start running out of ports to use.</p><p>In the default configuration file it is set to 1024. If we consider that browsers normally open up 2 connections for pipelining site assets then that leaves us with a maximum of 512 users handled simultaneously. This sounds like quite a few clients, but if we then consider that the default keep-alive time-out is 65 seconds (in the default configuration file provided with source; the default if no value is specified is 75 according to the wiki) then that means we can actually only handle about 8 new users per second (512 / 65 is roughly 7.9). Obviously this is a lot more than most people need, especially considering we&#8217;ll usually be running 2-4 workers. But for high traffic sites with keep-alive enabled this is worth keeping in mind.</p><p>Additionally we also have to consider reverse proxying, which will open up an additional connection to your backend. However, since Nginx does not support persistent connections to backends this is not too much of an issue unless you have long running backend processes.</p><p>All things considered about worker connections it should be fairly clear that if you grow in traffic you&#8217;ll eventually want to increase the number of connections each worker can handle. 2048 should do for most people but honestly, if you have this kind of traffic you should not have any doubt how high you need this number to be.</p><p><strong>CPU Affinity</strong><br
/> Setting CPU affinity basically means you tell each worker which CPU core to use; they&#8217;ll then use only that CPU core. I&#8217;m not going to cover this too much except to say that you should be <strong>really</strong> careful doing this. Chances are your OS CPU scheduler is far, far better at handling load balancing than you are. If you think you have issues with CPU load balancing then optimize this at the scheduler level, potentially find an alternative scheduler, but unless you know what you&#8217;re doing, don&#8217;t touch this.</p><p><strong>Keep Alive<br
/> </strong><a
href="http://wiki.nginx.org/HttpCoreModule#keepalive_timeout" target="_blank">Keep alive</a> is an HTTP feature which allows user agents to keep the connection to your server open for a number of requests or until the specified timeout is reached. This won&#8217;t actually change the performance of our Nginx server very much as it handles idle connections very well. The author of Nginx claims that 10,000 idle connections will use only 2.5 MB of memory, and from what I&#8217;ve seen this seems to be correct.</p><p>The reason I cover this in a performance guide is pretty simple. Keep alive has a huge effect on the perceived load time for the end user. This is the most important measurement you can ever optimize; if your website seems to load fast to users then they&#8217;re happy. Studies done by Amazon and other large online retailers show that there is a direct correlation between perceived load time and sales completed.</p><p>It should be somewhat obvious why keep alive connections have such a huge impact: you avoid creating a new connection for every request, which is not insignificant. You probably don&#8217;t need a keep alive timeout value of 65, but 10-20 is definitely recommended, and as previously stated, Nginx can easily handle this.</p><p><strong>tcp_nodelay and tcp_nopush<br
/> </strong>These two directives are probably some of the most difficult to understand as they affect Nginx on a very low networking level. The very short and superficial explanation is that these directives determine how the OS handles the network buffers and when to flush them to the end user. I can only recommend that if you do not know about these already then you shouldn&#8217;t mess with them. They won&#8217;t significantly improve or change anything so best to just leave them at their default values.</p><p><strong>Hardware Limitations</strong></p><p>Since we&#8217;ve now dealt with all the possible limitations imposed by Nginx it&#8217;s time to figure out how to push the most out of our server. To do this we need to look to the hardware level as this is the most likely place to find our bottleneck.</p><p>With servers we have primarily 3 potential bottleneck areas: the CPU, the memory and the IO layers. Nginx is very efficient with its CPU usage so I can tell you straight up that this is not going to be your bottleneck, ever. Likewise, it&#8217;s also very efficient with its memory usage so this is very unlikely to be our bottleneck as well. This leaves IO as the primary culprit of our server bottleneck.</p><p>If you&#8217;re used to dealing with servers then you&#8217;ve probably experienced this before. Hard drives are <em>really, really slow</em>. Reading from the hard drive is probably one of the most expensive operations you can do in a server and therefore the natural conclusion is that to avoid an IO bottleneck we need to reduce the amount of hard drive reading and writing Nginx does.</p><p>To do this we can modify the behaviour of Nginx to minimize disk writes as well as make sure the memory constraints imposed on Nginx allow it to avoid disk access.</p><p><strong>Access Logs<br
/> </strong>By default Nginx will write every request to a file on disk for logging purposes. You can use this for statistics, security checks and such, however it comes at the cost of IO usage. If you don&#8217;t use access logs for anything you can simply turn them off and avoid the disk writes. However, if you do require access logs then consider saving the logs to memory instead, for example by buffering log writes or logging to a memory-backed file system. This will be much faster than writing to the disk and will reduce IO usage significantly.</p><p>If you only use access logs for statistics then consider whether you can use something like Google Analytics instead, or whether you can log only a subset of requests instead of all of them.</p><p><strong>Error Logs</strong><br
/> I sort of debated internally whether I should even cover this directive as you really don&#8217;t want to disable error logging, especially considering how low volume the error log actually is. That said, there is one gotcha with this directive, namely the error log level parameter you can supply. If the level is set too low this will log 404 errors and possibly even debug info. Setting this to warn level in production environments should be more than sufficient and keep the IO low.</p><p><strong>Open File Cache</strong></p><p>A part of reading from the file system consists of opening and closing files. Considering that this is a blocking operation, it is a not insignificant part. Thus, it makes good sense for us to cache the open file descriptors and this is where the <a
title="Nginx Open File Cache" href="http://wiki.nginx.org/HttpCoreModule#open_file_cache" target="_blank">open file cache</a> comes in. The linked wiki has a pretty decent explanation of how to enable and configure it so I suggest you go read that.</p><p><strong>Buffers</strong><br
/> One of the most important things you need to tweak is the buffer sizes you allow Nginx to use. If the buffer sizes are set too low Nginx will have to store the responses from upstreams in a temporary file which causes both write and read IO, the more traffic you get the more of a problem this becomes.</p><p><a
title="Nginx Client Body buffer Size" href="http://wiki.nginx.org/HttpCoreModule#client_body_buffer_size" target="_blank"><strong>client_body_buffer_size</strong></a> is the directive which handles the client <strong>request</strong> buffer size, meaning the incoming request body. This is used to handle POST data, meaning form submissions, file uploads etc. You&#8217;ll want to make sure that the buffer is large enough if you handle a lot of large POST data submissions.</p><p><a
href="http://wiki.nginx.org/HttpFcgiModule#fastcgi_buffers" target="_blank"><strong>fastcgi_buffers</strong></a> and <a
href="http://wiki.nginx.org/HttpProxyModule#proxy_buffers" target="_blank"><strong>proxy_buffers</strong></a> are the directives which deal with the response from your upstream, meaning PHP, Apache or whatever you use. The concept is exactly the same as above: if the buffers aren&#8217;t large enough the data will be saved to disk before being served to the user. Notice that there is an upper limit for what Nginx will buffer, even on disk, before it transfers it synchronously to the client. This limit is governed by <strong>fastcgi_max_temp_file_size</strong> as well as <strong>proxy_max_temp_file_size</strong>. In addition you can also turn it off entirely for proxy connections with <strong>proxy_buffering</strong> set to off. (Usually not a good idea!)</p><p><strong>Removing Disk IO Entirely</strong></p><p>The best way to remove disk IO is of course to not use the disks at all. If you have only a small amount of data chances are you can just fit it all in memory and thus remove the limitation of disk IO entirely. By default your OS will also cache the frequently accessed disk sectors so the more memory you have the less IO you will do. What this means is that you can buy your way out of this limitation by just adding more memory. The more data you have the more memory you&#8217;ll need, of course.</p><p><strong>Network IO</strong></p><p>For the sake of fun we will assume that you&#8217;ve managed to get enough memory that you can fit your entire data set in there. This means you can theoretically do around 3-6gbps of read IO. Chances are, though, that you do not have that fast of a network pipe. Sadly, there is a limit to how much we can actually optimize the amount of network IO as we need to transfer the data somehow. The only real way is to either minimize the starting data amount or compress it.</p><p>Thankfully Nginx has a <a
title="Nginx Gzip Module" href="http://wiki.nginx.org/HttpGzipModule" target="_blank">gzip module</a> which allows us to compress the data before it&#8217;s sent to the client; this can drastically reduce the size of the data. Generally the gzip_comp_level where you start not getting any further improvements is around 4-5. There&#8217;s no point in increasing it further as you will just waste CPU cycles.</p><p>You can also minimize the data by using various JavaScript and CSS minimizers. This is not really Nginx related so I will trust that you can find enough information on this using Google.</p><p><strong>Phew</strong></p><p>And with that we&#8217;ve reached the end of this subject. If you still require additional optimization then it&#8217;s time to consider using extra servers to scale your service instead of wasting time micro optimizing Nginx further, but that&#8217;s a topic for another time as I&#8217;ve been running on for quite a while now. In case you&#8217;re curious the end word count was just above 2400, so best to take a small break before you go explore my blog further!<strong><br
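/> </strong></p><p>As a rough recap, the suggestions in this post could be collected into a configuration along these lines. Treat it as a sketch rather than a drop-in file: the log paths and every number that is not mentioned in the text above (the open file cache settings and the buffer sizes) are assumptions picked purely for illustration, and each directive should be checked against the wiki for your Nginx version.</p><pre># Sketch only: values marked "assumed" do not come from the post.
worker_processes  4;                  # roughly one per CPU core, little gain above 4

events {
	worker_connections  2048;         # per worker; the sample file ships with 1024
}

http {
	keepalive_timeout  15;            # 10-20 seconds instead of the sample file's 65

	access_log  off;                  # only if you never read the access log
	# Alternative if you need the log: buffer writes in memory (path and size assumed).
	# access_log  /var/log/nginx/access.log combined buffer=32k;

	error_log   /var/log/nginx/error.log warn;   # path assumed; warn keeps the volume low

	open_file_cache           max=1000 inactive=20s;   # assumed
	open_file_cache_valid     30s;                     # assumed
	open_file_cache_min_uses  2;                       # assumed
	open_file_cache_errors    on;                      # assumed

	client_body_buffer_size   128k;   # assumed; size it for your POST bodies
	fastcgi_buffers           8 16k;  # assumed; size it for your upstream responses
	proxy_buffers             8 16k;  # assumed

	gzip             on;
	gzip_comp_level  4;               # gains flatten out around 4-5
}</pre><p>The buffer and cache values in particular depend on your response sizes and available memory, so read them as placeholders to tune rather than recommendations.</p><p><strong><br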
/> </strong></p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=211" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=211&amp;md5=fd718c274fca4f0268e61ab062124557" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/04/optimizing-nginx-for-high-traffic-loads/feed/</wfw:commentRss> <slash:comments>28</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F04%2Foptimizing-nginx-for-high-traffic-loads%2F&amp;language=en_GB&amp;category=text&amp;title=Optimizing+Nginx+for+High+Traffic+Loads&amp;description=I+have+previously+talked+about+some+of+the+most+common+Nginx+questions%3B+not+surprisingly%2C+one+such+question+is+how+to+optimize+Nginx.+This+is+not+really+overly+surprising+since+most...&amp;tags=blog" type="text/html" /> </item> <item><title>Ingenious Community Engagement</title><link>http://blog.martinfjordvald.com/2011/02/ingenious-community-engagement/</link> <comments>http://blog.martinfjordvald.com/2011/02/ingenious-community-engagement/#comments</comments> <pubDate>Tue, 22 Feb 2011 03:26:45 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Site Management]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=194</guid> <description><![CDATA[In general I feel like there's too much of a focus on social media. This is a site which puts a lot of emphasis on being a local news site, they even have a section called Around Town. Contrast that to their lack of community stuff, there's nothing beyond comments, I couldn't even find an RSS feed.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fingenious-community-engagement%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fingenious-community-engagement%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p><a
href="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/76416b79135dcc40ce25679c64cc6a66.png"><img
class="size-medium wp-image-197 alignright" title="User Engagement" src="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/76416b79135dcc40ce25679c64cc6a66-78x300.png" alt="User Engagement" width="78" height="300" /></a>Usually when you browse the internet you get your information and you&#8217;re on your way. Sometimes a website isn&#8217;t properly constructed and prevents you from getting your information easily, which causes you to become annoyed. Rarely you find something that is so awesome you just have to admire it and the thoughts that have gone into it. Today I found one of those things.</p><p>There are a lot of difficult aspects of starting and running a site. You have to consider how to drive traffic, how to rank on search engines, how to convert traffic to paying customers or participating users, and today I saw an ingenious example of how to engage users and encourage participation.</p><p>The image on the right is taken from a news post on the <a
title="NBC Dallas" href="http://www.nbcdfw.com/news/local-beat/TSA-Agent-Slips-Through-DFW-Body-Scanner-With-a-Gun-116497568.html" target="_blank">NBC Dallas site</a>; it shows the general reader feedback and allows readers to submit their own feedback. The unique aspect is of course how they group the feedback. It&#8217;s not a generic 1 to 10 rating, it&#8217;s not a &#8220;let your voice be heard&#8221; sentence, it&#8217;s a colourful, graphic and <strong>quantified</strong> display. I cannot stress enough how well this is executed. What they manage to do is immediately quantify the response to an article, which encourages users to add their voice.</p><p>This is an excellent opportunity to convert a drive-by reader into a repeat reader and perhaps even a community user. When you see the &#8220;I AM:&#8221; option it&#8217;s so easy to voice your opinion in a way that makes sense and feel like you&#8217;re contributing. If you&#8217;re just rating on a 1-10 scale you&#8217;re just giving an arbitrary number with no semantic meaning. Submitting a comment is obviously a way to make your voice heard, but one that requires effort and thus not something you&#8217;ll do often. This style gently leads users from one-time reader to engaged user.</p><p>A further benefit of the immediate quantification of the public opinion is that you can start doing stuff like they do at the top of their article. Basically they take their data and use it to make the headline more interesting. &#8220;Armed Agent Slips Past DFW Body Scanner&#8221; is interesting but &#8220;Locals are furious as armed agent slips past DFW body scanner&#8221; is emotionally charged and far more likely to generate a click.</p><p><a
href="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/fac887f3ed3191045903b8d15aff4232.png"><img
class="aligncenter size-full wp-image-198" title="fac887f3ed3191045903b8d15aff4232" src="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/fac887f3ed3191045903b8d15aff4232.png" alt="News Suggestion" width="509" height="97" /></a></p><p>So the concept is awesome, the design is well done and right at eye level. But is there anything we can do to improve it? Well let&#8217;s take a look at what happens when we decide to give them our feedback.</p><p><a
href="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/c6726ef994873134d722d46375dcd9fa.png"><img
class="alignright size-medium wp-image-200" title="c6726ef994873134d722d46375dcd9fa" src="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/c6726ef994873134d722d46375dcd9fa-93x300.png" alt="" width="93" height="300" /></a>We see that our choice is highlighted and we&#8217;re given the option of spreading our opinion to Twitter or Facebook. This is okay for getting our message out there, but studies have shown that most Twitter messages aren&#8217;t actually read by anyone other than the author. Most of the marketing you get out there is just screamed into a void of boundless other marketing.</p><p>The other option is to encourage users to leave a comment and explain their reaction to the news. This allows us to get an email address and give users the option to sign up for notification upon a reply. That way we create a return reader who will become familiar with the site and more likely to return in the future.</p><p>In general I feel like there&#8217;s too much of a focus on social media. This is a site which puts a lot of emphasis on being a local news site, they even have a section called Around Town. Contrast that to their lack of community stuff, there&#8217;s nothing beyond comments, I couldn&#8217;t even find an RSS feed.</p><p>I&#8217;m pretty sure they&#8217;d see a higher user retention and ultimately a higher traffic amount if they encouraged users to subscribe to a news feed and encouraged them to engage with other users instead of asking users to promote them on social media sites.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=194" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=194&amp;md5=9dab158f3251150aaa7bfab32d5fea49" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/02/ingenious-community-engagement/feed/</wfw:commentRss> <slash:comments>3</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fingenious-community-engagement%2F&amp;language=en_GB&amp;category=text&amp;title=Ingenious+Community+Engagement&amp;description=Usually+when+you+browse+the+internet+you+get+your+information+and+you%26%238217%3Bre+on+your+way.+Sometimes+a+website+isn%26%238217%3Bt+properly+constructed+and+prevents+you+from+getting+your+information+easily%2C+this...&amp;tags=blog" type="text/html" /> </item> <item><title>Implementing Full-Page caching with Nginx and PHP</title><link>http://blog.martinfjordvald.com/2011/02/implementing-full-page-caching-with-nginx-and-php/</link> <comments>http://blog.martinfjordvald.com/2011/02/implementing-full-page-caching-with-nginx-and-php/#comments</comments> <pubDate>Fri, 18 Feb 2011 19:50:00 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Memcached]]></category> <category><![CDATA[Nginx]]></category> <category><![CDATA[Performance]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=116</guid> <description><![CDATA[This is part two in my caching series. Part one covered the concept behind the full page caching as well as potential problems to keep in mind. This part will focus on implementing the concept in actual PHP code. By the end of this you'll have a working implementation that can cache full pages and invalidate them intelligently when an update happens.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fimplementing-full-page-caching-with-nginx-and-php%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fimplementing-full-page-caching-with-nginx-and-php%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>This is part two in my caching series. <a
title="12,000 requests per second with Nginx, PHP and Memcached" href="http://blog.martinfjordvald.com/2010/09/12000-requests-per-second-with-nginx-php-and-memcached/" target="_blank">Part one</a> covered the concept behind the full page caching as well as potential problems to keep in mind. This part will focus on implementing the concept in actual PHP code. By the end of this you&#8217;ll have a working implementation that can cache full pages and invalidate them intelligently when an update happens.</p><p><strong>Requirements</strong></p><p>I&#8217;ll provide a fully functional framework with the simple application I used to get my benchmark figures. You&#8217;ll need the following software to be able to run it.</p><ul><li>Nginx. I&#8217;m not sure which exact version but I generally use and recommend the latest development version.</li><li>PHP 5.3.0. I recommend at least 5.3.3 so you&#8217;ll have PHP-FPM for your fastcgi process management.</li><li>MySQL</li><li>Memcached</li></ul><p><strong>The Framework</strong></p><p>You can download the framework here: <a
href="http://blog.martinfjordvald.com/wp-content/uploads/2011/02/evil_genius_framework.zip">Evil Genius Framework</a>. I&#8217;ll be referencing code in the files instead of pasting it in this post to keep the size down, so you will probably want to download it.</p><p><span
id="more-116"></span>The framework uses a 3 tiered setup like most of the popular frameworks. It consists of controllers, libraries and views.</p><ul><li>A controller is what handles the flow of the request. It parses the input provided and decides on what action to take. Only one controller will ever be loaded during a request.</li><li>The libraries handle the brunt of the work; they&#8217;re usually the ones to access the database and generate the actual data for the controller to handle. Several libraries might be used during a request.</li><li>Views are the template logic, they&#8217;re non-parsing and use PHP for their logic.</li></ul><p>The index.php file handles the routing, there are a few settings there but nothing really interesting for this blog post. The only thing you need to know if you want to mess around with the sample application is that there is a direct URI to file routing. There is no manual routing available.</p><p><strong>The Caching Logic</strong></p><p>Just so we&#8217;re on the same page, the goal here is to define a way of invalidating cached pages that use stale data. Since the cached pages are served directly we have to invalidate cached pages when the data is being changed. So before we begin the implementation of this we&#8217;ll need a few concepts to help us keep things straight.</p><ul><li>Cache keys. This is how pages will be identified in the cache. Since the framework uses a direct URI to controller mapping it makes sense to use the URI as the cache key, so if I refer to the URI or the cached page I mean the key under which it&#8217;s cached.</li><li>DataKeys. These are essentially identifiers for data. The goal is to prevent stale data so we obviously need a way to identify and reference the data we&#8217;ll be working with.</li></ul><p>With the cache keys and dataKeys concepts defined we can now begin to implement the invalidation logic. For this we need to track the data and establish a relation between data and cache keys. 
As we established in part one there might be multiple controllers using the same data so we need to map what data every controller uses. Furthermore we need each controller to report which cache keys it generates so that we can invalidate them.</p><p>This is where cachetracker.php comes in, you can find it in the core directory. All caching logic is handled by this file. If you look at the top of it you&#8217;ll see an interface called ControllerCacheable. Every controller which handles cached data needs to implement this interface.</p><p>ControllerCacheable defines two methods, <code>dataKeyReads()</code> and <code>dataKeyInvalidates()</code>. The former handles mapping data to controllers and the latter handles mapping data to cache keys.</p><p><code>dataKeyReads()</code> should return an array of the dataKeys a controller will read from. This allows us to easily iterate every controller and generate a dependency mapping of data -&gt; controller.</p><p><code>dataKeyInvalidates()</code> accepts the dataKey to invalidate and an optional payload (an example follows later). When given a dataKey this method should return an array of cache keys that use this dataKey. These cache keys will then be invalidated.</p><p>The CacheTracker generates the dependency mapping in the <code>getDataKeysAccessors()</code> method. It will iterate through the controllers directory and call the <code>dataKeyReads()</code> method of every controller that implements the ControllerCacheable interface. After covering all the cacheable controllers the mapping list will be stored to a file &#8216;deplist.txt&#8217; in the root directory relative to the index.php. Please note that if you change the dataKeys a controller uses you&#8217;ll have to delete this file so that it&#8217;ll be regenerated.</p><p>The second method of interest in the CacheTracker is <code>triggerDataKeyInvalidation()</code>. This is the method that one should call whenever a change to data has been made. 
This method checks the dependency mapping list and calls <code>dataKeyInvalidates()</code> in the controllers which use the dataKeys. At this point we&#8217;ve essentially managed to get the cache keys used by every controller which uses the piece of data we&#8217;ve just updated. Time to see how this translates into a real world example.</p><p><strong>The Sample Application</strong></p><p>The application I&#8217;ve included in the download is quite simple as it&#8217;s intended to showcase the concept only, it&#8217;s not a valid measurement of how fast a real world application would be. With that out of the way, have a look at the news.php controller. It&#8217;s got everything a news script really requires, news and comments! The actual news and comments implementation is not overly interesting so scroll to the bottom of the file and check out the methods defined by our ControllerCacheable interface.</p><p><code>dataKeyReads()</code> defines an array with elements news and comments. These are the dataKeys that this entire controller deals with.</p><p><code>dataKeyInvalidates()</code> converts a dataKey into the cache keys pages are stored under. The code pretty much speaks for itself but I do want to point out the use of $payload as this is a good example of how the payload information can be used to pinpoint the exact cache keys to invalidate. Without it we would have had to invalidate all the news posts.</p><p>Next in the sample application is the news library. It&#8217;s located in the cachetest folder under libraries. The interesting part here is the call to CacheTracker::triggerDataKeyInvalidation() whenever the library changes the data.</p><p>If you want to try out the sample application you need to configure a few things first. Inside the includes directory there is a config.php file. The various configuration options should speak for themselves. 
There is also an .sql file in the root which contains the table definitions and some sample data.</p><p><strong>The Nginx Configuration</strong></p><p>The final part of the puzzle is to make sure Nginx serves the cached pages instead of sending them to PHP. The configuration is as follows:</p><pre>upstream memcached {

	server     127.0.0.1:11211;
	keepalive 1024 single; # Needs the third-party Upstream Keepalive module; remove this line if it is not compiled in.
}
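
# Sketch, not part of the original configuration: the same Memcached
# upstream without the third-party keepalive module, for builds that do not
# have it compiled in. The name "memcached_plain" is hypothetical; to use it,
# point memcached_pass at this name instead of "memcached". Each cache lookup
# then opens a fresh connection to Memcached.
upstream memcached_plain {
	server     127.0.0.1:11211;
}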

upstream backend {
	server     127.0.0.1:9000;
}

server {
	listen          80;
	server_name     live.framework.com;

	access_log      /var/log/nginx/framework.access.log;
	error_log       /var/log/nginx/framework.errors.log notice;

	root            /home/framework;

	try_files $uri @missing;

	location @missing {
		rewrite ^(.*[^/])$ $1/ permanent; # Add a trailing slash if none exist.
		rewrite ^ /index.php last;
	}

	# Forbid the system dir, but allow media files.
	location ~* ^/system/.+\.(jpg|png|gif|css|js|swf|flv|ico)$ {
		expires max;
		tcp_nodelay off;
		tcp_nopush on;
	}

	location ~ /system {
		rewrite ^ /index.php last;
	}

	# Check cache and use PHP as fallback.
	location ~* \.php$ {
		if (!-f $request_filename) {
			return 404;
		}

		default_type text/html;
		charset      utf-8;

		if ($request_method = GET) {
			set $memcached_key fw53$request_uri; # Namespace prefix plus the URI; PHP has to store pages under the same key.

			memcached_pass     memcached;
			error_page         404 = @nocache; # Cache miss: fall back to PHP.
		}

		if ($request_method != GET) {
			fastcgi_pass backend;
		}

	}

	location @nocache {
		fastcgi_pass backend;
	}
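
	# Sketch, not part of the original configuration. The post does not show
	# where the FastCGI parameters are set, so this assumes they are not
	# inherited from the http block. In that case the fallback location would
	# need them too, roughly as below. It is left commented out because a
	# second "location @nocache" would be a duplicate.
	#
	# location @nocache {
	#	include       fastcgi_params;
	#	fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
	#	fastcgi_pass  backend;
	# }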
}</pre><p>The caching part is towards the end. We set the memcached key, which is the namespace plus the URI, and if we get a 404 not found from Memcached we instead pass the request to the fastcgi backend. It&#8217;s really that simple on the Nginx end. The only thing to note is that I&#8217;m using the <a
title="Upstream Keepalive Module" href="http://wiki.nginx.org/HttpUpstreamKeepaliveModule" target="_blank">Upstream Keepalive 3rd party module</a> for Memcached keepalives, this removes some of the overhead from connecting to Memcached. If you do not have this module compiled in you can simply remove the keepalive line from the upstream block.</p><p><strong>Limitations</strong></p><p>I covered this aspect in part 1 but I feel it&#8217;s something that&#8217;s worth pointing out again. The method I&#8217;ve used in this framework allows for easy mapping of data-&gt;controller-&gt;cache-key relations, but only in cases where the cache key is predictable based on the data. In my daily usage I find that a large majority of my applications have a predictable relationship between data and the cached pages, however, there are common situations where it&#8217;s simply impossible to avoid stale data.</p><p>The most obvious example of this is if you have built-in search. Say someone searches for Platypus on your blog about animals, since the Platypus is an awesome animal you have a lot of posts about it. The URI (and thus our cache key) for this search page is /search/platypus/. Now if you add or edit an article about the Platypus your search results will be outdated. You can obviously invalidate /search/platypus/, but what about the URI /search/duck/ or /search/australia/ which also return your Platypus articles? Suddenly mapping the relations becomes downright impossible.</p><p>I haven&#8217;t really been able to think of a way to actually solve this problem other than simply accepting stale data and doing TTL caching, or accepting the performance hit and not caching at all. Thankfully with the use of tools like Sphinx or another dedicated search daemon the performance and scalability should be easy enough to handle.</p><p>An obvious limitation is, of course, also truly dynamic data. 
If your page contains the string &#8220;Welcome $username!&#8221; then you might not want to cache that page unless everyone goes by the same username, naturally. There are ways to handle caching of dynamic data as well, though. Edge Side Includes is one such way and I plan to play around with that myself and possibly write a part 3 in this series. Until then I&#8217;d love to hear how useful you think this method of cache invalidation is.</p><p>On a side note, what do you guys actually think of the framework as a whole? It&#8217;s written to be really lightweight and provide options for streamlining the development process but otherwise stay out of your way. It does create some limitations such as no manual routing, but do people actually ever need this instead of just straight file to URI mapping?</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=116" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=116&amp;md5=8a99f62376b43350cb78137e64a99f41" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/02/implementing-full-page-caching-with-nginx-and-php/feed/</wfw:commentRss> <slash:comments>5</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fimplementing-full-page-caching-with-nginx-and-php%2F&amp;language=en_GB&amp;category=text&amp;title=Implementing+Full-Page+caching+with+Nginx+and+PHP&amp;description=This+is+part+two+in+my+caching+series.+Part+one+covered+the+concept+behind+the+full+page+caching+as+well+as+potential+problems+to+keep+in+mind.+This+part+will...&amp;tags=blog" type="text/html" /> </item> <item><title>WordPress Performance Benchmarks</title><link>http://blog.martinfjordvald.com/2011/02/wordpress-performance-benchmarks/</link> <comments>http://blog.martinfjordvald.com/2011/02/wordpress-performance-benchmarks/#comments</comments> <pubDate>Fri, 18 Feb 2011 18:48:38 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Performance]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=176</guid> <description><![CDATA[My friend Karl Blessing and I recently talked about Wordpress caching plugins. He uses WP SuperCache and I use W3 Total Cache and he subsequently decided to do some Wordpress caching benchmarks on the different methods. He's done an awesome job and generated some pretty graphs for you to look at.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fwordpress-performance-benchmarks%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fwordpress-performance-benchmarks%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>Recently a post of mine was linked on yCombinator and for some reason a lot of the comments talked about the efficiency of WordPress. While it&#8217;s technically not related to the subject of the linked post I just want to point out that the performance of WordPress is pretty horrible regardless of whether you use Apache or Nginx.</p><p>My friend <a
title="Karl Blessings Blog" href="http://kbeezie.com/" target="_blank">Karl Blessing</a> and I recently talked about WordPress caching plugins. He uses WP SuperCache and I use W3 Total Cache and he subsequently decided to do some <a
title="Wordpress Caching Benchmarks" href="http://kbeezie.com/view/caching-wordpress/" target="_blank">WordPress caching benchmarks</a> on the different methods. He&#8217;s done an awesome job and generated some pretty graphs for you to look at.</p><p>What I took away from the whole thing is that W3 Total Cache and WP SuperCache can offer similar performance if you&#8217;re willing to do static file caching. However, W3 Total Cache can offer a cleaner solution with caching in Memcached if you&#8217;re willing to sacrifice a bit of performance. The benefit to this, and why I use this method, is that you don&#8217;t need complicated rules in your Nginx (or Apache) configuration files.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=176" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=176&amp;md5=6cba94eb604b2d537cc23b7e82fa11e7" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/02/wordpress-performance-benchmarks/feed/</wfw:commentRss> <slash:comments>2</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fwordpress-performance-benchmarks%2F&amp;language=en_GB&amp;category=text&amp;title=WordPress+Performance+Benchmarks&amp;description=Recently+a+post+of+mine+was+linked+on+yCombinator+and+for+some+reason+a+lot+of+the+comments+talked+about+the+efficiency+of+WordPress.+While+it%26%238217%3Bs+technically+not+related+to...&amp;tags=blog" type="text/html" /> </item> <item><title>Nginx Primer 2: From Apache to Nginx</title><link>http://blog.martinfjordvald.com/2011/02/nginx-primer-2-from-apache-to-nginx/</link> <comments>http://blog.martinfjordvald.com/2011/02/nginx-primer-2-from-apache-to-nginx/#comments</comments> <pubDate>Fri, 04 Feb 2011 00:32:02 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Nginx]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=125</guid> <description><![CDATA[The Big Picture So you&#8217;ve finally decided to make the switch from Apache to Nginx. You most likely did this for performance reasons; perhaps all those blogs have been writing about how fast Nginx is or perhaps your webmaster friends have been raving about how they can now handle a lot more traffic without spending [...]]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fnginx-primer-2-from-apache-to-nginx%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fnginx-primer-2-from-apache-to-nginx%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p><strong>The Big Picture</strong></p><p>So you&#8217;ve finally decided to make the switch from Apache to Nginx. You most likely did this for performance reasons; perhaps all those blogs have been writing about how fast Nginx is or perhaps your webmaster friends have been raving about how they can now handle a lot more traffic without spending money on hardware.</p><p>This is usually all true, but why exactly is Nginx so much faster than the typical Apache setup of the prefork MPM and mod_php? The technical explanation is that Nginx uses a non-blocking, event-based architecture while Apache uses a blocking, process-based architecture. To <strong>simplify it heavily</strong>, the theory is like this:</p><p><strong>Apache Prefork Processes:</strong></p><ul><li>Receive PHP request, send it to a process.</li><li>Process receives the request and passes it to PHP.</li><li>Receive an image request, see process is busy.</li><li>Process finishes PHP request, returns output.</li><li>Process gets the image request and returns the image.</li></ul><p>While the process is handling the request it is not capable of serving another request. This means the number of requests you can serve simultaneously is directly proportional to the number of processes you have running. Now, if a process took up just a small bit of memory that would not be too big of an issue as you could run a lot of processes. However, a typical Apache + PHP setup has the PHP binary embedded directly into the Apache processes. This means Apache can talk to PHP incredibly fast and without much overhead, but it also means that the Apache process is going to be 25-50MB in size. Not just for PHP requests, but also for all static file requests. This is because the processes keep PHP embedded at all times due to the cost of spawning new processes. 
This effectively means you will be limited by the amount of memory you have, as you can only run a small number of processes and a lot of image requests can quickly make you hit your maximum number of processes.</p><p>Compare this to the Nginx event-based method.</p><p><span
id="more-125"></span><strong>Nginx Event Based Processing:</strong></p><ul><li>Receive request, trigger events in a process.</li><li>The process handles all the events and returns the output.</li></ul><p>On the surface it seems fairly similar, except there&#8217;s no blocking. This is because the process handles events asynchronously, interleaving work on many connections instead of waiting on any single one. One connection is not allowed to affect another connection even if they are handled simultaneously. This adds some limitations to how you can program the web server, but it makes for far faster processing as one process can now handle tons of simultaneous requests.</p><p>Remember those limitations, though? Yeah, they&#8217;ll affect how we, as users, have to use Nginx. For instance, we can no longer embed PHP into our Nginx process as PHP is not asynchronous. Essentially, if we embedded PHP into Nginx the event-based architecture of Nginx would be rendered void as PHP would be blocking and connections would just pile up.</p><p>Since a web server isn&#8217;t all that desirable without the ability to use dynamic scripting languages, Nginx has support for various communication protocols such as FastCGI, SCGI and uwsgi. Conveniently enough, PHP happens to support FastCGI, which means we can still use PHP.</p><p><strong>FastCGI Versus Embedding</strong></p><p>The thing that confuses most first-time users of a web server other than Apache is that suddenly they have to actually handle the PHP part themselves. Nginx takes a complete hands-off approach to all dynamic scripting languages and reverse proxy situations (more on this later).</p><p>With Apache you just configure it to use PHP, start it and forget about it. With Nginx you have to spawn a number of PHP processes and tell Nginx to talk to them. This has both advantages and disadvantages.</p><p>The primary advantage is that you now have complete separation of your web server and PHP. If you change something in PHP you don&#8217;t have to restart Nginx, only PHP. 
Similarly, if you want to restart Nginx then PHP keeps running. Since Nginx supports upgrading the binary and changing the configuration on the fly, this means you can upgrade your web server or change its configuration without any downtime at all.</p><p>The primary disadvantage is that you have to handle the spawning and process control of PHP yourself, or rather Nginx won&#8217;t do it for you. A year ago this was a bit troublesome. You had to use a tool called spawn-fcgi to spawn the FastCGI processes and then monitor them with something secondary like Monit. Alternatively you could use a patch to the PHP source code called PHP-FPM &#8211; quite literally FastCGI Process Manager. As of PHP 5.3.3 PHP-FPM is now part of the PHP core, which means you don&#8217;t have to patch the source code yourself, but can just set a compile-time configure flag. Naturally, most distribution repositories have a PHP build with PHP-FPM enabled as well. So really, this disadvantage is barely a disadvantage any more as PHP-FPM is a very cool thing in itself. More on that another day.</p><p>A second disadvantage, which is heavily disputed, is that since the PHP process is no longer embedded there is an overhead in talking to PHP. Some people use this as an argument for why you should use Nginx to handle static files and Apache to handle the PHP files. Theoretically there is an overhead in talking with Apache as well, so I personally doubt there&#8217;s much of a disadvantage, but I don&#8217;t have the tests to back that up.</p><p><strong>The Configuration Differences</strong></p><p>Okay, so we now know what the big picture is. Time to understand how Apache and Nginx differ at the configuration level. This is where we&#8217;ll actually spend most of our time, so this is what is important to know about. I have to stress that it&#8217;s important to actually learn about Nginx. 
Nginx has undergone a huge amount of development in the last few years and as a result there is a lot of different advice out on various blogs. To be quite frank, a lot of it is bollocks and should be removed as it is often downright wrong. In general, if a tutorial or guide is making heavy use of ifs then avoid it. There are far better alternatives such as try_files, which I&#8217;ll cover a bit in this section.</p><p><strong>.htaccess</strong></p><p>The biggest headache for a lot of people is that they no longer have access to .htaccess. There is no way to change your Nginx configuration without issuing a reload command to Nginx. The effect can be simulated by using include directives in the main Nginx configuration file, but any change will not take effect until the configuration file has been reloaded. This is also the primary reason why Nginx is not very well suited for shared hosting, not even in a situation where you reverse proxy to Apache.</p><p>There is no alternative in Nginx, so if you want to use Nginx this is one thing you have to live with. In the beginning it might be annoying, but after a few hours you really won&#8217;t notice or miss it.</p><p><strong>Apache &amp; Nginx Rewrites<br
/> </strong></p><p>Typically in Apache you will specify your rewrites in .htaccess or at least in a global context so that they are always evaluated. Nginx provides locations to prevent this exact thing. We don&#8217;t want to execute something that we don&#8217;t need to, so if we need to rewrite the URI example.org/forum/index.php?topic=3 into forum.example.org/index.php?topic=3 we should use a location to ensure we only execute this rewrite when absolutely required.</p><p>Another thing about Nginx rewrites is that by default they are internal rewrites, which means that they won&#8217;t change the URI the browser sees. They will only do that if you specify the &#8220;redirect&#8221; or &#8220;permanent&#8221; rewrite flag or if you rewrite to an absolute URL including the http:// part.</p><p><strong>Apache RewriteCond</strong></p><p>Rewrite Conditions is something which doesn&#8217;t really exist in Nginx. Something you might see in Apache is the following:</p><pre>RewriteCond   %{HTTP_HOST}   ^example.org$   [NC]
RewriteRule   ^(.*)$   http://www.example.org/$1   [R=301,L]</pre><p>The directly translated Nginx equivalent would be:</p><pre>if ($host = 'example.org' ) {
    rewrite  ^/(.*)$  http://www.example.org/$1 permanent;
}</pre><p>I took this from an actual tutorial out there. This is wrong for two reasons. Most importantly Nginx does a lot of optimization to vhosts. If you use an if like this you miss out on that optimization as a request for example.org and www.example.org will both have to go to the vhost, then parse the if and then the regex. Less important is that you&#8217;re capturing data Nginx has already captured for you.</p><p>The Nginx way to do this is the following:</p><pre>server {
    server_name example.org;
    rewrite ^ http://www.example.org$request_uri? permanent;
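    # A hedged aside, not from the original post: on reasonably recent Nginx
    # versions the same permanent redirect can also be written without any
    # regex at all, for example:
    #   return 301 http://www.example.org$request_uri;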
}</pre><p>This way a request for example.org never reaches the www.example.org server block, we don&#8217;t have to evaluate a real regex since the pattern is just a start-of-string anchor and nothing else, and the rewrite still uses the URI Nginx has already captured.</p><p>Another popular RewriteCond is the following:</p><pre>RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ /index.php [L]</pre><p>Basically, if the requested file doesn&#8217;t exist and isn&#8217;t a directory then execute a rewrite; this is often used to implement pretty URLs. In Nginx you&#8217;d do this like the following:</p><pre>location / {
    try_files $uri $uri/ /index.php$is_args$args;
}</pre><p>What this means is that it will first check if $uri exists (in relation to your root directive), then it will check if $uri/ exists, which would be a directory. Finally, if neither of these exists it will do an internal rewrite to index.php.</p><p>A quick thing to note is that while ifs are usually discouraged there are some cases where they cannot be avoided. For example, if you want to check that the request method is GET before you send to the backend then you are forced to use if ($request_method = GET). This will usually work just fine and isn&#8217;t too bad considering there isn&#8217;t a more optimal way to do it.</p><p><strong>Using Apache and Nginx Together</strong></p><p>A common scenario is for people to use both Apache and Nginx. They&#8217;ll have Nginx handle the static files and then proxy the dynamic requests to Apache. This can be beneficial if you are not completely ready to ditch Apache, for instance if you have a legacy application relying on Apache or .htaccess files. There are also a few cases where Nginx alone cannot handle all use cases. A common one is when people need an HTTP front end for SVN. The Nginx WebDAV module supports only a limited subset of the protocol, so in this case you need to reverse proxy to Apache and let it handle it.</p><p>On the flip side, having Apache handle your dynamic requests introduces another layer of complexity in your server setup, causing more potential problem areas. Usually you don&#8217;t actually need Apache so there&#8217;s really no reason to keep it around as a crutch. In the long run it will benefit you more to use just one of them.</p><p>If you do decide to use both, and plenty of people do, then there are a few things you should know. I won&#8217;t cover how to configure Nginx for this as the Wiki covers that well. But keep this in mind:</p><p>When you reverse proxy to Apache, Nginx will handle the connection, parse it and then create a new connection to Apache. 
It does <strong>not</strong> tunnel the connection so you have to specify which headers to pass on. Furthermore, the connection Nginx establishes uses HTTP/1.0, not HTTP/1.1. This means things such as chunked encoding are not supported.</p><p>To make your reverse proxy experience as easy as possible you should therefore keep a few things in mind.</p><ul><li>Let Nginx handle SSL and keep the connection between Nginx and Apache in plain text. There is no reason to add the overhead of SSL unless you are afraid of a man-in-the-middle attack on your internal network.</li><li>Let Nginx handle the gzipping of content. Don&#8217;t gzip on the Apache side as that will most likely cause you some major headaches.</li></ul><p>If you want a more comprehensive guide for reverse proxying and are too lazy to read the wiki then see <a
title="Apache and Nginx Together" href="http://kbeezie.com/view/apache-with-nginx/" target="_blank">kblessinggr&#8217;s Apache and Nginx Together guide</a>.</p><p><strong>A Note on Shared Hosting</strong></p><p>I mentioned earlier that Nginx isn&#8217;t very well suited for shared hosting. You might think that if you let Nginx handle just static files while proxying non-existing files and dynamic files to Apache then you&#8217;ll be good. This isn&#8217;t quite the case, though. As an example, consider basic authentication. If you want to have basic authentication on a static file then you will need to add it in the Nginx configuration, and users cannot do this without having to reload the configuration file. If you allow them to change it and reload it then they can make a mistake and ruin it for everyone.</p><p>Think <strong>very</strong> carefully about every aspect before you decide to use Nginx in your shared hosting setups.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=125" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=125&amp;md5=7001c21c0b9964638dfb99f71df3af15" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/02/nginx-primer-2-from-apache-to-nginx/feed/</wfw:commentRss> <slash:comments>23</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F02%2Fnginx-primer-2-from-apache-to-nginx%2F&amp;language=en_GB&amp;category=text&amp;title=Nginx+Primer+2%3A+From+Apache+to+Nginx&amp;description=The+Big+Picture+So+you%26%238217%3Bve+finally+decided+to+make+the+switch+from+Apache+to+Nginx.+You+most+likely+did+this+for+performance+reasons%3B+perhaps+all+those+blogs+have+been+writing...&amp;tags=blog" type="text/html" /> </item> <item><title>&#8220;No input file specified&#8221; With PHP and Nginx</title><link>http://blog.martinfjordvald.com/2011/01/no-input-file-specified-with-php-and-nginx/</link> <comments>http://blog.martinfjordvald.com/2011/01/no-input-file-specified-with-php-and-nginx/#comments</comments> <pubDate>Wed, 19 Jan 2011 17:13:40 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Nginx]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=141</guid> <description><![CDATA[&#8220;No input file specified&#8221; is one of the most frequently encountered issues in Nginx. People on serverfault and in the #nginx IRC channel asks for help with this so often that this post is mostly to allow me to be lazy and not have to type up the same answer every time. This is actually [...]]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F01%2Fno-input-file-specified-with-php-and-nginx%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F01%2Fno-input-file-specified-with-php-and-nginx%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p>&#8220;No <em>input file</em> specified&#8221; is one of the most frequently encountered issues in Nginx. People on serverfault and in the #nginx IRC channel ask for help with this so often that this post is mostly to allow me to be lazy and not have to type up the same answer every time.</p><p>This is actually an error from PHP and due to display_errors being 0 people will often just get a blank page with no output. In a typical setup PHP will then send the error to stderr or stdout and Nginx will pick up on it and log it in the Nginx error log file. Thus people spend a ton of time trying to figure out why Nginx isn&#8217;t working.</p><p>The root cause of the error is that PHP cannot find the file Nginx is telling it to look for, and there are two common cases that cause this. Either you&#8217;re not giving PHP the right path to the file or your file permissions are incorrect.</p><p><strong>Wrong Path Sent to PHP</strong></p><p>The most common cause at the time of writing is a user following a horrible tutorial found via Google instead of actually understanding Nginx. <a
title="A Nginx Primer" href="http://blog.martinfjordvald.com/2010/07/nginx-primer/">Reading my primer</a> will equip you to actually solve this on your own but since this post is actually dedicated to the error I&#8217;ll cheat this once and allow you to be lazy by just giving you the full solution.</p><p>Nginx tells PHP about the file to execute via the SCRIPT_FILENAME fastcgi_param value. Most examples in the wiki should define this as $document_root$fastcgi_script_name. The horrible tutorials will often hardcode the path value but this is not desirable as we don&#8217;t want to duplicate information and invite future screw-ups. So you&#8217;ve gone with the $document_root$fastcgi_script_name option and suddenly it&#8217;s no longer working.</p><p>This happens because Nginx has 3 levels of inheritance commonly referred to as blocks, these being http, server and location, each being a sub-block of the parent. Directives in Nginx inherit downwards but never up or across, so if you define something in one location block it will never be applied in any other location block under any circumstance.</p><p>Typically users define their index and root directive in location / because a tutorial told them to. So when they then define SCRIPT_FILENAME using $document_root in a separate location for PHP, the root directive they set in location / does not apply there, and thus SCRIPT_FILENAME is built without their document root, making PHP look in the wrong place.</p><p>The simple solution here is to just define the root directive in your server block (or even the http block!). Generally the higher up you can define a directive, the fewer duplicate directives you&#8217;ll need.</p><p><strong>Incorrect File Permissions</strong></p><p>Most people don&#8217;t really believe me when I tell them their file permissions are incorrect. They&#8217;re looking at the damn permissions and the PHP user can read the file just fine! Sadly, this shows a lack of understanding of Unix user permissions. 
Being able to read a file is not enough; a user must also be able to traverse to the file.</p><p>This effectively means that not only should the file have read permission, but every directory in the path should have execute permission so that the PHP user can traverse it. An example of this:</p><p>Say you have an index.php file in /var/www. /var/www/index.php must have read permission and both /var and /var/www must have execute permission!</p><p>If you&#8217;ve corrected both things and still have this issue then please leave a comment so I can look into it. As far as I know there should be no other reasons for this error.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=141" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=141&amp;md5=b600950a3160f58a0d0f8ee5111b7f98" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2011/01/no-input-file-specified-with-php-and-nginx/feed/</wfw:commentRss> <slash:comments>19</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2011%2F01%2Fno-input-file-specified-with-php-and-nginx%2F&amp;language=en_GB&amp;category=text&amp;title=%26%238220%3BNo+input+file+specified%26%238221%3B+With+PHP+and+Nginx&amp;description=%26%238220%3BNo+input+file+specified%26%238221%3B+is+one+of+the+most+frequently+encountered+issues+in+Nginx.+People+on+serverfault+and+in+the+%23nginx+IRC+channel+asks+for+help+with+this+so+often...&amp;tags=blog" type="text/html" /> </item> <item><title>12,000 Requests per second with Nginx, PHP and Memcached</title><link>http://blog.martinfjordvald.com/2010/09/12000-requests-per-second-with-nginx-php-and-memcached/</link> <comments>http://blog.martinfjordvald.com/2010/09/12000-requests-per-second-with-nginx-php-and-memcached/#comments</comments> <pubDate>Mon, 20 Sep 2010 02:12:02 +0000</pubDate> <dc:creator>mfjordvald</dc:creator> <category><![CDATA[Memcached]]></category> <category><![CDATA[Nginx]]></category> <category><![CDATA[Performance]]></category> <category><![CDATA[PHP]]></category> <category><![CDATA[Technology]]></category><guid
isPermaLink="false">http://blog.martinfjordvald.com/?p=79</guid> <description><![CDATA[Caching in PHP is usually done on a per-object basis, people will cache a query or some CPU intensive calculations to prevent redoing these CPU intensive operations. This can get you a long way. I have an old site which uses this method and gets 105 requests per second on really old hardware. The method I propose will net you a solid 12,000 requests per second.]]></description> <content:encoded><![CDATA[<div
class="tweetmeme_button" style="float: right; margin-left: 10px;"> <a
href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2010%2F09%2F12000-requests-per-second-with-nginx-php-and-memcached%2F"><br
/> <img
src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fblog.martinfjordvald.com%2F2010%2F09%2F12000-requests-per-second-with-nginx-php-and-memcached%2F&amp;style=normal&amp;b=2" height="61" width="50" /><br
/> </a></div><p><strong>Edit:</strong> <a
title="Implementing Full Page Caching with Nginx and PHP" href="http://blog.martinfjordvald.com/2011/02/implementing-full-page-caching-with-nginx-and-php/" target="_self">Part 2 is now available.</a></p><p>This is the first entry in a short series I&#8217;ll do on caching in PHP. During this series I&#8217;ll explore some of the options that exist when caching PHP code and provide a unique (I think) solution that I feel works well to gain high performance without sacrificing real-time data.</p><p>Caching in PHP is usually done on a per-object basis: people will cache a query or some CPU-intensive calculations to avoid redoing these CPU-intensive operations. This can get you a long way. I have an old site which uses this method and gets 105 requests per second on really old hardware.</p><p>An alternative that is used, for example in the Super Cache WordPress plug-in, is to cache the full-page data. This essentially means that you create a page only once. This introduces the problem of stale data, which people usually solve by checking whether data is still valid or by using a TTL caching mechanism and accepting stale data.</p><p>The method I propose is a spin on full-page caching. I&#8217;m a big fan of Nginx and I tend to use it to solve a lot of my problems, and this case is no exception. Nginx has a built-in Memcached module. With this we can store a page in Memcached and have Nginx serve it &#8211; thus never touching PHP at all. This essentially turns this:</p><p><span
id="more-79"></span></p><pre>Concurrency Level:      50
Time taken for tests:   2.443 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      11020000 bytes
HTML transferred:       10210000 bytes
Requests per second:    2046.32 [#/sec] (mean)
Time per request:       24.434 [ms] (mean)
Time per request:       0.489 [ms] (mean, across all concurrent requests)
Transfer rate:          4404.39 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       2
Processing:     6   22  19.7     20     225
Waiting:        5   20   2.6     20      40
Total:          6   22  19.7     20     225

Percentage of the requests served within a certain time (ms)
  50%     20
  66%     21
  75%     22
  80%     22
  90%     24
  95%     26
  98%     29
  99%     39
 100%    225 (longest request)</pre><p>Into this</p><pre>Concurrency Level:      50
Time taken for tests:   0.414 seconds
Complete requests:      5000
Failed requests:        0
Write errors:           0
Total transferred:      11024350 bytes
HTML transferred:       10227760 bytes
Requests per second:    12065.00 [#/sec] (mean)
Time per request:       4.144 [ms] (mean)
Time per request:       0.083 [ms] (mean, across all concurrent requests)
Transfer rate:          25978.27 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   0.1      1       2
Processing:     1    3   0.3      3       5
Waiting:        1    1   0.3      1       4
Total:          2    4   0.3      4       7

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      4
  90%      4
  95%      4
  98%      5
  99%      5
 100%      7 (longest request)</pre><p>What&#8217;s important to note here is how these figures will scale. To get these numbers I developed a very simple proof-of-concept news script. All it does is fetch and show data from two MySQL tables: news and comments. A more complicated application might result in only 100 requests per second or, for something like WordPress or Magento, as low as 20 requests per second! The good thing is that with full-page caching the time required to fetch and display the data depends only on the size of the cached data. Therefore if your application is written to do full-page caching it will always be able to enjoy low latency and high concurrency.</p><p><strong>The Complications</strong></p><p>Full-page caching does introduce some complications, though. As mentioned earlier the goal is to make Nginx serve the cached pages, and as such we cannot perform any logic during the serving of the page. This means we need to handle invalidation of cached pages during the updating of the data they use.</p><p>To be able to invalidate pages it&#8217;s important that we understand what data we have to work with and how it relates to not only our pages, but also our code. We will be using a framework so we can create a few rules that will help us understand the whole system.</p><ul><li>The framework uses a three-tiered setup of controllers, libraries and templates.</li><li>Controllers will dictate how to handle a request defined by the URI.</li><li>Libraries will be used to access all data.</li></ul><p>This is how most frameworks work. A few of the big ones use an MVC pattern, but such a setup will be largely the same. From these rules we can determine how the relationship between data, controllers and pages will be.</p><ul><li>All data will need an identifier. 
For instance, if you have a news script you&#8217;ll need an identifier for &#8220;news&#8221; and &#8220;comments&#8221;.</li><li>All controllers must specify which data they use by referencing the identifier.</li></ul><p>So to recap: the goal is to invalidate the correct pages, and to do this we need to know which pages use what data. This gives us three important parts.</p><ul><li>The library that handles the editing of data, and therefore the invalidation triggering.</li><li>The controller that handles the requests based on the URI and therefore relates to the cached pages.</li><li>The actual cached pages.</li></ul><p>Finally, we&#8217;re unlikely to have only one of each; for instance, multiple controllers will often be using the same data. To continue our news script example, we have a controller to fetch the news and a controller to generate an RSS feed of the news. Similarly, a controller might generate multiple pages, for instance one page per news post to display the comments. Therefore we also need to consider the relationships between these parts.</p><ul><li>One-to-many relationship between invalidated data and controllers.</li><li>One-to-many relationship between controllers and pages.</li></ul><p><strong>Data &amp; Controllers</strong></p><p>Earlier we defined a rule that all controllers must specify which data they use. This is useful as it means we can create a dependency list between data and controllers. When data is invalidated we can do a lookup in the dependency list and see which controllers we need to tell about the invalidated data.</p><p>This solves the problem elegantly, and with OOP we can define interfaces to force controllers to implement the required methods. If they don&#8217;t, we can set a flag that prevents their output from being cached and they should work normally.</p><p>One possible downside to this is that you can no longer edit files on the fly. 
If you change the way data is used you will most likely need to regenerate the dependency list, so it becomes critical that you have a deployment process in place for all code changes. Personally I think this is required anyway so it does not cause me any problems; however, it is something that has to be considered.</p><p><strong>Controllers &amp; Pages</strong></p><p>Websites are by their nature diverse. In this framework all requests are passed to a controller along with the URI. The controller then uses the URI to determine what data to use to generate the output. The problem here is that there is a huge range of options for how the controller might look and behave. It would be really difficult to define something like a dependency list, as a controller might use multiple data sources which will update dynamically. This would require the dependency list to be updated every time new data was added, which is not really a feasible solution.</p><p>The easy scenario is where the page URI is directly related to the data. For example, in our news script the URI /news/4/ might show the news post with ID 4. If a comment is added to this news post we trigger an invalidation on the comments data identifier. The library that inserts the data knows it is inserting a comment for news post 4, so it can also pass this ID along when triggering the invalidation. This allows the controller to determine that the page /news/4/ needs to be invalidated.</p><p>The bigger problem is when data is used as part of a set defined by data not related to the updated data. A simple example here would be a search function. You have the controller search and the keyword &#8220;PHP&#8221; being searched for &#8211; the URI for this would most likely be /search/PHP/. When a news post is updated we pass along the ID to the controller, but we have no way to determine which URIs actually use said news post. Keeping track of each search term is not feasible. 
There are a few options here, but none that are really perfect.</p><ul><li>Don&#8217;t cache at all. Data will always be current, but generating it might be CPU intensive.</li><li>Increase caching granularity. Pass each request to PHP, but cache the IDs of the news posts and fetch the current data for them.</li><li>Cache the full page using a time-to-live value. This means we have stale data for a bit but we keep high performance.</li></ul><p>Ultimately it depends on your situation and what will fit best. I&#8217;d imagine I&#8217;d most often choose TTL caching or, in case I need current data, increased caching granularity.</p><p>This covers the overall system. Next time I&#8217;ll talk about how I&#8217;ve chosen to implement this.</p> <img
src="http://blog.martinfjordvald.com/wp-content/plugins/wordpress-feed-statistics/feed-statistics.php?view=1&post_id=79" width="1" height="1" style="display: none;" /><p><a
href="http://blog.martinfjordvald.com/?flattrss_redirect&amp;id=79&amp;md5=4b5dde3d5f26b007dff5e0339b6610f8" title="Flattr" target="_blank"><img
src="http://blog.martinfjordvald.com/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded> <wfw:commentRss>http://blog.martinfjordvald.com/2010/09/12000-requests-per-second-with-nginx-php-and-memcached/feed/</wfw:commentRss> <slash:comments>41</slash:comments> <atom:link rel="payment" href="https://flattr.com/submit/auto?user_id=mfjordvald&amp;popout=1&amp;url=http%3A%2F%2Fblog.martinfjordvald.com%2F2010%2F09%2F12000-requests-per-second-with-nginx-php-and-memcached%2F&amp;language=en_GB&amp;category=text&amp;title=12%2C000+Requests+per+second+with+Nginx%2C+PHP+and+Memcached&amp;description=Edit%3A+Part+2+is+now+available.+This+is+the+first+entry+in+a+short+series+I%26%238217%3Bll+do+on+caching+in+PHP.+During+this+series+I%26%238217%3Bll+explore+some+of+the+options...&amp;tags=blog" type="text/html" /> </item> </channel> </rss>
