Edit: Part 2 is now available.
This is the first entry in a short series I’ll do on caching in PHP.
During this series I’ll explore some of the options that exist when caching PHP code and provide a unique (I think) solution that I feel works well to gain high performance without sacrificing real-time data.
Caching in PHP is usually done on a per-object basis: people cache a query result or some CPU-intensive calculation to avoid redoing the expensive operation. This can get you a long way. I have an old site which uses this method and gets 105 requests per second on really old hardware.
An alternative, used for example by the Super Cache WordPress plug-in, is to cache the full page output. This essentially means that you generate a page only once. It introduces the problem of stale data, which people usually solve either by checking whether the data is still valid or by using a TTL caching mechanism and accepting some stale data.
The method I propose is a spin on full-page caching. I’m a big fan of nginx and I tend to use it to solve a lot of my problems; this case is no exception. Nginx has a built-in Memcached module, so we can store a page in Memcached and have nginx serve it directly, never touching PHP at all. This essentially turns this:
Concurrency Level: 50
Time taken for tests: 2.443 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 11020000 bytes
HTML transferred: 10210000 bytes
Requests per second: 2046.32 [#/sec] (mean)
Time per request: 24.434 [ms] (mean)
Time per request: 0.489 [ms] (mean, across all concurrent requests)
Transfer rate: 4404.39 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 2
Processing: 6 22 19.7 20 225
Waiting: 5 20 2.6 20 40
Total: 6 22 19.7 20 225
Percentage of the requests served within a certain time (ms)
50% 20
66% 21
75% 22
80% 22
90% 24
95% 26
98% 29
99% 39
100% 225 (longest request)
Into this:
Concurrency Level: 50
Time taken for tests: 0.414 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 11024350 bytes
HTML transferred: 10227760 bytes
Requests per second: 12065.00 [#/sec] (mean)
Time per request: 4.144 [ms] (mean)
Time per request: 0.083 [ms] (mean, across all concurrent requests)
Transfer rate: 25978.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.1 1 2
Processing: 1 3 0.3 3 5
Waiting: 1 1 0.3 1 4
Total: 2 4 0.3 4 7
Percentage of the requests served within a certain time (ms)
50% 4
66% 4
75% 4
80% 4
90% 4
95% 4
98% 5
99% 5
100% 7 (longest request)
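For reference, the nginx side of this is conceptually simple. Below is a minimal sketch, not my exact production config: the cache key scheme (the bare URI) and the fallback location name are assumptions, while memcached_pass and error_page are standard nginx directives.

```nginx
# Sketch: try Memcached first, fall back to PHP only on a cache miss.
server {
    listen      80;
    server_name example.com;

    location / {
        set            $memcached_key $uri;   # must match the key PHP stores pages under
        memcached_pass 127.0.0.1:11211;
        default_type   text/html;             # the Memcached module doesn't set a type
        # On a miss (404) or Memcached trouble, regenerate the page via PHP.
        error_page 404 502 504 = @php;
    }

    location @php {
        include       fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /var/www/index.php;
        fastcgi_pass  127.0.0.1:9000;         # PHP-FPM
    }
}
```

PHP is then responsible for storing its rendered output in Memcached under the same key, so subsequent requests never reach it at all.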
What’s important to note here is how these figures scale. To get these numbers I developed a very simple proof-of-concept news script; all it does is fetch and show data from two MySQL tables: news and comments. A more complicated application might manage only 100 requests per second, and something like WordPress or Magento can go as low as 20 requests per second! The good thing is that with full-page caching, the time required to fetch and display the data depends only on the size of the cached data. Therefore, if your application is written to do full-page caching it will always enjoy low latency and high concurrency.
The Complications
Full-page caching does introduce some complications, though. As mentioned earlier, the goal is to make nginx serve the cached pages, so we cannot perform any logic while serving a page. This means we need to handle invalidation of cached pages at the moment the data they use is updated.
To be able to invalidate pages it’s important that we understand what data we have to work with and how it relates not only to our pages but also to our code. We will be using a framework, so we can lay down a few rules that will help us understand the whole system.
- The framework uses a three-tiered setup of controllers, libraries and templates.
- Controllers will dictate how to handle a request defined by the URI.
- Libraries will be used to access all data.
This is how most frameworks work; a few of the big ones use an MVC pattern, but such a setup is largely the same. From these rules we can determine what the relationship between data, controllers and pages will look like.
- All data will need an identifier. For instance, if you have a news script you’ll need identifiers for “news” and “comments”.
- All controllers must specify which data they use by referencing the identifier (see the sketch after this list).
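To make the second rule concrete, here is a hypothetical controller. The class name and the $uses property are illustrative, not any real framework’s API:

```php
<?php
// Hypothetical sketch: a controller declares which data identifiers it uses,
// so the framework can build a data-to-controller dependency list at deploy time.
class NewsController
{
    // Data identifiers this controller depends on.
    public static $uses = array('news', 'comments');

    public function show($newsId)
    {
        // All data access goes through libraries, per the framework rules, e.g.:
        //   $news     = NewsLibrary::getPost($newsId);
        //   $comments = CommentsLibrary::getForPost($newsId);
        // ...render the template and return the page...
    }
}
```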
So, to recap: the goal is to invalidate the correct pages, and to do this we need to know which pages use what data. This gives us three important parts.
- The library, which handles the editing of data and therefore triggers the invalidation.
- The controller, which handles requests based on the URI and therefore maps to the cached pages.
- The actual cached pages.
Finally, we’re unlikely to have only one of each; for instance, multiple controllers will often use the same data. To continue our news script example, we have a controller to fetch the news and a controller to generate an RSS feed of the news. Similarly, a controller might generate multiple pages, for instance one page per news post to display its comments. Therefore we also need to consider the inter-data relationships.
- One-to-many relationship between invalidated data and controllers.
- One-to-many relationship between controllers and pages.
Data & Controllers
Earlier we defined a rule that all controllers must specify which data they use. This is useful because it means we can generate a dependency list between data and controllers. When data is invalidated we can do a lookup in the dependency list and see which controllers we need to tell about the invalidated data.
This solves the problem elegantly, and with OOP we can define interfaces to force controllers to implement the required methods. If a controller doesn’t, we can set a flag that prevents its output from being cached, and it will work normally. A sketch of both pieces follows.
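The sketch below uses names that are mine rather than any real framework’s: an interface the controllers implement, the pre-generated dependency list, and the lookup that runs when data changes. Memcached::delete() is the real PHP extension call; everything else is hypothetical.

```php
<?php
// Hypothetical sketch of the dependency list and the invalidation lookup.

// Controllers implement this so the framework can ask them which cached
// pages a given data change affects.
interface CacheableController
{
    /** Return the cache keys (page URIs) affected by a change to $identifier/$id. */
    public function getInvalidKeys($identifier, $id);
}

// Generated at deploy time from the controllers' declarations:
// data identifier => controllers that use it.
$dependencyList = array(
    'news'     => array('NewsController', 'RssController'),
    'comments' => array('NewsController'),
);

// Called by a library whenever it modifies data.
function triggerInvalidation(Memcached $memcached, array $dependencyList, $identifier, $id)
{
    if (!isset($dependencyList[$identifier])) {
        return; // nothing depends on this data
    }
    foreach ($dependencyList[$identifier] as $controllerClass) {
        $controller = new $controllerClass();
        if (!$controller instanceof CacheableController) {
            continue; // flagged as non-cacheable; its pages were never cached
        }
        foreach ($controller->getInvalidKeys($identifier, $id) as $key) {
            $memcached->delete($key); // next hit misses in nginx and falls through to PHP
        }
    }
}
```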
One possible downside to this approach is that you can no longer edit files on the fly. If you change the way data is used you will most likely need to regenerate the dependency list, so it becomes critical that you have a deployment process in place for all code changes. Personally I think this is required anyway, so it does not cause me any problems; however, it is something to consider.
Controllers & Pages
Websites are by their nature diverse. In this framework all requests are passed to a controller along with the URI, and the controller then uses the URI to determine what data to use to generate the output. The problem is that there is a huge range of options for how a controller might look and behave. It would be really difficult to define something like a dependency list here, as a controller might use multiple data sources which update dynamically. The dependency list would have to be regenerated every time new data was added, which is not a feasible solution.
The easy scenario is where the page URI is directly related to the data. For example, in our news script the URI /news/4/ might show the news post with ID 4. If a comment is added to this news post we trigger an invalidation on the comments data identifier. The library that inserts the data knows it is inserting into news post 4, so it can pass this along when triggering the invalidation. This allows the controller to determine that the page /news/4/ needs to be invalidated.
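In code the easy case is almost mechanical. Continuing the sketches above (triggerInvalidation() and CacheableController are the hypothetical pieces from the previous section, and NewsController is the earlier sketch, now implementing the interface):

```php
<?php
// Sketch of the direct URI-to-data case. The library knows exactly which
// news post it touched, so it passes the ID along with the invalidation.
class CommentsLibrary
{
    public function addComment(Memcached $memcached, array $dependencyList, $newsId, $text)
    {
        // ... INSERT INTO comments (news_id, body) VALUES (...) ...
        triggerInvalidation($memcached, $dependencyList, 'comments', $newsId);
    }
}

class NewsController implements CacheableController
{
    public function getInvalidKeys($identifier, $id)
    {
        // A comment on post $id only affects the page /news/$id/.
        return array('/news/' . $id . '/');
    }
}
```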
The bigger problem is when data is used as part of a set defined by something unrelated to the updated data. A simple example is a search function: you have the search controller and the keyword “PHP” being searched for; the URI for this would most likely be /search/PHP/. When a news post is updated we pass along the ID to the controller, but we have no way to determine which URIs actually use that news post, and keeping track of each search term is not feasible. There are a few options here, but none of them are perfect.
- Don’t cache at all; data will always be current but generating it might be CPU intensive.
- Increase caching granularity: pass each request to PHP but cache the IDs of the matching news posts and fetch the current data for each.
- Cache the full page using a time-to-live value. This means we serve stale data for a bit, but we keep high performance.
Ultimately it depends on your situation and what fits best. I imagine I’d most often choose TTL caching, or increased caching granularity when I need current data. A sketch of the granular approach follows.
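To illustrate the increased-granularity option for the search case, here is a sketch using the stock Memcached and PDO extensions. The key scheme, table layout, FULLTEXT index on news, and the 5-minute TTL are all assumptions:

```php
<?php
// Sketch of increased caching granularity for the search case: cache only the
// list of matching post IDs per keyword (small, TTL-bounded staleness), then
// fetch the current post data for those IDs on every request.
function searchNews(Memcached $memcached, PDO $db, $keyword)
{
    $key = 'search:' . md5($keyword);   // hypothetical key scheme
    $ids = $memcached->get($key);

    if ($ids === false) {
        // Cache miss: run the expensive search query once.
        $stmt = $db->prepare('SELECT id FROM news WHERE MATCH(title, body) AGAINST(?)');
        $stmt->execute(array($keyword));
        $ids = $stmt->fetchAll(PDO::FETCH_COLUMN);
        $memcached->set($key, $ids, 300); // the ID list may be stale for up to 5 minutes
    }

    if (empty($ids)) {
        return array();
    }

    // Always fetch the rows fresh, so edited posts show current content.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $stmt = $db->prepare("SELECT * FROM news WHERE id IN ($placeholders)");
    $stmt->execute(array_values($ids));
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```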
This covers the overall system; next time I’ll talk about how I’ve chosen to implement this.
Comments
Please tell me, were those tests run from the same server nginx was running on, or from a different PC on a different network? Because I’d be happy just getting the stats you got before you switched to using the nginx cache.
We also use memcached for our pages, and these are the stats I’m getting for a page that’s fully cached; we use Apache 2.1 and PHP 5.1.
Stats when I run ab from a different datacenter:
Concurrency Level: 50
Time taken for tests: 424.595 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 1882973672 bytes
HTML transferred: 1868221017 bytes
Requests per second: 117.76 [#/sec] (mean)
Time per request: 424.595 [ms] (mean)
Time per request: 8.492 [ms] (mean, across all concurrent requests)
Transfer rate: 4330.81 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 78 82 3.4 80 104
Processing: 318 342 47.3 340 1626
Waiting: 81 91 47.9 87 1384
Total: 396 424 47.1 422 1705
Percentage of the requests served within a certain time (ms)
50% 422
66% 423
75% 424
80% 425
90% 427
95% 428
98% 432
99% 435
100% 1705 (longest request)
Stats when ab is run on the web server itself:
Concurrency Level: 50
Time taken for tests: 12.615 seconds
Complete requests: 50000
Failed requests: 0
Write errors: 0
Total transferred: 1882737406 bytes
HTML transferred: 1867987111 bytes
Requests per second: 3963.61 [#/sec] (mean)
Time per request: 12.615 [ms] (mean)
Time per request: 0.252 [ms] (mean, across all concurrent requests)
Transfer rate: 145750.81 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 0.8 1 6
Processing: 3 12 2.9 11 27
Waiting: 2 9 4.1 7 24
Total: 6 13 2.4 12 28
Percentage of the requests served within a certain time (ms)
50% 12
66% 13
75% 14
80% 15
90% 16
95% 17
98% 18
99% 19
100% 28 (longest request)
I ran these tests on the same machine the code was running on, as I did not want the network to factor into the equation. When I ran them from another server in the same LAN segment the numbers were slightly lower due to network overhead, but still very close.
When you run ab from a completely external location you’re essentially benchmarking how much data you can move between the two locations, not how much data the server can generate. In a real-world scenario with this much traffic you won’t have just one connection going but many, many thousands, which means your bottleneck is not likely to be the bandwidth between location A and B.
Was this WordPress? Because getting 2046.32 requests a second without caching or optimization on WordPress from ApacheBench is a damn miracle. If so, I would love to see your nginx config.
No, it was a pretty generic news posting script. There was some networked database interaction but nothing overly complicated. WordPress gives me around 200 requests/sec using the standard W3 Total Cache plugin with memcached caching. I could make it use static files, but even with all the traffic ycombinator gave me my blog seems to have held up just fine.
Hi,
What are the specs of the server?
Thanks,
Ian
It’s a dedicated box with an i7 860 @ 2.8GHz CPU and 6 GB of RAM.
Why don’t you use SSI (Server Side Includes)? It can be a good complement.
SSI is a bit simplistic in its support; Nginx doesn’t really have the required logic to make SSI a viable caching strategy on its own. I intend to explore Edge Side Includes in an eventual third part of this series. I’ll probably use Varnish unless Nginx has gained better ESI support by then.
http://kovyrin.net/2007/08/05/using-nginx-ssi-and-memcache-to-make-your-web-applications-faster/
The problem with that approach is that you don’t gain anything from it. You tell Nginx to include a file, which is totally fine, but you still need something to process it; you can proxy_pass or FastCGI or whatever, but *something* needs to generate it. If you decide to pass it to PHP then you’ve already slowed down a lot, as simply having PHP echo something like “hi” is rather slow.
Nginx doesn’t just include a file; it makes the subrequest like any other normal request. You can cache those in memcached too.
One page = multiple cached parts
Deleting one key doesn’t affect the others.
Certainly, but you’ll end up doing multiple cache gets in one request. That’s not necessarily bad if it allows you to cache a page you normally wouldn’t be able to cache, but in the examples used in the article you linked he uses it to include a login page, and I see absolutely no point in that as there’s no dynamic content there. SSI (or ESI) is definitely a concept I want to explore further, but one I’m going to be careful about.
welcome to 2007. glad you made it 😉
(still, nice article!)
Hah, yeah, I realize the concept is known, but I don’t actually know of any popular PHP framework which centres around full-page caching with smart invalidation. Usually they provide methods to cache based on TTL, but that leads to stale data and is less than optimal.
btw, you turned off keep-alive, right?
No, keep-alive is turned on in Nginx. Nginx handles TIME_WAIT connections really well, so I see no reason not to have it on; I would have it on in any real-world case. Furthermore, I also had keep-alive on the connections between Nginx and Memcached. I’ll detail this in the next part.
As far as I remember I did not use the -k switch, though.
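Roughly speaking, the Nginx-to-Memcached keep-alive looks like the sketch below. It relies on the upstream keepalive module (a third-party module when this post was written, part of stock nginx since 1.1.4), and the upstream name is arbitrary:

```nginx
# Sketch: pool idle connections to Memcached instead of reconnecting per request.
upstream memcached_pool {
    server    127.0.0.1:11211;
    keepalive 32;    # idle connections kept open, per worker process
}

server {
    location / {
        set            $memcached_key $uri;
        memcached_pass memcached_pool;     # goes through the keep-alive pool
        default_type   text/html;
        error_page 404 502 504 = @php;
    }

    location @php {
        include       fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /var/www/index.php;
        fastcgi_pass  127.0.0.1:9000;
    }
}
```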
I wish I had some more information, like the whole nginx config. I have a Webbynode VPS with 4 cores available and I couldn’t make it go faster than 6k req/second with static files. It has very low memory, but this process is basically CPU bound from what I observed.
I will expand on the entire setup in part 2, covering how both the actual PHP implementation and the Nginx implementation are handled.
Did you use -k on these tests? With 10,000 requests across 50 concurrent users I can’t get much over 10,000 requests/second unless I use -k. Then it becomes around 13,000-14,000 requests/second.
As far as I remember I did not use the -k flag in ab. I did however use keep-alive between Nginx and Memcached, which did increase requests per second some.
What command did you run AB with?
You can sort of tell from the ab output, but it was something like:
ab -c 50 -n 5000 http://url.com/
Also, with full page caching, how would you propose handling things that are genuinely dynamic, like login boxes, forum posts, etc?
You obviously cannot cache a POST request, as you need to take some sort of action. You can still cache things that are fully dynamic; the difficulty is figuring out when to invalidate the cache once the data is updated. The more complex your application is, the more complex the invalidation logic becomes. The framework I use has some methods for keeping track of it, but it obviously becomes complex over time. I’ll provide more details in part 2.
Finally, what about specs of the server etc?
I’m running on a small Linode VPS and I get ~2000 req/sec for a script that simply echoes “hi”.
It’s a dedicated box with an i7 860 @ 2.8GHz CPU and 6 GB of RAM.
I wonder what the number would look like if you ran
:%s/nginx/lighttpd
I honestly cannot say. I used Lighttpd before Nginx but back then the memory leaks were so bad it was useless. Today I just don’t see anything that would entice me to switch back.
Do you have information on how you configured your nginx/php/memcached? Did you do anything special in the configurations? Any considerations on bypassing Apache and serving PHP directly?
I’m working on a follow-up blog post right now and will provide all the details there, plus a working framework that has an implementation of the smart invalidation.
As for bypassing Apache: I’ve actually been doing that for 2 years now. PHP-FPM is extremely stable and extremely awesome, so there’s absolutely no need for Apache for me; it’d just be another layer of complexity with no benefit.
Thanks for your responses. Have you tried with HTTPerf? I seem to get much higher results with it, not sure why. For my scripts that are using memcached at nginx level, on my Linode 512 (so a long way off of a dedicated i7!), I get something like:
james@li140-209:~$ httperf --hog --num-conns 10000 --num-calls 10000 --burst-length 20 --port 80 --rate 10000 --server 0xf.nl --uri=/
httperf --hog --client=0/1 --server=0xf.nl --port=80 --uri=/ --rate=10000 --send-buffer=4096 --recv-buffer=16384 --num-conns=10000 --num-calls=10000 --burst-length=20
Maximum connect burst length: 2824
Total: connections 2546 requests 47244 replies 1981 test-duration 1.316 s
Connection rate: 1934.7 conn/s (0.5 ms/conn, <=1022 concurrent connections)
Connection time [ms]: min 0.9 avg 546.8 max 792.8 median 575.5 stddev 126.6
Connection time [ms]: connect 156.6
Connection length [replies/conn]: 1.000
Request rate: 35901.2 req/s (0.0 ms/req)
Request size [B]: 59.0
Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (0 samples)
Reply time [ms]: response 259.7 transfer 0.0
Reply size [B]: header 143.0 content 2066.0 footer 0.0 (total 2209.0)
Reply status: 1xx=0 2xx=1981 3xx=0 4xx=0 5xx=0
CPU time [s]: user 0.12 system 1.12 (user 9.1% system 84.9% total 94.1%)
Net I/O: 5316.0 KB/s (43.5*10^6 bps)
Errors: total 10000 client-timo 0 socket-timo 0 connrefused 0 connreset 2546
Errors: fd-unavail 7454 addrunavail 0 ftab-full 0 other 0
james@li140-209:~$
Great blog! Do you have a twitter account?
Thank you, I do not have a twitter account, though.
Great Post, but when will you publish the one about the implementation?
Got the code all tidied up and packaged, so just have to write the actual blog post now. So probably tomorrow.
Ok, thank you a lot, this study is what I was looking for.
Great article. I’m creating a hi-performance API server by using PHP and Apache. The server runs CentOS on the Amazon EC2 platform. For static .html files I get 6000 req/sec (50 concurrent) without any optimization, but when I execute a simple echo the number drops to 3000 req/sec. When I put a big comment inside I get 2200 req/sec, and when I use a simple “include” with a small file I get 1200 req/sec. Our PHP RESTful API application gets only 100 req/sec.
Can someone explain why this is happening and how we can increase the requests per second for our application? Will Nginx help? How do you create a hi-performance API web server?
Thanks in advance for your replies.
@Goran – “creating hi-performance API server by using PHP and Apache” – this is a contradiction in terms. Apache is anything but high in terms of performance.
Have you ever heard of Nginx?
@Ryan, yes, I mention nginx in the post above. Today we’re using nginx to serve static files, but it never became part of our API server configuration since we’re using small EC2 instances which have moderate network I/O, and the network is the bottleneck. We’ve achieved high performance with several DNS round-robin load balancers and memcached.
I too tried to reach 12,000 requests, but without success. The ab tool shows that 12,000 requests were reached, but at the same time PHP crashed. Running the particular web page in a browser after running ab gives a bad gateway error.
http://stackoverflow.com/questions/3616191/nginx-php-fpm-502-bad-gateway
You have a segmentation fault in your PHP; that has nothing to do with caching PHP pages in memcached. You’re looking in entirely the wrong place.
This is the usual nonsense spouted regarding hypothetical situations. Take any site that contains any degree of functionality and this falls apart: what happens regarding authentication and dynamic page elements?
All you are saying is that nginx can hypothetically serve 12,000 requests for static content.
What about the race condition when a page is dropped from cache? What about cold startups? What about handling authentication and personalisation?
Hey Isobelle,
This was not so much an exercise in pushing nginx; nginx can serve far more than 12,000 requests per second when tuned correctly. What I was trying to achieve here was a way to cache dynamic pages without touching the backend unless the cached data becomes dirty. So this is not so much “Wow, nginx can handle 12k connections per second!” but more about what kind of logic we need in place to allow dynamic pages to be fully cached and properly invalidated when the data updates.
You are right that I don’t address how to deal with dynamic page elements. I wanted to write two follow-up articles to this post: one is the implementing full page caching article already on my blog, and the other never got written, as I lost interest in this topic and not many others seemed to care either.
How has your PHP (https://github.com/mfjordvald/Evil-Genius-Framework) handled more than 2K requests per second on that hardware, an i7 860 @ 2.8GHz CPU with 6 GB of RAM?
The PHP itself probably cannot. The framework is a (very old) proof-of-concept for doing full page caching of non-user pages while being able to invalidate cache entries intelligently to avoid caching data for n minutes and having stale data for x minutes.
So the high request count comes through caching the full page in nginx as I describe in part 2: https://blog.martinfjordvald.com/2011/02/implementing-full-page-caching-with-nginx-and-php/
Hi Martin,
I am just starting out in high-level HA and scaling with Nginx, and I am finding your posts here super valuable. We are preparing to add a dynamic community to our WP site (BuddyPress) and I am scared to death of the hit our servers will take when we can’t lean on a static page cache (Varnish) like we have in the past to offload traffic. I am currently looking at ways not necessarily to cache dynamic content, but rather to reduce user latency by compressing the amount of data that needs to transfer to the network edge. Looking very closely at CloudFlare Railgun and, to a lesser extent, ESI.
You mention in the comment above that at one point you had planned a part 3 post for this series discussing some of these concepts related to dealing with dynamic page elements. I’d like to put in my vote requesting that post whenever you find the time. At the very least, if you could shed some insight or point to other resources to check out here, that would be great as well.
Thanks.