Lately I’ve been working with a friend on a daily-deal aggregator. The Groupon-like sites are popping up everywhere and the market for aggregators is still fairly unfilled. My project, Alladeals, target the Swedish daily deals market and as such it needs to support Swedish characters. In future it might have to support other languages as well so I decided that UTF8 was the way to go. Since most webpages are encoded in UTF-8 these days it has been fairly painless to actually work with UTF-8 in PHP, that is, until yesterday.
PHP does not natively support UTF-8. This is fairly important to keep in mind when dealing with UTF-8 encoded data in PHP. Usually I’m pretty good at remembering that, however yesterday I happened upon a bug which could easily have gone unnoticed for months if not for some good luck.
The bug manifested itself in the deal titles, the design is not well suited for really long titles so it was decided that it would be best to make sure that the titles did not exceed a length of 140 characters. To cut the the title the following code was used: