PHP and UTF-8

PHP is a pain in the butt when it comes to different character sets. The language itself doesn't natively handle anything beyond single bytes: its built-in string functions treat a string as a plain sequence of bytes, so you have to do several things to get a multibyte charset to work. First of all you need the iconv or mbstring extension. iconv is now built in by default, but it doesn't have nearly as many functions as mbstring does. The biggest thing missing from iconv is charset detection. It's really annoying.
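To make the difference concrete, here is a small sketch of how the three options treat the same UTF-8 string. The sample string and the candidate charset list are just assumptions picked for illustration; the functions themselves (strlen, iconv_strlen, mb_strlen, mb_detect_encoding) are the standard ones.

```php
<?php
// "héllo" encoded as UTF-8: the "é" takes two bytes (0xC3 0xA9).
$text = "h\xC3\xA9llo";

// Native strlen() counts bytes, not characters.
$bytes = strlen($text);
echo "bytes: $bytes\n"; // bytes: 6

// iconv can count characters, but you have to tell it the charset yourself.
if (function_exists('iconv_strlen')) {
    echo "iconv chars: ", iconv_strlen($text, 'UTF-8'), "\n"; // iconv chars: 5
}

// mbstring can count characters and can also guess the charset
// from a list of candidates (the list here is an arbitrary example).
if (function_exists('mb_strlen')) {
    echo "mb chars: ", mb_strlen($text, 'UTF-8'), "\n"; // mb chars: 5
    $guess = mb_detect_encoding($text, array('ASCII', 'UTF-8', 'ISO-8859-1'), true);
    echo "detected: $guess\n";
}
```

Keep in mind that detection is only ever a guess: plenty of byte sequences are valid in more than one charset, so it's still better to know the charset up front when you can.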

So I wrote a string class for phpfanfiction designed to make locale/charset stuff a little bit easier to handle. It's all static, so you don't have to worry about passing objects around. Basically it has a list of the charsets that are available for native PHP, PHP with iconv, and PHP with mbstring. You call Pff_String::negotiate() to set up the charset and the class to use; negotiate() lets you set defaults and override any settings. The class also tries to set the locale. I haven't played with locales a whole lot, so that part may have bugs up the wazoo, but for the most part it's pretty nifty. Instead of calling strlen(), you call Pff_String::strlen() and it automagically decides which string-length function to use. The whole point is to make handling multiple charsets simple and efficient, without any brainwork.
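Here is a rough sketch of the idea, not the actual Pff_String source. The class name Sketch_String, the $forceBackend parameter, and the backend names are all made up for illustration, and it skips the charset lists and the locale handling entirely.

```php
<?php
// Hypothetical sketch only; this is NOT the real Pff_String class.
final class Sketch_String
{
    private static $backend = 'native';
    private static $charset = 'ISO-8859-1';

    // Pick the best available backend for the requested charset.
    // $forceBackend (an invented parameter) overrides the automatic choice.
    public static function negotiate($charset = 'UTF-8', $forceBackend = null)
    {
        self::$charset = $charset;
        if ($forceBackend !== null) {
            self::$backend = $forceBackend;
        } elseif (extension_loaded('mbstring')) {
            self::$backend = 'mbstring';
        } elseif (extension_loaded('iconv')) {
            self::$backend = 'iconv';
        } else {
            self::$backend = 'native';
        }
        return self::$backend;
    }

    // Dispatch to whichever length function matches the chosen backend.
    public static function strlen($string)
    {
        switch (self::$backend) {
            case 'mbstring':
                return mb_strlen($string, self::$charset);
            case 'iconv':
                return iconv_strlen($string, self::$charset);
            default:
                // Byte count; only correct for single-byte charsets.
                return strlen($string);
        }
    }
}

// Negotiate once, early in the request, then use the wrappers everywhere.
Sketch_String::negotiate('UTF-8');
echo Sketch_String::strlen("h\xC3\xA9llo"), "\n";
```

Because everything is static, the rest of the code never needs to know which extension ended up doing the work. The tradeoff is that the charset is effectively global state, so it has to be negotiated before anything else touches a string.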

I'm still of the opinion that PHP should have better UTF-8 (and UTF-16) handling in general. But this works. Just make sure to set the right headers and meta tags on the page, and the accept-charset attribute on your forms, so the string handling stays nice and transparent.
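For reference, a bare-bones page with all three pieces in place might look something like this. The field name and the page title are placeholders, and UTF-8 is assumed as the charset throughout.

```php
<?php
// Tell the browser the response body is UTF-8.
// This has to run before any output is sent.
header('Content-Type: text/html; charset=UTF-8');
?>
<!DOCTYPE html>
<html>
<head>
  <!-- Same charset again as a meta tag, for pages saved or served without headers. -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>UTF-8 form</title>
</head>
<body>
  <!-- accept-charset asks the browser to submit the form data as UTF-8. -->
  <form method="post" action="" accept-charset="UTF-8">
    <input type="text" name="comment">
    <input type="submit" value="Send">
  </form>
</body>
</html>
```

With the page and the form both declared as UTF-8, whatever comes back in $_POST should already be in the charset the string functions were set up for.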

Still, I will hope and wish for the day that charset handling is completely transparent in PHP.

