Friday, March 07, 2008

Why doesn't everyone use Unicode?

Well, my promise to blog daily was short-lived, but I'm back now. Wednesdays are going to be hard for me to blog because I teach a night class about 2 hours from where I live. But I should have blogged on Thursday, no excuses there.

On to today's topic.

I spent multiple hours yesterday dealing with a character set encoding issue. As I've mentioned, our chess training site is multi-lingual. Therefore, we store all of our messages in UTF-8. Our database, MySQL, is set to use Unicode, as is our server-side language, PHP. Our client-side language, Perl, is also set to use Unicode. However, no matter how hard I tried, whenever I sent a message using Red Hot Pawn via our software, the message got corrupted. I could manually copy and paste the message from our database viewer to the message sender and it would work fine, but when my software did it, the ä came out looking like à and some other symbol that I wasn't familiar with. Of course, having it print to my screen from Perl caused it to come out looking like a third symbol, Σ.

Obviously, Internet Explorer, Perl, and my Windows console were all using different character sets. I found my setting for Internet Explorer. It was set to ISO-8859-1 (Latin-1). I did not find the setting for my Windows command prompt. I assume it was using some Windows character set. I tried changing the Internet Explorer setting, but it didn't seem to have an effect. Finally, after a few hours of hunting and validating various settings, I checked the encoding on the Red Hot Pawn page. It was set to ISO-8859-1. Ah ha! So, they were enforcing their own encoding on the page.

Apparently, when I copy and paste, the operating system does the conversion in the background for me. However, when my program does it, I have to do the conversion myself. Because of a few limitations, using the Perl Encode module was not an option for me. So, I settled on utf8::downgrade. This won't work if someone's desktop settings are not Latin-1, but it suits my needs at the moment. Why can't everyone just use Unicode?
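
For anyone curious what is going on at the byte level, here is a rough sketch of the mismatch described above. It is written in Python purely for illustration; it is not the Perl code from our software. The code page 437 console is an assumption on my part (the post only establishes that the console used "some Windows character set"), and `to_latin1` is a hypothetical helper name, not something from any library.

```python
# Illustration only: this reproduces the byte-level mismatch, not the
# actual Perl code used on the site.

text = "\u00e4"  # the character ä

# The same character has different byte representations per encoding.
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA4
latin1_bytes = text.encode("latin-1")  # one byte:  0xE4

# A page declared as ISO-8859-1 reads each UTF-8 byte as its own
# character, so one ä turns into two unrelated symbols.
misread_as_latin1 = utf8_bytes.decode("latin-1")

# ASSUMPTION: the console uses code page 437. Under that code page the
# Latin-1 byte 0xE4 is displayed as a Greek capital sigma.
misread_as_cp437 = latin1_bytes.decode("cp437")


def to_latin1(message: str) -> bytes:
    """Hypothetical helper: convert a message to Latin-1 bytes before
    sending it to a page that expects ISO-8859-1.

    Characters that Latin-1 cannot represent are replaced with '?'.
    """
    return message.encode("latin-1", errors="replace")


if __name__ == "__main__":
    print(utf8_bytes, latin1_bytes)
    print(misread_as_latin1, misread_as_cp437)
    print(to_latin1("Sch\u00e4fer"))
```

The conversion step at the end plays roughly the same role as the downgrade I settled on: it only works cleanly when every character in the message fits in Latin-1. Anything outside that range is lost, which is the limitation mentioned above.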
