Think Big

Monday, December 1st, 2008

After recently auditing a web application for proper character encoding support, I have one piece of advice for web developers: support UTF-8 encoding across the board, right from the start.

If you take a few steps at the beginning of a project to enable Unicode encoding, you will have far less to worry about when it comes to international character support: your application is “future-proofed.” When the time comes for your application to go global, it’s ready to go.

For example, if you use PHP you can set the encoding with the following statement in your php.ini file:
default_charset = "utf-8"

With Java, set the following JVM option:
-Dfile.encoding=UTF-8
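
The JVM option only changes the default. As a minimal sketch of the same idea inside the code, the class below (the name Utf8RoundTrip and its methods are made up for illustration) names the charset explicitly when converting between text and bytes, and shows what happens when the same bytes are decoded with the wrong charset:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: names the charset explicitly instead of relying on the JVM default.
class Utf8RoundTrip {
    // Encode text to bytes, always as UTF-8 regardless of the JVM default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes that are known to be UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decode the same bytes with the wrong charset to show the damage.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file's own encoding does not matter.
        String original = "caf\u00e9";
        byte[] bytes = encodeUtf8(original);
        System.out.println(bytes.length);                         // 5: the é takes two bytes in UTF-8
        System.out.println(decodeUtf8(bytes).equals(original));   // true
        System.out.println(decodeLatin1(bytes).equals(original)); // false: é came back as two characters
    }
}
```

Passing the charset at every conversion point means the code behaves the same way even on a machine where the JVM option was forgotten.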

In your Apache config file, set:
AddCharset UTF-8 .utf8
AddDefaultCharset UTF-8
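
One way to confirm that the server is really announcing the charset is to look at the Content-Type response header. The sketch below is a small, hypothetical helper (ContentTypeCharset and charsetOf are illustrative names, not part of any library) that pulls the charset parameter out of a header value you have already fetched:

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type header value.
class ContentTypeCharset {
    // Returns the charset named in the header value, or null if none is present.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // The first segment is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/html"));                // null
    }
}
```

A null result for your HTML pages would suggest that the Apache setting is not being applied to those responses.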

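You also need to set the database encoding to match. In MySQL you can do this by setting the following directive in your my.cnf file:
default-character-set=utf8

Note that MySQL's utf8 character set stores at most three bytes per character. MySQL 5.5.3 and later provide utf8mb4 for the full UTF-8 range, and on those servers the option in the [mysqld] section is named character-set-server.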

Note that you also need to specify the encoding for the connection. For a MySQL JDBC connection you can do this by adding the following parameters to the database URL:
useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8
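
To show where these parameters go, here is a minimal sketch that assembles a complete MySQL JDBC URL. The class name JdbcUrlBuilder, the method utf8Url, and the host, port, and database values are all placeholders for illustration:

```java
// Hypothetical helper: builds a MySQL JDBC URL that carries the encoding parameters.
class JdbcUrlBuilder {
    static String utf8Url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8&characterSetResults=UTF-8";
    }

    public static void main(String[] args) {
        // "localhost", 3306 and "appdb" are placeholder values.
        System.out.println(utf8Url("localhost", 3306, "appdb"));
    }
}
```

The resulting string is what you would pass to DriverManager.getConnection along with your credentials, with the MySQL driver on the classpath. If the URL lives in an XML configuration file instead, each & has to be written as &amp;.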

Finally, if you are going to be sending emails from an application, you will want to specify the encoding in the MIME type as well. Using JavaMail, this can be done while setting the text of a MimeBodyPart. Passing the charset name explicitly avoids depending on the JVM default, which is what MimeUtility.getDefaultJavaCharset() returns:
mimeBodyPart.setText(htmlContent, "UTF-8", "html");

If you make a concerted effort to use the same encoding across the board, it will pay off in the long run, because you will avoid most character encoding issues.

For more information and specifics about UTF-8 encoding, see “UTF-8: The Secret of Character Encoding.”