Proper MySQL character set and collation setup
We use Unicode and, specifally, UTF-8 everywhere possible nowadays, especially on the Web. Especially when it comes to DBMS encodings with non-latin languages (like russian for example). Wrong DBMS-storage/DBMS-connection/execution engine encoding setup not only can lead to a broken character display, but can cause far more deeper and less obvious issues further in the future.
Most of web applications (or other kinds of serving programs) are capable of setting proper connection encoding on their own (like SET NAMES 'utf8'
and etc), so global DBMS encoding settings just have no effect. But some CMS (specifally, MODX, as it seems) do not use that capability or doesn't want to for some unclear reason. This is where database management system's settings show up.
Every version of MySQL binaries comes with precompiled built-in charset set to latin1
and collation latin1_swedish_ci
, which just ruins everything that is non-latin. This situation can be resolved either by hack-patching the application's core (which can get quite tricky very fast) or by setting appropriate global defaults. And that's how the last is done for MySQL: in MySQL configuration file (which usually resides in /etc/mysql/my.cnf
) add the following lines anywhere under [mysqld]
section:
character-set-server = utf8
collation-server = utf8_general_ci
skip-character-set-client-handshake
The last line (skip-character-set-client-handshake
) skips the requirement of executing SET NAMES
/SET CHARACTER SET
directives upon connection and makes all three default encoding settings character_set_client
, character_set_database
and character_set_connection
consistent.
Written by George Zelensky
Related protips
2 Responses
You smashed it George
In some cases, for better character matching and sorting (eg. for umlauts, accented letters, letter variants etc) I'd suggest using utf8_unicode_ci
collation.