Last Updated: September 30, 2021
·
5.312K
· khaos

Proper MySQL character set and collation setup

We use Unicode and, specifally, UTF-8 everywhere possible nowadays, especially on the Web. Especially when it comes to DBMS encodings with non-latin languages (like russian for example). Wrong DBMS-storage/DBMS-connection/execution engine encoding setup not only can lead to a broken character display, but can cause far more deeper and less obvious issues further in the future.

Most of web applications (or other kinds of serving programs) are capable of setting proper connection encoding on their own (like SET NAMES 'utf8' and etc), so global DBMS encoding settings just have no effect. But some CMS (specifally, MODX, as it seems) do not use that capability or doesn't want to for some unclear reason. This is where database management system's settings show up.

Every version of MySQL binaries comes with precompiled built-in charset set to latin1 and collation latin1_swedish_ci, which just ruins everything that is non-latin. This situation can be resolved either by hack-patching the application's core (which can get quite tricky very fast) or by setting appropriate global defaults. And that's how the last is done for MySQL: in MySQL configuration file (which usually resides in /etc/mysql/my.cnf) add the following lines anywhere under [mysqld] section:

character-set-server = utf8
collation-server = utf8_general_ci
skip-character-set-client-handshake

The last line (skip-character-set-client-handshake) skips the requirement of executing SET NAMES/SET CHARACTER SET directives upon connection and makes all three default encoding settings character_set_client, character_set_database and character_set_connection consistent.

2 Responses
Add your response

You smashed it George

over 1 year ago ·

In some cases, for better character matching and sorting (eg. for umlauts, accented letters, letter variants etc) I'd suggest using utf8_unicode_ci collation.

over 1 year ago ·