Last Updated: September 09, 2019
· oliver-gaspar

A checklist for avoiding issues with international characters in PHP & MySQL

Building MySQL-powered websites, applications, or content management systems often means dealing with unpleasant character-encoding issues that are hard (and so not fun) to diagnose. In the worst case, the behavior may even differ between your development and production environments.

I’ve never dug deep enough into character encoding (since it’s not fun), but I plan to read this promising blog post, recommended somewhere on Stack Overflow: Getting out of MySQL Character Set Hell

Before I get around to it (it’s quite long and detailed), here is a quick list of rules I’ve come up with after a lot of trial and error. I might update it after I read the article.

The checklist

  • The database’s collation must be set to utf8_general_ci (or to a more language-specific collation of the form utf8_language_ci)
  • Every column in every table must have the same collation
  • Every PHP page that connects to the database should run this MySQL query right after connecting: SET NAMES 'utf8' COLLATE 'utf8_general_ci'
  • PHP must send an HTTP header that specifies the encoding before any output: header('Content-Type: text/html; charset=utf-8');
  • The HTML head must specify the encoding: <meta charset="UTF-8">
  • Every PHP file that produces output must be saved as UTF-8 without a BOM
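
Here is a minimal sketch of how the PHP-side items on the checklist might fit together, assuming the mysqli extension is available. The function names open_utf8_connection and page_head are hypothetical, and the host, credentials, and database name are whatever the caller passes in. The sketch uses mysqli::set_charset(), which the PHP manual recommends over sending SET NAMES as a query; the checklist's explicit SET NAMES ... COLLATE query would go in the same place. Note that 'utf8' follows the checklist: in MySQL it covers characters of up to three bytes, and utf8mb4 is the full-range alternative.

```php
<?php
// Hypothetical helper: connect, then switch the connection to UTF-8 right away.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $link = new mysqli($host, $user, $pass, $db);
    if ($link->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $link->connect_error);
    }
    // Tells both the server and the client library which charset is in use.
    if (!$link->set_charset('utf8')) {
        throw new RuntimeException('set_charset failed: ' . $link->error);
    }
    return $link;
}

// Hypothetical helper: the opening of an HTML document that declares its encoding.
function page_head(string $title): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html>\n<head>\n<meta charset=\"UTF-8\">\n"
         . "<title>{$safeTitle}</title>\n</head>\n";
}
```

A page would then call header('Content-Type: text/html; charset=utf-8'); before any output, echo page_head(...), and get its connection from open_utf8_connection().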

There is one special case that needs to be kept in mind:

  • Whenever using PHP’s htmlentities() function, you must specify UTF-8 encoding
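
To illustrate why the encoding argument matters, here is a small sketch that converts the same UTF-8 string twice, once with the correct encoding and once with a wrong one. The sample string is only an illustration. Since PHP 5.4 the default encoding of htmlentities() is UTF-8, so the problem mostly shows up on older versions or when a different encoding is passed, but stating it explicitly removes the guesswork.

```php
<?php
// "café <b>" written as explicit UTF-8 bytes, so the example does not depend
// on how this file itself is saved.
$text = "caf\xC3\xA9 <b>";

// Encoding stated explicitly: é is treated as one character.
$good = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Wrong encoding: the two UTF-8 bytes of é are read as two Latin-1 characters.
$bad = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $good, "\n"; // caf&eacute; &lt;b&gt;
echo $bad, "\n";  // caf&Atilde;&copy; &lt;b&gt;
```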

The source

Just in case anyone is wondering, the Stack Overflow question that led me to the blog post is here, and the user adrienne, author of the best answer, lists these things to check:

  • The DB connection is using UTF-8
  • The DB tables are using UTF-8
  • The individual columns in the DB tables are using UTF-8
  • The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you’ve imported from bad sources, or changed table or column collations)
  • The web page is requesting UTF-8
  • Apache is serving UTF-8
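
The point about data actually being stored as UTF-8 can be spot-checked from PHP. The sketch below is one possible approach, not something taken from the answer: is_valid_utf8 is a hypothetical helper that relies on PCRE's /u modifier, which rejects malformed UTF-8. In practice the strings would be values fetched from the database. It only catches malformed byte sequences; text that was double-encoded is still valid UTF-8 and will pass.

```php
<?php
// Hypothetical helper: true when $value is well-formed UTF-8.
// With the /u modifier PCRE validates the subject string, and
// preg_match() returns false (not 0) on malformed input.
function is_valid_utf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9")); // bool(true): é as UTF-8 bytes
var_dump(is_valid_utf8("caf\xE9"));     // bool(false): é as a lone Latin-1 byte
```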

2 Responses

I've never needed string manipulations that would require a separate library. Or are you implying that the built-in PHP string functions won't work here, for some reason?

over 1 year ago ·

Instead of "SET NAMES..." we should use "mysqli_set_charset" (http://www.php.net/manual/de/mysqli.set-charset.php) now?!

"This is the preferred way to change the charset. Using mysqli_query() to set it (such as SET NAMES utf8) is not recommended. See the MySQL character set concepts section for more information."

over 1 year ago ·