It is possible that converting MySQL dataset from one encoding to another can result in garbled data, for example when converting from Latin1 to UTF8. There are a number of approaches. All examples assume we are converting the
title VARCHAR(255) column in the
#1 Convert to blob then UTF8
ALTER TABLE comments MODIFY title BLOB; ALTER TABLE comments MODIFY title VARCHAR(255) CHARACTER SET utf8;
The same Percona link also has more approaches on how to do this without locking the tables.
#2 convert to BINARY then UTF8
Sidecar has a nice simple 1-liner both to test the results and convert it and a little Ruby script to automate over the entire DB. The test snippet is particularly useful. This doesn't change the column (no
ALTER TABLE), instead it just converts the data in place.
-- test SELECT CONVERT(CAST(CONVERT(title USING latin1) AS BINARY) USING utf8) FROM comments WHERE id = 123; -- convert UPDATE comments SET title = CONVERT(cast(CONVERT(title USING latin1) AS BINARY) USING utf8); ALTER TABLE comments MODIFY title VARCHAR(255) CHARACTER SET utf8;
Another other approach is to use the
iconv command to convert the an entire table:
mysqldump comments --add-drop-table users | replace CHARSET=latin1 CHARSET=utf8 | iconv -f latin1 -t utf8 | mysql some_database
Additionally Percona offers a tool to convert the character-set for an entire DB as discussed in their converting characterset post.
#4: Safest approach without double conversion (PREFERRED METHOD)
It is possible that you are still ending up with garbled data or something is going wrong. This might happen for example if you have stored
UTF8 data in a
LATIN1 column or some other such scenario. Joni Salonen offers a check along with the update.
UPDATE comments SET title = @txt WHERE char_length(title) = LENGTH(@txt := CONVERT(BINARY CONVERT(title USING latin1) USING utf8)); ALTER TABLE comments MODIFY title VARCHAR(255) CHARACTER SET utf8;
If you feel like digging deeper, you can use the
HEX function to test the output, example:
SELECT HEX(CONVERT(`title` USING latin1)) FROM `comments` WHERE id=123; SELECT HEX(CONVERT(CAST(CONVERT(title USING latin1) AS BINARY) USING utf8)) FROM comments WHERE id = 123;