Last Updated: March 07, 2016
· moiseevigor

Encoding hell: grep and iconv to the rescue!

Find non-ASCII characters in a file

grep --color='auto' -P "[\x80-\xFF]" FILENAME

This grep command is useful for pulling the affected records out of a database dump when the database encoding differs from the connection encoding.
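
How `\x80-\xFF` is interpreted can depend on the active locale, so a variant that forces byte-wise matching may be more predictable. The following is a minimal sketch, assuming GNU grep built with `-P` (PCRE) support; the file name `sample_latin1.txt` and its contents are made up for illustration.

```shell
# Hypothetical sample: one ASCII line and one line ending in the Latin-1 byte 0xE9.
printf 'plain ascii\ncaf\351\n' > sample_latin1.txt

# LC_ALL=C makes grep compare raw bytes instead of decoding characters.
# -a treats the file as text, -n prints line numbers; || true tolerates "no match".
LC_ALL=C grep -a -n -P '[\x80-\xFF]' sample_latin1.txt || true

# Count the lines that contain at least one non-ASCII byte.
matches=$(LC_ALL=C grep -a -c -P '[\x80-\xFF]' sample_latin1.txt)
echo "lines with non-ASCII bytes: $matches"
```

Only the second line of the sample contains a byte outside the ASCII range, so it is the only one reported.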

For example, in MySQL (before version 8.0) the default character set is latin1 with the collation "latin1_swedish_ci", while the browser usually uses UTF-8. We can sort it out with iconv:

iconv --verbose -f LATIN1 -t UTF8//TRANSLIT FILENAME_latin1 > FILENAME_utf8
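
To check that a conversion did what was intended, one option is to run it on a small known input and look at the resulting bytes. This is a sketch, assuming an iconv that accepts the `LATIN1` and `UTF-8` names (GNU iconv does); the file names are hypothetical.

```shell
# Hypothetical input: "caf" followed by the Latin-1 byte 0xE9 and a newline.
printf 'caf\351\n' > sample_latin1.txt

# Convert Latin-1 to UTF-8.
iconv -f LATIN1 -t UTF-8 sample_latin1.txt > sample_utf8.txt

# Dump the result as hex: 0xE9 should have become the two bytes c3 a9.
hex=$(od -An -tx1 sample_utf8.txt | tr -d ' \t\n')
echo "$hex"

# Decoding the output as UTF-8 only succeeds if it is valid UTF-8.
if iconv -f UTF-8 -t UTF-8 sample_utf8.txt > /dev/null 2>&1; then
  status=valid
else
  status=invalid
fi
echo "$status"
```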

If you get

iconv: illegal input sequence at position <NUMBER>

You can jump to the offending byte in vim and correct it there. In command mode, type

:goto <NUMBER>
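
If opening a large dump in vim is impractical, the bytes around the reported position can also be inspected from the shell. This is a sketch under assumptions: the file `broken_utf8.txt` and the value of `pos` are made up, and the position is taken to be a 0-based byte offset, which is worth double-checking against your iconv (vim's `:goto` counts bytes from 1).

```shell
# Hypothetical file: the stray Latin-1 byte 0xE9 sits at byte offset 4 (counting from 0).
printf 'abc\n\351xyz\n' > broken_utf8.txt

# Position as iconv might report it (assumed here, not parsed from real output).
pos=4

# tail -c +K starts at the K-th byte counting from 1, hence pos + 1.
# head -c 4 keeps four bytes; od prints them as hex.
bytes=$(tail -c +"$((pos + 1))" broken_utf8.txt | head -c 4 | od -An -tx1 | tr -d ' \t\n')
echo "bytes at offset $pos: $bytes"
```

A lone byte such as `e9` followed by plain ASCII is a typical sign of Latin-1 text mixed into a file that is otherwise UTF-8.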

Make sure your terminal session is using a UTF-8 locale:

user@host:~$ locale 
LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
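
The same check can be scripted with `locale charmap`, which prints the character encoding of the current locale. A minimal sketch; the helper name `is_utf8_charmap` is hypothetical.

```shell
# Return success if the given charmap name denotes UTF-8.
# Several spellings are accepted because systems differ in how they report it.
is_utf8_charmap() {
  case "$1" in
    UTF-8|utf-8|UTF8|utf8) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn when the current session does not use a UTF-8 locale.
if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
  echo "terminal locale uses UTF-8"
else
  echo "warning: terminal locale is not UTF-8" >&2
fi
```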

That's it!