Encoding hell: grep and iconv to the rescue!
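Find non-ASCII bytes in a file (LC_ALL=C makes grep compare raw bytes, so the range also catches multibyte UTF-8 characters; -n prints line numbers):
LC_ALL=C grep --color='auto' -n -P "[\x80-\xFF]" FILENAME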
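To see exactly where the non-ASCII bytes sit, grep can also print the zero-based byte offset of each match with -b. This is a minimal sketch, assuming GNU grep built with PCRE support (-P); the function name find_non_ascii is hypothetical.

```shell
#!/bin/sh
# find_non_ascii FILE  (hypothetical helper)
# Print "line:byte_offset:match" for every run of non-ASCII bytes in FILE.
# LC_ALL=C makes grep treat the input as raw bytes, so \x80-\xFF matches
# bytes rather than Unicode code points.
# -o prints only the matching bytes, and -b then reports the offset of the
# match itself (counted from 0), not the start of the line.
find_non_ascii() {
    LC_ALL=C grep -n -b -o -P '[\x80-\xFF]+' "$1"
}
```

For example, `find_non_ascii FILENAME` prints one entry per run of non-ASCII bytes. The offsets are counted the same way as the positions iconv reports, which makes it easy to compare the two.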
This is handy when extracting records from a database dump whose encoding differs from that of the connection.
For example, MySQL before 8.0 defaults to the latin1 character set (collation "latin1_swedish_ci"), while browsers usually expect UTF-8. iconv can convert the dump:
iconv --verbose -f LATIN1 -t UTF8//TRANSLIT FILENAME_latin1 > FILENAME_utf8
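The following minimal sketch wraps the conversion and then checks that the result is well-formed UTF-8. It does this by decoding the output a second time as UTF-8 to UTF-8, which fails with a non-zero status on malformed input. The function name latin1_to_utf8 is hypothetical, and the sketch assumes an iconv that accepts the LATIN1 and UTF-8 names, as glibc's does.

```shell
#!/bin/sh
# latin1_to_utf8 SRC DST  (hypothetical helper)
# Convert SRC from ISO-8859-1 (latin1) to UTF-8, write the result to DST,
# then verify that DST is well-formed UTF-8.
latin1_to_utf8() {
    iconv -f LATIN1 -t UTF-8 "$1" > "$2" || return 1
    # Re-decoding as UTF-8 exits non-zero on any malformed byte sequence.
    iconv -f UTF-8 -t UTF-8 "$2" > /dev/null
}
```

Every byte is a valid latin1 character, so the first step only fails if the source file is not actually latin1. The second step catches a dump that was mislabelled in the other direction.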
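If iconv stops with
iconv: illegal input sequence at position <NUMBER>
you can jump to the offending byte in vim by typing, in Normal mode:
:goto <NUMBER+1>
(iconv reports a zero-based byte offset, while :goto counts bytes from 1.)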
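Before opening an editor, it can help to look at the raw bytes that iconv choked on. This is a minimal sketch, assuming POSIX tail and od plus a head that supports -c (both GNU and BSD head do). show_bytes is a hypothetical helper, and POS is the zero-based position printed by iconv.

```shell
#!/bin/sh
# show_bytes FILE POS [COUNT]  (hypothetical helper)
# Print COUNT (default 8) bytes of FILE in hex, starting at the
# zero-based byte offset POS reported by iconv.
show_bytes() {
    # tail -c +N starts at the N-th byte counting from 1, hence POS + 1.
    tail -c +"$(( $2 + 1 ))" "$1" | head -c "${3:-8}" | od -An -tx1
}
```

For example, after `iconv: illegal input sequence at position 1234`, running `show_bytes FILENAME 1234` shows the bad byte first, followed by the bytes after it.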
Make sure your terminal session uses a UTF-8 locale; you can check with locale:
user@host:~$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
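In a script, the same check can be reduced to the character set alone, because `locale charmap` prints only the encoding of the active locale. This is a minimal sketch; the function name is_utf8_locale is hypothetical.

```shell
#!/bin/sh
# is_utf8_locale  (hypothetical helper)
# Succeed only if the active locale's character set is UTF-8.
# If the locale utility is missing, the output is empty and the check fails.
is_utf8_locale() {
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}
```

For example, `is_utf8_locale || echo 'switch to a UTF-8 locale first' >&2` can guard the vim step.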
That's it!
Written by Igor Moiseev