Debugging encodings and character sets.
Garbled text on your screen?
- Put your data in a plain text file (using vim - you do not want a BOM in your data!)
- Run the command hexdump -C file
- Locate the strange characters and note their bytes or byte sequences
- Look them up, e.g. in a UTF-8 charset table (German)
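If you would rather not look the bytes up by hand, the lookup step can be sketched in Python. This is a minimal sketch, not part of the original tip: the function name describe is made up for illustration, and it assumes the data is valid UTF-8 (otherwise decode raises UnicodeDecodeError, which is itself a useful hint).

```python
import unicodedata

def describe(data):
    """Return (code point, UTF-8 bytes as hex, Unicode name) for every
    non-ASCII character in data. 'describe' is a hypothetical helper name."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    rows = []
    for ch in text:
        if ord(ch) < 0x80:
            continue  # skip plain ASCII, it is rarely the culprit
        rows.append((
            "U+%04X" % ord(ch),
            ch.encode("utf-8").hex(" "),        # same bytes hexdump -C shows
            unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed code points
        ))
    return rows

if __name__ == "__main__":
    import sys
    # Usage (assumed): python describe.py file
    with open(sys.argv[1], "rb") as f:
        for row in describe(f.read()):
            print(*row)
```

Reading the file in binary mode keeps the bytes exactly as hexdump shows them, so the hex column can be compared directly against the dump.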
An example, the German umlaut ü ("ue"):
The expected UTF-8 encoding is a single precomposed code point (you would see c3 bc in the hexdump):
U+00FC ü c3 bc LATIN SMALL LETTER U WITH DIAERESIS
A valid UTF-8 sequence that displays identically but is not the single code point "ü" is a plain "u" followed by a combining mark (you would see 75 cc 88 in the hexdump):
U+0075 u 75 LATIN SMALL LETTER U
U+0308 ̈ cc 88 COMBINING DIAERESIS
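The difference between the two forms can be checked in Python. The sketch below is an illustration rather than part of the original tip, and it assumes Unicode normalization (NFC/NFD from the standard unicodedata module) is an acceptable way to compare them; the helper name same_text is hypothetical.

```python
import unicodedata

precomposed = "\u00fc"   # ü as one code point, UTF-8 bytes c3 bc
decomposed = "u\u0308"   # u + COMBINING DIAERESIS, UTF-8 bytes 75 cc 88

def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form).
    'same_text' is a hypothetical helper name."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

if __name__ == "__main__":
    print(precomposed == decomposed)           # raw comparison of code points
    print(same_text(precomposed, decomposed))  # comparison after normalization
    print(decomposed.encode("utf-8").hex(" "))
```

A raw comparison treats the two strings as different because their code points differ, while the normalized comparison treats them as the same text. Normalizing to NFC before storing or comparing is one common way to avoid this kind of mismatch.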
Written by Christoph Lühr