Last Updated: February 25, 2016

chluehr

Debugging encodings and character sets.

#utf8

Garbled text on your screen?

1. Put your data in a plain text file (e.g. using vim; you do not want a byte order mark (BOM) in your data!).
2. Run the command hexdump -C file
3. Locate the strange characters and determine their bytes (or byte sequences).
4. Look them up, e.g. here: utf8 charset table (German).
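If you want to check many files, or hexdump is not at hand, steps 1 to 3 can be approximated in a few lines of Python. This is only a sketch, not a replacement for hexdump -C. The helper names has_utf8_bom and non_ascii_runs are hypothetical. The script warns about a leading UTF-8 BOM (ef bb bf) and prints the offset and hex bytes of every run of non-ASCII bytes (0x80 and above), which is where multi-byte UTF-8 sequences and stray Latin-1 bytes show up.

```python
import codecs
import sys


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (ef bb bf)."""
    return data.startswith(codecs.BOM_UTF8)


def non_ascii_runs(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, hex bytes) for every run of consecutive bytes >= 0x80."""
    runs = []
    start = None
    for i, b in enumerate(data):
        if b >= 0x80:
            if start is None:
                start = i  # a non-ASCII run begins here
        elif start is not None:
            runs.append((start, data[start:i].hex(" ")))
            start = None
    if start is not None:  # the data ended inside a run
        runs.append((start, data[start:].hex(" ")))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:  # read raw bytes, no decoding
        data = f.read()
    if has_utf8_bom(data):
        print("warning: file starts with a UTF-8 BOM (ef bb bf)")
    for offset, hexbytes in non_ascii_runs(data):
        print(f"{offset:08x}  {hexbytes}")
```

Run it as python find_bytes.py yourfile.txt (the script name is just an example). Because it only flags bytes of 0x80 and above, a decomposed ü is reported as cc 88, without the plain u (75) in front of it.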

An example: the German umlaut ü ("ue").

The correct UTF-8 encoding is a single code point (you would see c3 bc in the hexdump):

U+00FC  ü  c3 bc   LATIN SMALL LETTER U WITH DIAERESIS
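Instead of looking the bytes up in a table, you can let Python's standard unicodedata module produce such a row: code point, UTF-8 bytes and official Unicode name. A minimal sketch; the helper name describe is hypothetical:

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, str]]:
    """For each code point in text: (U+XXXX, UTF-8 bytes as hex, Unicode name)."""
    return [
        (
            f"U+{ord(ch):04X}",
            ch.encode("utf-8").hex(" "),
            unicodedata.name(ch, "<unnamed>"),
        )
        for ch in text
    ]


for row in describe("\u00fc"):  # the precomposed ü
    print(*row, sep="  ")
```

This prints U+00FC  c3 bc  LATIN SMALL LETTER U WITH DIAERESIS, matching the row above.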

A valid UTF-8 sequence that displays identically but is not the single character "ü": a plain "u" followed by a combining diaeresis (the decomposed form; this time you would see 75 cc 88 in the hexdump):

U+0075  u   75      LATIN SMALL LETTER U
U+0308  ̈  cc 88   COMBINING DIAERESIS
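The two forms look the same on screen but compare as different strings, which is exactly what makes this case hard to spot. Unicode normalization (NFC composes, NFD decomposes) maps them onto each other. A small sketch using Python's standard unicodedata module; the helper same_text is hypothetical and simply compares the NFC forms:

```python
import unicodedata

composed = "\u00fc"     # ü as one code point: U+00FC
decomposed = "u\u0308"  # u + COMBINING DIAERESIS: U+0075 U+0308


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed.encode("utf-8").hex(" "))    # c3 bc
print(decomposed.encode("utf-8").hex(" "))  # 75 cc 88
print(composed == decomposed)               # False: different code points
print(same_text(composed, decomposed))      # True after NFC normalization
```

Normalizing to NFC before storing or comparing text is one common way to avoid this mismatch; NFD works too, as long as both sides use the same form.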

Written by Christoph Lühr
