Last Updated: February 25, 2016


· penpen

Ugly Unicode text

A couple of days ago, I had to work with some ugly Unicode text, something like this:

ⒶⓀⓋⓉⒺ

b̼̘̬ͭ͂̈́̀l͇͉̱͚̲̗̗͞a̱̭̬͎͉̤ͨ͂̌̑̓͂͐h̬̯̻̩͗ͩͯḅ̢̬͕͈̥̅̌͆̔̉ͅḽ̘̖̼͚́͒̈́̏͌̃͟ ͎̮̫̍ͫ̽͐͋ͤ͂a̜͔̩͇̩̪͐̍̐̃ͤ͑ ̦̌ḧ̙̝͓̜͕̝̈́ͅb̛̞͔̽̃̍ͪla̘̠͖͍̣͙̝͌ͪ͒̃ͯ ͗͛̆͊.̛̭̜̞̲͓̯ͧ̅ĥ͂͑/̢̊/̠̘͖͖̖̺̯


I needed to get normal text out of this mess, because MySQL has weird Unicode support. So I wrote these simple functions for cleaning up Unicode text:


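import unicodedata

# Characters passed through untouched. NFD would split u'й' into u'и' plus a
# combining breve, and stripping the breve would change the letter. u'\n' is
# a control character (Cc) and would otherwise be dropped by stripsymbols.
KEEP = (u'й', u'Й', u'\n')


def stripaccents(s):
    """ Strip accents from a string """
    result = []
    for char in s:
        # Pass these symbols without processing
        if char in KEEP:
            result.append(char)
            continue
        # Decompose the character, then drop nonspacing marks (category Mn)
        for c in unicodedata.normalize('NFD', char):
            if unicodedata.category(c) == 'Mn':
                continue
            result.append(c)
    return u''.join(result)


def stripsymbols(s):
    """ Strip ugly unicode symbols from a string """
    result = []
    for char in s:
        # Pass these symbols without processing
        if char in KEEP:
            result.append(char)
            continue
        # NFKC folds compatibility forms, e.g. circled letters, to plain ones
        for c in unicodedata.normalize('NFKC', char):
            category = unicodedata.category(c)
            # Turn every space separator into a plain space
            if category == 'Zs':
                result.append(u' ')
                continue
            # Drop symbols, marks, and control, format, private-use and
            # unassigned characters. 'Lo' also covers letters of caseless
            # scripts (CJK, Hebrew, Arabic); remove it to keep those.
            if category not in ['So', 'Mn', 'Lo', 'Cn', 'Co', 'Cf', 'Cc']:
                result.append(c)
    return u''.join(result)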
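
To see why the functions filter on these categories, it can help to print the category of a few sample characters. This is only a rough sketch: the `categories` helper and the shortened sample strings are made up for illustration and are not part of the functions above.

```python
import unicodedata


def categories(text):
    """Return (code point, general category) pairs for each character."""
    return [('U+%04X' % ord(c), unicodedata.category(c)) for c in text]


# The circled letters from the sample text, written as escapes
circled = u'\u24b6\u24c0\u24cb\u24c9\u24ba'
# A short piece of "zalgo" text: plain letters with combining marks stacked on
zalgo = u'b\u033c\u0318l\u0347a'

# Circled letters are symbols (So); the stacked marks are nonspacing (Mn)
print(categories(circled))
print(categories(zalgo))

# NFKC replaces the circled letters with their plain compatibility equivalents
folded = unicodedata.normalize('NFKC', circled)
print(folded)

# NFD followed by dropping Mn characters leaves only the base letters
decomposed = unicodedata.normalize('NFD', zalgo)
bare = u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
print(bare)
```

The circled letters show up as `So` and the stacked marks as `Mn`, which is why normalization has to run before the category filter: once NFKC has folded a circled letter to a plain one, it is no longer in a stripped category.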

Written by Roman Koblov
