Using format to fill and justify multi-byte Unicode strings in Python 2.7

June 15, 2012

In Python, it's easy to fill (i.e. pad) strings and justify them left, right, or center using str.format. For example:

>>> word = "resume"
>>> print "1234567890\n{0:>{1}}".format(word, 10)
1234567890
    resume
>>> print len(word)
6

However, if the string contains multi-byte Unicode characters, str.format doesn't calculate the string's width correctly, because it counts the bytes of the encoded string rather than its characters:

>>> word = u"résumé"
>>> print "1234567890\n{0:>{1}}".format(word.encode('utf8'), 10)
1234567890
  résumé
>>> print len(word.encode('utf8'))
8
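The shortfall comes from padding by bytes rather than by characters. Here is a minimal sketch of that arithmetic, written so that it runs on both Python 2.7 and Python 3. The escaped literal and the variable names are my own choices for illustration, and `rjust` stands in for the `{0:>{1}}` format spec because it behaves the same way on byte strings in both versions:

```python
# "resume" with two e-acute characters, escaped so the source stays ASCII.
word = u"r\u00e9sum\u00e9"
encoded = word.encode("utf-8")   # each e-acute becomes 2 bytes in UTF-8

char_count = len(word)           # counts code points
byte_count = len(encoded)        # counts bytes

# Right-justify the byte string to a width of 10 bytes, then decode it to
# see what a UTF-8 terminal would actually display.
padded_bytes = encoded.rjust(10)
displayed = padded_bytes.decode("utf-8")

print(u"{0} characters, {1} bytes".format(char_count, byte_count))
print(u"displayed width: {0}".format(len(displayed)))
```

The byte string already has 8 bytes, so only 2 bytes of padding are added, and the decoded result is 8 characters wide instead of 10.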

The solution is not to use unicodedata.normalize('NFC', string), as you may have read. That does indeed normalize Unicode character sequences (and may be necessary in some cases!), but it does not cause str.format to correctly calculate the width of encoded strings output to the terminal.
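To see why normalization alone does not help, consider this small sketch. The decomposed input is a hypothetical example of my own, not something from the original session. NFC reduces the number of code points, but the UTF-8 encoding of the composed string is still longer than its character count:

```python
import unicodedata

# Hypothetical input: "resume" with each accent stored as a separate
# combining character (decomposed form).
decomposed = u"re\u0301sume\u0301"
composed = unicodedata.normalize("NFC", decomposed)

decomposed_len = len(decomposed)                  # letters plus combining marks
composed_len = len(composed)                      # one code point per accented letter
composed_bytes = len(composed.encode("utf-8"))    # bytes after encoding

print(u"{0} {1} {2}".format(decomposed_len, composed_len, composed_bytes))
```

Normalization brings the string from 8 code points down to 6, but once encoded it is 8 bytes again, so a byte format string still pads it too little.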

So how does one print correctly filled/padded strings with str.format in Python 2.7?

The answer, it turns out, is dead simple: use Unicode literal format strings:

>>> word = u"résumé"
>>> print u"1234567890\n{0:>{1}}".format(word, 10)
1234567890
    résumé
>>> print len(word)
6

This one-character solution seems to be hidden in a message by Victor Stinner on the Python bug tracker:

Oh by the way, it's trivial to work around this issue in Python 2: use a Unicode format string. For example, replace '{0}'.format(u'\u3042') with u'{0}'.format(u'\u3042').
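If some of your values arrive as byte strings, one way to apply the same idea is to decode them before formatting. The helper below is a hypothetical sketch (the name `pad_right_justified` and the default UTF-8 encoding are assumptions of mine), and it runs on both Python 2.7 and Python 3:

```python
def pad_right_justified(value, width, encoding="utf-8"):
    """Right-justify value to width characters with a Unicode format string.

    Hypothetical helper: byte strings are decoded first, so the width is
    counted in code points rather than bytes.
    """
    if isinstance(value, bytes):   # on Python 2.7, bytes is an alias for str
        value = value.decode(encoding)
    return u"{0:>{1}}".format(value, width)


text = u"r\u00e9sum\u00e9"
padded = pad_right_justified(text, 10)
padded_from_bytes = pad_right_justified(text.encode("utf-8"), 10)

print(len(padded))                    # width counted in characters
print(padded == padded_from_bytes)    # both inputs give the same result
```

Keep in mind that this counts code points, not terminal columns, so wide East Asian characters or combining marks may still line up differently on screen.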

I haven't found this in Stack Overflow answers, or on any of the pages I found on Google, whether blogs, forums, mailing lists, etc. So here it is!
