Using format to fill and justify multi-byte Unicode strings in Python 2.7
In Python, it's easy to fill (i.e. pad) strings and justify them left, right, or center using string.format. For example:
>>> word = "resume"
>>> print "1234567890\n{0:>{1}}".format(word, 10)
1234567890
    resume
>>> print len(word)
6
However, if the string contains multi-byte Unicode characters, string.format doesn't calculate the string's width correctly:
>>> word = u"résumé"
>>> print "1234567890\n{0:>{1}}".format(word.encode('utf8'), 10)
1234567890
  résumé
>>> print len(word.encode('utf8'))
8
The solution is not to use unicodedata.normalize('NFC', string), as you may have read. That does indeed normalize Unicode character sequences (and may be necessary in some cases!), but it does not cause string.format to calculate the correct width of encoded strings output to the terminal.
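To make concrete what normalization does and does not do, here is a minimal sketch. The variable names are my own, and the snippet is written so it should run on both Python 2.7 and Python 3. It shows a case where normalization matters (a decomposed string has more code points than it appears to), which is a separate issue from formatting an encoded byte string.

```python
# -*- coding: utf-8 -*-
import unicodedata

composed = u"r\u00e9sum\u00e9"                        # precomposed e-acute (U+00E9)
decomposed = unicodedata.normalize("NFD", composed)  # e + combining acute (U+0301)

# Both forms look the same on screen but differ in code point count.
print(len(composed))    # 6
print(len(decomposed))  # 8

# A unicode format string pads by code points, so the decomposed form
# receives two fewer spaces than the composed form.
padded_composed = u"{0:>{1}}".format(composed, 10)
padded_decomposed = u"{0:>{1}}".format(decomposed, 10)

# Normalizing back to NFC restores the expected padding.
renormalized = unicodedata.normalize("NFC", decomposed)
padded_renormalized = u"{0:>{1}}".format(renormalized, 10)
```

So NFC helps when the input may contain combining sequences, but it does nothing for a string that has already been encoded to bytes before formatting.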
So how does one print correctly filled/padded strings with string.format in Python 2.7?
The answer, it turns out, is dead simple: use Unicode literal format strings:
>>> word = u"résumé"
>>> print u"1234567890\n{0:>{1}}".format(word, 10)
1234567890
    résumé
>>> print len(word)
6
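If some of your strings arrive already encoded, the same idea can be wrapped in a small helper. This is only a sketch: pad_right is a hypothetical name of my own, and it assumes the bytes are UTF-8 unless told otherwise. It decodes byte strings first and then formats with a unicode format string, so the width counts code points rather than bytes.

```python
# -*- coding: utf-8 -*-

def pad_right(text, width, encoding="utf-8"):
    """Right-justify text in a field of `width` code points.

    Hypothetical helper: byte strings are decoded first (UTF-8 is an
    assumption), then a unicode format string does the padding.
    """
    if isinstance(text, bytes):  # in Python 2.7, bytes is an alias for str
        text = text.decode(encoding)
    return u"{0:>{1}}".format(text, width)

word = u"r\u00e9sum\u00e9"

# The unicode string and its UTF-8 encoding now pad identically.
print(pad_right(word, 10) == pad_right(word.encode("utf-8"), 10))  # True
```

Note that this counts code points, which matches terminal columns for text like résumé but not necessarily for wide East Asian characters or combining sequences.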
This one-character solution seems to be hidden in a message by Victor Stinner on the Python bug tracker:
"Oh by the way, it's trivial to work around this issue in Python 2: use a unicode format string. For example, replace '{0}'.format(u'\u3042') with u'{0}'.format(u'\u3042')."
I haven't found this in Stack Overflow answers, or on pages found on Google, whether blogs, forums, mailing lists, etc. So here it is!