I have a simple Python (2.7) script that reads a database table and spits out rows. Initially there was no need to use unicode, and the script was just this:
users = config.Session.query(User).order_by(User.id).all()
for _f in users:
    print "{0:6d} {1:20} {2:30} {3:}".format(_f.id, _f.foo, _f.name, _f.url)
This worked fine and produced neatly formatted output like this:
739 42352 Foo Bar https://...
740 23555 Another User https://...
741 774577 Third User https://...
Then accented names started appearing in the database, and the script began raising an exception from the ascii codec.
I managed to fix the script, sort of. The exception is gone, but now every accented character in the name seems to count double, so the URL field ends up N columns out of alignment, where N is the number of accented characters in the name.
for _f in users:
    uname = _f.name.encode('utf-8')
    print "{0:6d} {1:20} {2:30} {3:}".format(_f.id, _f.foo, uname, _f.url)
And the output is now this:
739 42352 Foo Bar https://...
740 23555 Änöther User https://...
741 774577 Third User https://...
What do I need to add to my format string so that it counts the length of a Unicode string with accented characters correctly?
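The problem is that you are formatting byte strings in a multi-byte encoding (UTF-8). A width such as {2:30} counts bytes, not characters, so each two-byte character such as Ä or ö uses up two units of the width but displays as a single column. Don't encode the name; use Unicode strings throughout, including the format string, e.g. print u"{0:6d}...".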
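A quick way to see the byte-versus-character difference, as a sketch that runs on both Python 2.7 and Python 3. The name literal is the one from the question; char_count, padded_text and the other variables are illustrative names. Because Python 3 cannot apply a width like {0:20} to bytes, bytes.ljust stands in for it here; both left-pad with spaces up to the given length, which is what "{0:20}" does to a Python 2 byte string.

```python
# -*- coding: utf-8 -*-
name = u"Änöther User"           # Unicode text: 12 characters
encoded = name.encode("utf-8")   # UTF-8 bytes: Ä and ö take two bytes each

char_count = len(name)     # 12
byte_count = len(encoded)  # 14

# Padding the Unicode text to width 20 adds 8 spaces (20 - 12 characters).
padded_text = u"{0:20}|".format(name)

# Padding the bytes to 20 adds only 6 spaces (20 - 14 bytes), mirroring what
# "{0:20}" does to a byte string in Python 2.
padded_bytes = encoded.ljust(20) + b"|"

# Columns a terminal shows before the "|": 20 for text, 18 for the bytes,
# i.e. one column lost per two-byte character.
text_columns = padded_text.index(u"|")
bytes_columns = padded_bytes.decode("utf-8").index(u"|")
```

The two lost columns are exactly the "every accented character counts double" effect from the question.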
Example:
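# -*- coding: utf-8 -*-
print "1234567890"*3                                        # column ruler
print "{0:20} xxx".format(u"Another User")                  # ASCII only: aligned
print "{0:20} xxx".format(u"Änöther User".encode('utf8'))   # bytes: padded by byte count
print u"{0:20} xxx".format(u"Änöther User")                 # Unicode: padded by character count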
Output:
123456789012345678901234567890
Another User         xxx
Änöther User       xxx
Änöther User         xxx
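Applied to the question's loop, a minimal sketch of the same idea (compatible with Python 2.7 and 3): build each line with a Unicode format string and encode only the finished line if you need bytes. The Row namedtuple is a hypothetical stand-in for the SQLAlchemy User objects, since config.Session and User aren't shown; the foo values are kept as text and the example.com URLs are made up for illustration.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Hypothetical stand-in for the User rows returned by the query.
Row = namedtuple("Row", ["id", "foo", "name", "url"])

# Unicode format string: {2:30} now pads by characters, not bytes.
ROW_FORMAT = u"{0:6d} {1:20} {2:30} {3}"

rows = [
    Row(739, u"42352", u"Foo Bar", u"https://example.com/739"),
    Row(740, u"23555", u"Änöther User", u"https://example.com/740"),
    Row(741, u"774577", u"Third User", u"https://example.com/741"),
]

# Format entirely in Unicode so every column width is measured in characters.
lines = [ROW_FORMAT.format(r.id, r.foo, r.name, r.url) for r in rows]

# Encode only the finished, already-padded line. In Python 2, printing Unicode
# to a pipe or file (where sys.stdout.encoding is None) falls back to the
# ascii codec, so writing these bytes instead avoids that error.
encoded_lines = [line.encode("utf-8") for line in lines]
```

The design choice is "pad first, encode last": once the padding is done on characters, encoding can change the byte length without affecting how the columns line up on a UTF-8 terminal.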