Print special characters in list in Python

I have a list containing special characters (for example é or a white space) and when I print the list these characters are printed with their Unicode code, while they are printed correctly if I print the list elements separately:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']
print(my_list)
print(my_list[0])
print(my_list[1])

The output of this code is

['\xc3\xa9l\xc3\xa9phant', 'Hello World']
éléphant
Hello World

And I would like to have ['éléphant', 'Hello World'] for the first output. What should I change?

If possible, switch to Python 3 and you'll get the expected result.
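As a rough illustration of the Python 3 behaviour (a minimal sketch, assuming Python 3 and a terminal that can display the characters), printing the list keeps the accented letters, while the built-in ascii() gives an escaped, ASCII-only form for comparison. The variable names list_text and escaped_text are only illustrative.

```python
# Python 3: str is Unicode text, and repr() keeps printable non-ASCII characters.
my_list = ['éléphant', 'Hello World']

list_text = str(my_list)        # the text that print(my_list) writes
escaped_text = ascii(my_list)   # same structure, non-ASCII characters escaped

print(list_text)
print(escaped_text)
```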

If you have to make it work in Python 2, then use unicode strings:

my_list = [u'éléphant', u'Hello World']

The way you have it right now, Python interprets the first string as a series of bytes, '\xc3\xa9l\xc3\xa9phant', which only become Unicode code points once they are decoded as UTF-8: '\xc3\xa9l\xc3\xa9phant'.decode('utf8') == u'\xe9l\xe9phant'.
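The difference between the bytes and the decoded text can be sketched as follows. This is only an illustration with made-up variable names (raw_bytes, text); it spells the bytes out with a b prefix so that it behaves the same way in Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# The UTF-8 bytes that Python 2 stores for the literal 'éléphant'
raw_bytes = b'\xc3\xa9l\xc3\xa9phant'

# Decoding turns each two-byte sequence \xc3\xa9 into the single code point U+00E9
text = raw_bytes.decode('utf8')

print(len(raw_bytes))  # 10 bytes
print(len(text))       # 8 code points
```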

If you want to print the list's repr and still see the actual characters, you'll have to decode the escapes and encode the result yourself, as UTF-8 if that's what your terminal understands.

>>> print repr(my_list).decode('unicode-escape').encode('utf8')
[u'éléphant', u'Hello World']

But it's easier to format it manually:

>>> print ", ".join(my_list)
éléphant, Hello World

Short answer: if you want to keep the output in that format, you have to implement it yourself:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

my_list = ['éléphant', 'Hello World']

def print_list(l):
    # Quote and join the elements ourselves instead of relying on repr()
    print("[" + ", ".join(["'%s'" % str(x) for x in l]) + "]")

print_list(my_list)

Which generates the expected

['éléphant', 'Hello World']

However, note that it puts every element inside quotes (even numbers, for example), so you may need a more complex implementation if you expect anything other than strings in your list.
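One possible shape for such an implementation is sketched below. The helper name format_list is hypothetical, and the sketch assumes plain str elements (byte strings in Python 2, text in Python 3): strings are quoted by hand and everything else falls back to repr(). It does not escape quotes inside the strings, so the result is meant for display only.

```python
# -*- coding: utf-8 -*-

def format_list(items):
    """Return a list-like display string, quoting only the str elements."""
    parts = []
    for item in items:
        if isinstance(item, str):
            # Quote by hand so the characters are not escaped
            parts.append("'%s'" % item)
        else:
            # Numbers, None and other objects keep their usual repr()
            parts.append(repr(item))
    return "[" + ", ".join(parts) + "]"

print(format_list(['éléphant', 'Hello World', 42, None]))
```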

Longer answer

The problem is that Python runs str(my_list) under the hood before printing it. That, in turn, runs repr() on each of the list's elements.
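This delegation to repr() can be seen with any object whose str() and repr() differ. The Word class below is a hypothetical example written only for this illustration; it behaves the same way in Python 2 and Python 3.

```python
class Word(object):
    """Hypothetical wrapper whose str() and repr() give different text."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Word(%r)" % self.text

word = Word("elephant")
words = [word]

print(str(word))    # uses Word.__str__
print(str(words))   # the list uses Word.__repr__ for its element
```

Printing the element directly goes through __str__, while printing the list shows the __repr__ form, which mirrors what happens with the strings in the question.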

Now, in Python 2, repr() on a string returns an ASCII-only representation of the string. That is, each '\xc3' you're seeing really is four characters: an actual backslash, an actual 'x', an actual 'c' and an actual '3'.

You can't change that from the outside, because the behaviour is part of the implementation of list.__str__().

The sample program below demonstrates this.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# vi: ai sts=4 sw=4 et

import pprint

my_list = ['éléphant', 'Hello World']

# Under the hood, Python first runs str(my_list) before printing it
my_list_as_string = str(my_list)

# str() on a list runs repr() on each of the elements, and in Python 2
# repr() on a byte string returns an ASCII-only representation
print('str(my_list) = %s' % str(my_list))
for c in my_list_as_string:
    print(c)
print('len(str(my_list)) = %s' % len(str(my_list)))
print("\n")

# We can confirm that here, and also see that repr() adds the quotes:
print('repr("é") == %s' % repr("é"))
for c in repr("é"):
    print(c)
print('len(repr("é")) == %s' % len(repr("é")))
print("\n")

# Even pprint fails
print("pprint gives the same results")
pprint.pprint(my_list)

# It's useless to try to encode it, since all the data is already ASCII
print("Trying to encode")
print(my_list_as_string.encode("utf8"))

Which generates this:

str(my_list) = ['\xc3\xa9l\xc3\xa9phant', 'Hello World']
[
'
\
x
c
3
\
x
a
9
l
\
x
c
3
\
x
a
9
p
h
a
n
t
'
,

'
H
e
l
l
o

W
o
r
l
d
'
]
len(str(my_list)) = 41


repr("é") == '\xc3\xa9'
'
\
x
c
3
\
x
a
9
'
len(repr("é")) == 10


pprint gives the same results
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
Trying to encode
['\xc3\xa9l\xc3\xa9phant', 'Hello World']
