How do I display non-English characters in Python?

I have a Python dictionary which contains items that have non-English characters. When I print the dictionary, the Python shell does not display the non-English characters properly. How can I fix this?

When your application prints hei\xdfen instead of heißen, it means you are not printing the actual unicode string but the string representation of the unicode object.

Let us assume your string ("heißen") is stored in a variable called text. To see where you stand, check the type of this variable by calling:

>>> type(text)

If you get <type 'unicode'>, you are dealing not with a byte string but with a unicode object.

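If you do the intuitive thing and simply type text at the prompt, or print a container that holds it (print([text])), you won't get the actual text ("heißen") but a string representation of the unicode object. Printing it directly with print(text) only works when Python already knows your terminal's encoding.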

To fix this, you need to know which encoding your terminal uses and print your unicode object encoded accordingly.

For instance, if your terminal uses UTF-8 encoding, you can print out a string by invoking:

print text.encode('utf-8')
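
To make the encoding step concrete, here is a minimal sketch. It assumes Python 3, where str plays the role of Python 2's unicode type and bytes the role of the old byte string; the variable names are only illustrative.

```python
# Python 3 sketch: str holds text, bytes holds its encoded form.
text = "hei\u00dfen"  # "heißen", escaped so the example does not depend on the file's encoding

utf8_bytes = text.encode("utf-8")      # what a UTF-8 terminal expects
latin1_bytes = text.encode("latin-1")  # same text, different bytes for a Latin-1 terminal

# Decoding with the matching codec recovers the original text.
round_trip = utf8_bytes.decode("utf-8")

print(utf8_bytes)    # b'hei\xc3\x9fen'
print(latin1_bytes)  # b'hei\xdfen'
```

The same character ß takes two bytes in UTF-8 and one in Latin-1, which is why the bytes you send have to match what the terminal expects.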

So much for the basic concepts. Now for a more detailed example. Let us assume a source code file, saved as UTF-8, stores your dictionary like this:

mydict = {'heiße': 'heiße', 'äää': 'ööö'}

When you type print mydict you get {'\xc3\xa4\xc3\xa4\xc3\xa4': '\xc3\xb6\xc3\xb6\xc3\xb6', 'hei\xc3\x9fe': 'hei\xc3\x9fe'}. Even print mydict['äää'] doesn't work: it results in something like ├Â├Â├. The nature of the problem is revealed by print type(mydict['äää']), which tells you that you are dealing with a string object (a byte string) rather than a unicode object.
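
The garbled output can be reproduced deliberately. The sketch below assumes Python 3, and it assumes the terminal uses code page 850 (a guess, chosen because it produces the same kind of garbage as shown above); the bytes are the UTF-8 encoding of 'ööö'.

```python
# UTF-8 bytes for "ööö", as stored in the dictionary value.
raw = b"\xc3\xb6\xc3\xb6\xc3\xb6"

# Interpreting those bytes with the wrong codec produces mojibake.
garbled = raw.decode("cp850")  # assumption: the terminal's code page is 850

# Interpreting them with the codec they were written in restores the text.
correct = raw.decode("utf-8")

# ascii() escapes the result so the example prints on any terminal.
print(ascii(garbled))
print(ascii(correct))
```

Each ö is stored as two bytes, and the wrong codec turns each byte into its own character, so three letters become six.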

To fix the problem, you first need to decode the byte string from your source code file's charset into a unicode object and then encode it in the charset of your terminal. For individual dict items this can be achieved by:

print unicode(mydict['äää'], 'utf-8')

Note that if the default encoding doesn't apply to your terminal, you need to write:

print unicode(mydict['äää'], 'utf-8').encode('utf-8')

Here the outer encode method specifies the encoding your terminal expects.
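
The same decode step can be applied to the whole dictionary rather than one item at a time. This is a sketch assuming Python 3 and assuming the keys and values arrive as UTF-8 byte strings; decode_dict is a hypothetical helper, not part of any library.

```python
def decode_dict(raw, source_encoding="utf-8"):
    """Return a copy of raw with byte-string keys and values decoded to text."""
    return {
        key.decode(source_encoding): value.decode(source_encoding)
        for key, value in raw.items()
    }

# The dictionary from the example, written as the raw bytes Python 2 stored.
raw_dict = {
    b"hei\xc3\x9fe": b"hei\xc3\x9fe",
    b"\xc3\xa4\xc3\xa4\xc3\xa4": b"\xc3\xb6\xc3\xb6\xc3\xb6",
}

decoded = decode_dict(raw_dict)

# ascii() keeps the output printable regardless of the terminal's encoding.
print(ascii(decoded))
```

Decoding once, right where the data enters the program, means the rest of the code only ever handles text, and encoding happens only at the point of output.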

I really urge you to read through Joel's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Unless you understand how character sets work, you will stumble across problems like this again and again.

Actually, that's not really a Python-related issue.

Your environment variables (I'm assuming that you're on either Linux or Mac) should have the UTF-8 character encoding active.

You should be able to put these in your ~/.profile (or ~/.bashrc) file:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
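
Whether those settings took effect can be checked from Python itself. A small sketch, assuming Python 3; describe_encodings is a hypothetical helper name, and the values it reports depend entirely on your environment.

```python
import locale
import sys

def describe_encodings():
    """Collect the encodings Python derived from the environment."""
    return {
        # Encoding used when printing text to the terminal (may be None for unusual streams).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the locale variables (LC_ALL, LANG, ...).
        "locale": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }

for name, value in describe_encodings().items():
    print(name, "=", value)
```

If the stdout entry is not a UTF-8 variant after setting the variables, the shell session probably needs to be restarted so the new environment is picked up.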

-edit-

Actually, Mac uses UTF-8 by default. This is a Windows/Linux issue.

-edit 2-

You should, of course, always use unicode strings, a unicode editor and a unicode doctype. But I'm assuming that you know that :-)

In the Python terminal,

    >>> "heißen"

is equivalent to

    >>> print repr("heißen")

The Python 2 documentation on repr (http://docs.python.org/2/library/functions.html#func-repr) is sparse.

As can be seen, both give you a byte-based representation of the byte string "heißen", in which every byte greater than 127 is \x-escaped. This is where you get

    'hei\xc3\x9fen'

The repr() of a unicode object is not much more helpful. It correctly shows 'ß' as a single unicode character, '\xdf', but is still unreadable.

The practical solution I found is to use Python 3, whose repr() leaves printable non-ASCII characters unescaped:

http://docs.python.org/3/library/functions.html#repr

The page also says

    ascii(object)
    As repr(), return a string containing a printable representation of an
    object, but escape the non-ASCII characters in the string returned by
    repr() using \x, \u or \U escapes. This generates a string similar to
    that returned by repr() in Python 2.

which explains things a little bit.
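
The difference is easy to check in a Python 3 session. The sketch below uses only the built-ins repr() and ascii(); the variable names are illustrative.

```python
text = "hei\u00dfen"  # "heißen"

readable = repr(text)                    # Python 3 keeps printable non-ASCII characters
escaped = ascii(text)                    # Python 2 style: non-ASCII characters are escaped
byte_repr = repr(text.encode("utf-8"))   # bytes still show \x escapes for values above 127

print(escaped)    # 'hei\xdfen'
print(byte_repr)  # b'hei\xc3\x9fen'
```

Here readable contains the actual ß, so containers such as dictionaries display their non-ASCII contents legibly in Python 3.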

Python 3.0 has unicode strings by default, whereas in Python 2.x you have to prefix the string with u:

u"汉字/漢字 chinese"  
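
Python 3.3 and later still accept the u prefix for compatibility, so a literal written for 2.x keeps working. A quick check, assuming Python 3.3+ and a UTF-8 source file (the Python 3 default):

```python
prefixed = u"汉字"  # Python 2 style literal
plain = "汉字"      # Python 3 literal, already unicode

same_text = prefixed == plain
print(same_text, type(prefixed).__name__, len(plain))  # True str 2
```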
