
How do I display non-English characters in Python?

I have a Python dictionary whose items contain non-English characters. When I print the dictionary, the Python shell does not display those characters properly. How can I fix this?

When your application prints hei\xdfen instead of heißen, it means you are not printing the actual unicode string but the string representation of the unicode object.

Let us assume your string ("heißen") is stored in a variable called text. To see what you are dealing with, check the type of this variable by calling:

>>> type(text)

If you get <type 'unicode'>, it means you are dealing not with a byte string (str) but with a unicode object.

If you do the intuitive thing and just echo text at the interactive prompt, or print a container that holds it, you won't get the actual text ("heißen") but a string representation of the unicode object. Calling print(text) directly shows the real text only when Python can determine your terminal's encoding and that encoding can represent the characters.

To fix this, you need to know which encoding your terminal uses and print your unicode object encoded in that encoding.

For instance, if your terminal uses UTF-8, you can print the string by invoking:

print text.encode('utf-8')
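
To make the idea concrete in Python 3 terms (where str plays the role of Python 2's unicode and bytes that of Python 2's str), here is a minimal sketch. The helper name encode_for_terminal and its UTF-8 fallback are assumptions made for illustration, not something from the answer above.

```python
import sys

def encode_for_terminal(text, fallback="utf-8"):
    """Encode text with the terminal's encoding, or a fallback if it is unknown."""
    # sys.stdout.encoding can be None when output is not a real terminal.
    encoding = getattr(sys.stdout, "encoding", None) or fallback
    # errors="replace" substitutes '?' for characters the terminal cannot show.
    return text.encode(encoding, errors="replace")

text = "heißen"
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                 # b'hei\xc3\x9fen'
print(encode_for_terminal(text))  # depends on the terminal's encoding
```

The first print always shows the same UTF-8 bytes; the second varies with whatever encoding the terminal reports.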

Those are the basic concepts. Now let me give you a more detailed example. Let us assume we have a source code file that stores your dictionary, like:

mydict = {'heiße': 'heiße', 'äää': 'ööö'}

When you type print mydict you will get {'\xc3\xa4\xc3\xa4\xc3\xa4': '\xc3\xb6\xc3\xb6\xc3\xb6', 'hei\xc3\x9fe': 'hei\xc3\x9fe'}. Even print mydict['äää'] doesn't work: it results in something like ├Â├Â├. The nature of the problem is revealed by print type(mydict['äää']), which tells you that you are dealing with a string (str) object, that is, a byte string.
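
The two symptoms can be reproduced with a small Python 3 sketch, using bytes to stand in for Python 2's str. The choice of cp850 as the mismatched terminal code page is an assumption; it happens to produce output like the garbled text quoted above.

```python
# Python 3 sketch: bytes stands in for Python 2's str (a byte string).
key = "äää".encode("utf-8")
mydict = {key: "ööö".encode("utf-8")}

# Printing a container shows the repr() of each item, hence the \x escapes.
dict_repr = repr(mydict)
print(dict_repr)  # {b'\xc3\xa4\xc3\xa4\xc3\xa4': b'\xc3\xb6\xc3\xb6\xc3\xb6'}

# Interpreting the same UTF-8 bytes in another code page gives mojibake.
garbled = mydict[key].decode("cp850")   # '├Â├Â├Â'

# Decoding with the encoding the bytes were written in recovers the text.
recovered = mydict[key].decode("utf-8")  # 'ööö'
```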

In order to fix the problem, you first need to decode the byte string from your source code file's charset into a unicode object and then represent it in the charset of your terminal. For individual dict items this can be achieved by:

print unicode(mydict['äää'], 'utf-8')

Note that if the default encoding doesn't match your terminal, you need to write:

print unicode(mydict['äää'], 'utf-8').encode('utf-8')

Here the outer encode call specifies the encoding your terminal uses.

I really, really urge you to read through Joel's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Unless you understand how character sets work, you will stumble across problems like this again and again.
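
The same decode-then-encode round trip, applied to a whole dictionary, might look like this in Python 3. The helper decode_items and the Latin-1 terminal are hypothetical, chosen only to show a case where the source and terminal encodings differ.

```python
def decode_items(raw_dict, source_encoding="utf-8"):
    """Return a copy of raw_dict with byte-string keys and values decoded to text."""
    return {
        key.decode(source_encoding): value.decode(source_encoding)
        for key, value in raw_dict.items()
    }

# Byte strings as Python 2 would have read them from a UTF-8 source file.
raw = {
    b"hei\xc3\x9fe": b"hei\xc3\x9fe",
    b"\xc3\xa4\xc3\xa4\xc3\xa4": b"\xc3\xb6\xc3\xb6\xc3\xb6",
}
decoded = decode_items(raw)  # {'heiße': 'heiße', 'äää': 'ööö'}

# Re-encode one item for a terminal that uses Latin-1 rather than UTF-8.
latin1_bytes = decoded["äää"].encode("latin-1")
print(latin1_bytes)  # b'\xf6\xf6\xf6'
```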

Actually, that's not really a Python-related issue.

Your environment variables (I'm assuming that you're on either Linux or Mac) should have the UTF-8 character encoding active.

You should be able to put these in your ~/.profile (or ~/.bashrc) file:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
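
To check what Python actually picked up from these variables, something like the following Python 3 sketch could help. The function name encoding_report is made up for this example.

```python
import locale
import os
import sys

def encoding_report():
    """Collect the locale variables and the encodings Python derived from them."""
    return {
        "LC_ALL": os.environ.get("LC_ALL"),
        "LANG": os.environ.get("LANG"),
        "LANGUAGE": os.environ.get("LANGUAGE"),
        # Encoding the locale module reports, without changing the locale.
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses when writing text to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
    }

for name, value in encoding_report().items():
    print(name, "=", value)
```

If the last two entries are not a UTF-8 variant, the exports above have probably not taken effect in the current shell.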

-edit-

Actually, Mac uses UTF-8 by default. This is a Windows/Linux issue.

-edit 2-

You should, of course, always use unicode strings, a unicode editor and a unicode doctype. But I'm assuming that you know that :-)

In the Python terminal,

    >>> "heißen"

is equivalent to

    >>> print repr("heißen")

The Python 2 documentation on repr (http://docs.python.org/2/library/functions.html#func-repr) is sparse.

As can be seen, both give you a byte-based representation of the byte string "heißen", where every byte greater than 127 is \x-escaped. This is where you get

    'hei\xc3\x9fen'

unicode's repr() is not much more helpful. It correctly shows 'ß' as a single unicode character, '\xdf', but is still unreadable.

The practical solution I found is to use Python 3.

http://docs.python.org/3/library/functions.html#repr

The page also says

    ascii(object)
    As repr(), return a string containing a printable representation of an
    object, but escape the non-ASCII characters in the string returned by
    repr() using \x, \u or \U escapes. This generates a string similar to
    that returned by repr() in Python 2.

which explains things a little bit.
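
A short Python 3 sketch of the difference between repr() and ascii() described in that passage (the variable names are only illustrative):

```python
text = "heißen"

# Python 3's repr() keeps printable non-ASCII characters as they are.
shown_by_repr = repr(text)      # "'heißen'"

# ascii() escapes them, similar to what Python 2's repr() produced.
shown_by_ascii = ascii(text)
print(shown_by_ascii)           # 'hei\xdfen'

# The repr() of the UTF-8 byte string shows each byte above 127 as \x..
raw_bytes_repr = repr(text.encode("utf-8"))
print(raw_bytes_repr)           # b'hei\xc3\x9fen'
```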

Python 3.0 has unicode strings by default, whereas in Python 2.x you have to prefix the string with u:

u"汉字/漢字 chinese"  

