How to solve UnicodeDecodeError in Python 3.6?

I switched from Python 2.7 to Python 3.6.

I have scripts that deal with some non-English content.

I usually run the scripts via Cron and also in a terminal.

I had a UnicodeDecodeError in my Python 2.7 scripts and solved it with this:

# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Now, in Python 3.6, it doesn't work. I have print statements like print("Here %s" % (myvar)) and they throw an error. I can work around this by replacing myvar with myvar.encode("utf-8"), but I don't want to write that in every print statement.

I set PYTHONIOENCODING=utf-8 in my terminal and I still have the issue.

Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6?

Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?

It sounds like your locale is broken, and that you have another bytes->Unicode issue. What you did for Python 2.7 is a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).

To fix your locale, first run locale from the command line. The output should look something like:

LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

locale depends on LANG being set properly. Python effectively uses the locale to work out which encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
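To see what Python has actually derived from the environment, a small check along these lines can help. This is only a sketch: the helper name describe_encodings is made up for illustration, and the values it reports depend entirely on the machine and on how the script was started (terminal versus Cron, for instance).

```python
import locale
import sys


def describe_encodings():
    """Return the encodings Python derived from the current environment.

    describe_encodings is a hypothetical helper, not part of any library.
    """
    return {
        # Encoding implied by the locale settings (LANG, LC_ALL, LC_CTYPE).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used when print() writes to stdout (may be None if
        # stdout has been replaced by an object without an encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

Running the same script once from the terminal and once from the Cron job should show whether the two environments disagree.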

You should first attempt to fix your locale. If the locale command reports errors, make sure you've installed the correct language pack for your region.

If all else fails, you can always fix Python by setting PYTHONIOENCODING=UTF-8. This should be used as a last resort, as you'll be masking problems once again.
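One thing worth checking is that the variable really reaches the Python process: it has to be part of that process's environment, which a Cron job does not inherit from your interactive shell. The sketch below starts a fresh interpreter with the variable set and reports the stdout encoding it ends up with. The helper name child_stdout_encoding is an assumption made for this example.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a fresh interpreter and return its sys.stdout.encoding.

    child_stdout_encoding is a hypothetical helper for illustration.
    """
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode the child's output as text
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The child sees PYTHONIOENCODING because it is passed in its environment.
    print(child_stdout_encoding({"PYTHONIOENCODING": "utf-8"}))
```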

If Python is still throwing an error after setting PYTHONIOENCODING, then please update your question with the stacktrace. Chances are you've got an implied conversion going on.
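As a rough illustration of what such a conversion looks like: a UnicodeDecodeError is raised when bytes are turned into str with a codec that cannot represent them, and that can happen far away from the print statement. The sample data and the helper to_text below are assumptions made for this sketch, not taken from the question.

```python
# Bytes as they might arrive from a socket, a subprocess,
# or a file opened in "rb" mode (sample data for illustration).
raw = "h\u00e9llo".encode("utf-8")


def to_text(data, encoding="utf-8"):
    """Decode bytes explicitly; pass str through unchanged.

    to_text is a hypothetical helper for illustration.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Decoding with the wrong codec is what produces the error.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc  # 'ascii' codec can't decode byte 0xc3 ...

# Decoding once, at the boundary, keeps later formatting free of errors.
message = "Here %s" % to_text(raw)
```

The stacktrace shows which line performs the decode, which is usually where the explicit encoding belongs.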

I had this issue when using Python inside a Docker container based on Ubuntu 18.04. It appeared to be a locale issue, which was solved by adding the following to the Dockerfile:

ENV LANG C.UTF-8

For a Python-only solution you will have to recreate your sys.stdout object:

import sys, codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout.detach())

After this, a normal print("hello world") should be encoded to UTF-8 automatically.
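A variant of the same idea uses io.TextIOWrapper from the standard library instead of codecs.getwriter. The sketch below demonstrates it on an in-memory buffer so that it can be run without touching the real stdout; the helper name utf8_text_stream is made up for this example.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8.

    utf8_text_stream is a hypothetical helper for illustration.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstration on an in-memory buffer instead of the real stdout.
buffer = io.BytesIO()
stream = utf8_text_stream(buffer)
print("h\u00e9llo w\u00f6rld", file=stream)
stream.flush()
encoded = buffer.getvalue()  # the UTF-8 bytes that were written

# For the real stdout the equivalent call would be:
#     sys.stdout = utf8_text_stream(sys.stdout.detach())
```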

But you should try to find out why your terminal is set to such a strange encoding (which Python simply adopts). Maybe your operating system is misconfigured somehow.

EDIT: In my tests, unsetting the env variable LANG produced this strange stdout encoding for me:

LANG= python3
import sys
sys.stdout.encoding

This printed 'ANSI_X3.4-1968', which is another name for ASCII.

So I guess you might want to set your LANG to something like en_US.UTF-8. Your terminal program doesn't seem to do this.

To everyone using pickle to load a file previously saved in Python 2 and getting a UnicodeDecodeError: try setting the pickle encoding parameter:

import pickle

with open("./data.pkl", "rb") as data_file:
    samples = pickle.load(data_file, encoding='latin1')
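To see the effect of the encoding parameter without a real Python 2 file, the sketch below uses a hand-written protocol 0 pickle of the kind Python 2 produces for a byte string. The payload is an assumption constructed for this example, not data from the question.

```python
import pickle

# Hand-written protocol 0 pickle of the Python 2 byte string 'caf\xe9'
# (illustrative payload, constructed for this example).
py2_pickle = b"S'caf\\xe9'\np0\n."

# With the default encoding (ASCII) the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_pickle)
except UnicodeDecodeError:
    default_failed = True
else:
    default_failed = False

# latin1 maps every byte to a code point, so loading always succeeds.
as_text = pickle.loads(py2_pickle, encoding="latin1")

# encoding="bytes" keeps Python 2 str objects as bytes instead.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
```

Note that latin1 never fails but may give the wrong characters if the original strings were stored in another encoding such as UTF-8; in that case encoding="bytes" followed by an explicit decode is the safer route.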

For Docker with Python 3.6, prefixing the command with LANG=C.UTF-8 (for example LANG=C.UTF-8 python or LANG=C.UTF-8 jupyter xxx) works for me, thanks to @Daniel and @zhy.

Python 3 (including 3.6) already supports Unicode. Here is the doc: https://docs.python.org/3/howto/unicode.html

So you don't need to force Unicode support the way you did in Python 2.7. Try to run your code normally. If you get an error reading a Unicode text file, you need to pass the encoding='utf-8' parameter when opening the file.
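A minimal sketch of that advice, using a temporary file so that it is self-contained. The helper name read_text and the sample content are assumptions made for this example.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default.

    read_text is a hypothetical helper for illustration.
    """
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Write a small UTF-8 file, then read it back with the explicit encoding.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as handle:
        handle.write("na\u00efve caf\u00e9\n".encode("utf-8"))
    content = read_text(sample_path)
```

Without the encoding argument, open() falls back to the locale's preferred encoding, which is why the same script can work in a terminal and fail under Cron.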

I mean you could write a custom function like this (not optimal, I know):


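import sys

def printUTF8(text):
    # Write the UTF-8 bytes to the underlying binary stream;
    # print(text.encode("utf-8")) would show the b'...' repr in Python 3.
    sys.stdout.buffer.write(text.encode("utf-8") + b"\n")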
