
Python 'ascii' encode problems in print statement

System: Python 3.4.2 on Linux.

I'm working on a Django application (which is irrelevant here), and I ran into a problem where it throws

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

when print is called (!). After quite a bit of digging, I discovered I should check

>>> sys.getdefaultencoding()
'utf-8'

but it was as expected, utf8. I also noticed that os.path.exists throws the same exception when used with a unicode string. So I checked

>>> sys.getfilesystemencoding()
'ascii'

When I used LANG=en_US.UTF-8 the issue disappeared. I understand now why os.path.exists had problems with that. But I have absolutely no clue why the print statement is affected by the filesystem setting. Is there a third setting I'm missing? Or does it just assume the LANG environment is to be trusted for everything?
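For comparison, the encodings mentioned so far can be listed side by side together with the one attached to sys.stdout itself. This is only a rough diagnostic sketch; the helper name encoding_report is made up for illustration, and the values it returns depend entirely on the environment it runs in.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python 3 consults in different places.

    encoding_report is a hypothetical helper name used for this sketch.
    """
    return {
        # Default for str.encode() / bytes.decode(); 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Used to convert file names (os.path.exists, open, os.listdir, ...).
        "filesystem": sys.getfilesystemencoding(),
        # Used for text streams opened without an explicit encoding.
        "preferred": locale.getpreferredencoding(False),
        # The codec print() ends up using; may be missing if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("{:<10} {}".format(name, value))
```

Running this once with LANG=C and once with LANG=en_US.UTF-8 should make it visible which of the four values follow the locale.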

Also... I don't get the reasoning here. LANG does not tell you what encoding the filenames support. It has nothing to do with that. It's set separately for the current environment, not for the filesystem. Why is Python using this setting for filesystem filenames? It makes applications very fragile, as all file operations just break when run in an environment where LANG is unset or set to C (not uncommon, especially when a web app is run as root or as a new user created specifically for the daemon).

Test code (no actual unicode input needed, to avoid terminal encoding pitfalls):

x=b'\xc4\x8c\xc5\xbd'
y=x.decode('utf-8')
print(y)
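Whether the test code above fails depends on the encoding of the real sys.stdout, so it can be hard to reproduce on a machine with a UTF-8 locale. As a rough sketch, the same failure can be provoked deterministically by printing to an in-memory text stream that is forced to ASCII. The helper name print_to_ascii_stream is hypothetical, and the in-memory stream is only a stand-in for a stdout opened under LANG=C.

```python
import io


def print_to_ascii_stream(text):
    """Print text to an ASCII-only stream; return the error message, or None.

    The stream is an assumption standing in for sys.stdout under LANG=C.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Same two characters as in the test code above.
y = b"\xc4\x8c\xc5\xbd".decode("utf-8")
error = print_to_ascii_stream(y)
```

Here error holds the text of a UnicodeEncodeError raised by the 'ascii' codec, while plain ASCII input such as "abc" goes through and yields None.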

Question:

  • Is there a good and accepted way of making the application robust to the LANG setting?
  • Is there any real-world reason to guess the filesystem capabilities from the environment instead of the filesystem driver?
  • Why is print affected?

LANG is used to determine your locale; if you don't set specific LC_ variables, the LANG variable is used as the default.

The filesystem encoding is determined by the LC_CTYPE variable, but if you haven't set that variable specifically, the LANG environment variable is used instead.
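The fallback order can be sketched as a small lookup. Note one addition that is not in the text above: under POSIX rules LC_ALL, when set, overrides LC_CTYPE. The function name effective_ctype is made up for illustration, and the sketch only reads variables; it does not ask the C library which locale is actually active.

```python
import os


def effective_ctype(env=None):
    """Return the locale string that governs LC_CTYPE under POSIX precedence.

    effective_ctype is a hypothetical helper; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    # LC_ALL wins over LC_CTYPE, which wins over LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    # Nothing set: assume the "C" locale, the usual default.
    return "C"
```

For example, effective_ctype({"LANG": "en_US.UTF-8"}) gives "en_US.UTF-8", while an empty environment gives "C", which is the situation a freshly created daemon user often ends up in.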

Printing uses sys.stdout, a text file configured with the codec your terminal uses. Your terminal settings are also locale specific; your LANG variable should really reflect what locale your terminal is set to. If that is UTF-8, you need to make sure your LANG variable reflects that. sys.stdout uses locale.getpreferredencoding(False) (like all text streams opened without an explicit encoding set), and on POSIX systems that will use LC_CTYPE too.
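If the environment cannot be fixed, one possible workaround is to stop relying on the locale for output and wrap the underlying binary stream with an explicit codec. This is a minimal sketch, not the only approach (setting the PYTHONIOENCODING environment variable before starting Python is another). The helper name utf8_text_stream is hypothetical, and it assumes that whatever reads the output really expects UTF-8.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so text written to it is always encoded as UTF-8.

    utf8_text_stream is a hypothetical helper name used for this sketch.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",      # explicit codec, independent of LANG / LC_CTYPE
        line_buffering=True,   # flush on newline, as an interactive stdout would
    )
```

At program start this could be applied as sys.stdout = utf8_text_stream(sys.stdout.buffer). It only affects printing; file name handling still follows sys.getfilesystemencoding().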

Notice: technical posts on this site are published under the CC BY-SA 4.0 license; if you republish them, please credit this site's URL or the original address.