
Windows Python: Changing encoding using the locale module

Using Python 2.7.

I am writing an abstract web scraper and am having problems when displaying (printing) certain characters.

I get the traceback error UnicodeEncodeError: 'ascii' codec can't encode character u'\☆' in position 5: ordinal not in range(128) when printing a string that contains the character.

I used the locale module to find out which settings my OS supports, although I'm not certain I should be using locale for this problem, and noticed that the default settings were ('en_US', 'cp1252'). I am trying to change them to ('en_US', 'utf-8'), but sadly to no avail.

# code for default settings
import locale
print locale.getdefaultlocale()

This is the code I used to narrow down my locale setting options. (No problems here; the code is included just so that anyone who wants to can follow along.)

import locale

# locale_alias is a dict mapping alias -> normalized locale name
aliases = locale.locale_alias.items()
utfs = [(k, v) for k, v in aliases if 'utf' in k.lower() or 'utf' in v.lower()]

# utf settings starting with en
en_utfs = [(k, v) for k, v in utfs
           if k.lower()[:2] == 'en' or v.lower()[:2] == 'en']

print en_utfs

This gives the output:

[('en_ie.utf8@euro', 'en_IE.UTF-8'), ('universal.utf8@ucs4', 'en_US.UTF-8')]

Here is where my problem lies: trying to change the setting to en_US.UTF-8.

[IN]: locale.setlocale( locale.LC_ALL, 'en_US.UTF-8' )
[OUT]: Traceback code ...
[OUT]: locale.Error: unsupported locale setting

Sorry for all the code; for some reason I felt an excessive need to include it.

Check this: https://docs.moodle.org/dev/Table_of_locales

I think on Windows you need to set the 'localewin' value instead of the locale name. Setting locale.setlocale( locale.LC_ALL, 'English_United States.1252' ) worked for me on Windows. I also tried other locales, such as Dutch_Netherlands.1252, and they worked. This might not solve your UnicodeEncodeError problem, but I think it at least explains why you are unable to set the locale.
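
If the same script has to run on more than one platform, one option is to try several locale names in order and keep the first one the OS accepts. The sketch below is only an illustration of that idea: the helper name set_first_supported_locale and the candidate list are my own assumptions, not something from the table linked above, and I have not checked it on every Windows version.

```python
import locale


def set_first_supported_locale(candidates, category=locale.LC_ALL):
    """Try each candidate name in order (hypothetical helper).

    Returns the first name that setlocale() accepts, or None if all fail.
    """
    for name in candidates:
        try:
            locale.setlocale(category, name)
        except locale.Error:
            # This platform does not recognise the name; try the next one.
            continue
        return name
    return None


# Windows-style name first, then the POSIX-style name, then the portable "C" locale.
chosen = set_first_supported_locale(
    ['English_United States.1252', 'en_US.UTF-8', 'C'])
print(chosen)
```

Which name gets picked depends on the machine, so the printed value will differ between Windows and other systems.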

I couldn't fix my problem, but I found a workaround: removing all non-ASCII characters. See the Stack Overflow answer "Replace non-ASCII characters with a single space".
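
For anyone following along, here is a minimal sketch of that kind of workaround. It is not necessarily the exact code from the linked answer; the function name replace_non_ascii and the sample string are made up for illustration. It replaces every character outside the ASCII range with a single space, so the result can be printed even when the console codec is 'ascii'. Note that this throws information away.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r'[^\x00-\x7F]')


def replace_non_ascii(text, replacement=u' '):
    """Return text with each non-ASCII character replaced (hypothetical helper)."""
    return _NON_ASCII.sub(replacement, text)


# Sample scraped string containing a non-ASCII star character.
cleaned = replace_non_ascii(u'Five \u2606 stars')
print(cleaned)
```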

You need to use the full name. So, for example, use:

locale.setlocale( locale.LC_CTYPE, 'Chinese (Simplified)_People\'s Republic of China' )  

instead of

locale.setlocale(locale.LC_ALL,'zh_CN.cpk936')      

If this was successful, you should expect this result:

print(locale.getlocale(locale.LC_CTYPE))    
("Chinese (Simplified)_People's Republic of China", '936')       

