
Python: 'ascii' codec can't encode characters

I am using the following code to scrape a webpage that contains Japanese characters:

import urllib2
import bs4
import time

url = 'http://www.city.sapporo.jp/eisei/tiiki/toban.html'

pagecontent = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(pagecontent.read().decode("utf8"))

print(soup.prettify())
print(soup)

On some machines the code works fine, and the last two statements print the result successfully. On other machines, however, the second-to-last statement raises the error

UnicodeEncodeError 'ascii' codec can't encode characters in position 485-496: ordinal not in range(128),

and the last statement prints strange squares in place of all the Japanese characters.

Why does the same code behave differently on two machines? How can I fix this?

Python version: 2.6.6

bs4 version: 4.1.0

You need to configure your environment locale correctly; once your locale is set, Python will pick it up automatically when printing to a terminal.
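To see which codec Python has actually picked up on a given machine, a quick check along these lines may help. This is only a minimal sketch: the helper name `describe_output_encoding` is illustrative, and `sys.stdout.encoding` can be `None` in Python 2 when output is piped rather than sent to a terminal.

```python
import locale
import sys


def describe_output_encoding():
    """Report the codecs Python would use for output (illustrative helper)."""
    return {
        # Codec attached to stdout; may be None in Python 2 when output is piped.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Codec Python infers from the user's locale settings.
        "preferred": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_output_encoding().items()):
        print("%s: %s" % (name, value))
```

Running this on both machines should show whether one of them reports an ASCII-only codec while the other reports UTF-8.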

Check your locale with the locale command:

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

Note the .UTF-8 in my locale settings; it tells programs running in the terminal that my terminal uses the UTF-8 codec, one that supports all of Unicode.
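The difference between the two codecs can be reproduced without a terminal at all. The sketch below uses an arbitrary Japanese sample string and a hypothetical helper, `try_encode`, to show that ASCII cannot represent the text while UTF-8 can; it illustrates the failure mode and is not the code from the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Illustrative sample text ("Sapporo City"); any non-ASCII string would do.
sample = u"札幌市"


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        # This is the same exception that print raises under an ASCII locale.
        return None


print("ascii works:", try_encode(sample, "ascii") is not None)
print("utf-8 works:", try_encode(sample, "utf-8") is not None)
```

When the terminal codec is ASCII, printing Unicode text performs the same failing conversion implicitly, which is where the UnicodeEncodeError comes from.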

You can set all of your locale categories in one step with the LANG environment variable:

export LANG="en_US.UTF-8"

for a US locale (how dates and numbers are printed) with the UTF-8 codec. To be precise, the LC_CTYPE setting is used for the output codec, which in turn defaults to the LANG value.
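The lookup order described above can be sketched in a few lines. This assumes the standard POSIX precedence (LC_ALL overrides LC_CTYPE, which overrides LANG); the helper names `effective_ctype` and `codeset` are hypothetical, and the fallback to "C" when nothing is set is an assumption about typical systems rather than a guarantee.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that governs character encoding (illustrative helper).

    Follows the POSIX lookup order: LC_ALL, then LC_CTYPE, then LANG.
    Empty values are treated as unset.
    """
    if env is None:
        env = os.environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumption: with nothing set, most systems behave as the "C" locale (ASCII).
    return "C"


def codeset(locale_name):
    """Extract the codeset part, e.g. 'UTF-8' from 'en_US.UTF-8' (None if absent)."""
    # Drop an optional @modifier, then take whatever follows the dot.
    base = locale_name.split("@", 1)[0]
    if "." in base:
        return base.split(".", 1)[1]
    return None


if __name__ == "__main__":
    name = effective_ctype()
    print("effective LC_CTYPE:", name, "codeset:", codeset(name))
```

A machine where this reports no codeset (for example a bare "C" or "POSIX" locale) is a likely candidate for the ASCII error in the question.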

Also see the very comprehensive UTF-8 and Unicode FAQ for Unix/Linux.
