Python: 'ascii' codec can't encode characters

Question

I am using the following code to scrape a webpage that contains Japanese characters:

import urllib2
import bs4
import time

url = 'http://www.city.sapporo.jp/eisei/tiiki/toban.html'

pagecontent = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(pagecontent.read().decode("utf8"))

print(soup.prettify())
print(soup)

On some machines the code works fine, and the last two statements print the result successfully. On other machines, however, the second-to-last statement raises the error

UnicodeEncodeError: 'ascii' codec can't encode characters in position 485-496: ordinal not in range(128),

and the last statement prints strange squares in place of all the Japanese characters.

Why does the same code behave differently on two machines? How can I fix this?

Python version: 2.6.6

bs4 version: 4.1.0

Answer 1

You need to configure your environment locale correctly; once your locale is set, Python will pick it up automatically when printing to a terminal.
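To see what Python has actually picked up on each machine, something like the following sketch may help. The helper name describe_output_encoding is made up for illustration; it only reads sys.stdout.encoding and locale.getpreferredencoding(), both of which exist in the standard library. On the failing machine I would expect to see an ASCII-type name (for example ANSI_X3.4-1968) rather than UTF-8, but that is an assumption about your setup.

```python
import locale
import sys


def describe_output_encoding(stream=sys.stdout):
    """Collect the encoding facts that matter when printing non-ASCII text.

    Hypothetical helper, not part of any library.
    """
    return {
        # Encoding Python attached to the stream; can be None in
        # Python 2 when the output is piped instead of a terminal.
        "stream_encoding": getattr(stream, "encoding", None),
        # True only when the stream is an interactive terminal.
        "is_tty": stream.isatty() if hasattr(stream, "isatty") else False,
        # Encoding implied by the current locale settings.
        "locale_encoding": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_output_encoding().items()):
        print("%s: %s" % (key, value))
```

Run it on both machines and compare the output; a difference in stream_encoding would explain why the same print statement succeeds on one and fails on the other.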

Check your locale with the locale command:

$ locale
LANG="en_GB.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

Note the .UTF-8 in my locale settings; it tells programs running in the terminal that my terminal uses the UTF-8 codec, one that supports all of Unicode.
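As a small illustration of why the codec matters, the sketch below encodes a short Japanese string with both codecs. The sample text and the helper can_encode are my own choices for the demonstration, not something from the question's page.

```python
# Sample text written with escapes so the file itself stays pure ASCII.
text = u"\u672d\u5e4c\u5e02"  # three CJK characters: "Sapporo City"


def can_encode(value, codec):
    """Return True if value can be encoded with the given codec."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        # This is the same exception type reported in the question.
        return False


# UTF-8 can represent every Unicode code point; ASCII only covers 0-127.
utf8_bytes = text.encode("utf-8")
print(can_encode(text, "utf-8"))
print(can_encode(text, "ascii"))
```

When the terminal is treated as ASCII, print has to perform the second kind of encoding, which is where the UnicodeEncodeError comes from.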

You can set all of your locale settings in one step with the LANG environment variable:

export LANG="en_US.UTF-8"

for a US locale (which governs how dates and numbers are printed) with the UTF-8 codec. To be precise, the LC_CTYPE setting determines the output codec, and LC_CTYPE in turn defaults to the LANG value.
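The fallback order can be sketched in a few lines. This is only a model of the usual POSIX lookup (LC_ALL first, then LC_CTYPE, then LANG), not how Python itself is implemented, and the names effective_ctype and codeset are hypothetical helpers for the illustration.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that governs character encoding.

    Models the POSIX precedence: LC_ALL, then LC_CTYPE, then LANG.
    """
    env = os.environ if env is None else env
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return "C"  # POSIX default locale when nothing is set


def codeset(locale_name):
    """Extract the codeset part, e.g. 'en_US.UTF-8' -> 'UTF-8'."""
    if "." not in locale_name:
        return None  # names such as 'C' carry no explicit codeset
    return locale_name.split(".", 1)[1].split("@", 1)[0]


if __name__ == "__main__":
    name = effective_ctype()
    print("effective LC_CTYPE: %s (codeset: %s)" % (name, codeset(name)))
```

One consequence worth noting: if LC_ALL or LC_CTYPE is already set to something without UTF-8, exporting LANG alone will not change the output codec.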

Also see the very comprehensive UTF-8 and Unicode FAQ for Unix/Linux.

Answer 1
7 2014-12-21 16:05:14