简体   繁体   English

打印美化的BeautifulSoup时出现Unicode错误

[英]Getting a Unicode error when printing prettified BeautifulSoup

I am currently taking a course on Python and during our unit on Beautiful Soup the instructor uses the following code: 我目前正在上Python课程,在我们的Beautiful Soup单元中,讲师使用以下代码:

import requests, pprint
from bs4 import BeautifulSoup

url = 'https://www.epicurious.com/search/tofu%20chili'
response = requests.get(url)
page_soup = BeautifulSoup(response.content, 'lxml')
print(page_soup.prettify())

When I run this code, I get the following error: 运行此代码时,出现以下错误:

Traceback (most recent call last):
  File "/Users/arocklin/Documents/Python/whiteboard2.py", line 11, in <module>
    print(page_soup)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1479: ordinal not in range(128)

I was wondering why I got this since it worked for him and how I can fix it going forward. 我想知道为什么自从对他有用以来就得到了它,以及如何解决这个问题。 Thanks! 谢谢!

Your problem is not related to BeautifulSoup or to parsing HTML. 您的问题与BeautifulSoup或与解析HTML无关。 Your code up to and including BeautifulSoup.prettify gets you some unicode string defined by a webserver not under your control. 直到BeautifulSoup.prettify以及包括BeautifulSoup.prettify代码都会为您提供由Web服务器定义但不受您控制的unicode字符串

That more or less arbitrary unicode string you then try to print. 然后,您尝试打印或多或少的任意unicode字符串。

On a system where Python has determined that the terminal sys.stdout can only handle ascii encoded strings, and if the webserver has (for reasons entirely beyond your control) has decided to give you some Unicode characters outside the ASCII range, Python cannot encode that character and throws an exception. 在Python已确定终端sys.stdout只能处理ascii编码的字符串的系统上,并且如果网络服务器(出于完全超出您的控制范围的原因)已决定为您提供ASCII范围之外的一些Unicode字符,则Python无法对其进行编码字符并引发异常。

I suggest you research how your version of Python determines the encodings/codecs to use on the platform you are running Python on. 我建议您研究您的Python版本如何确定在运行Python的平台上使用的编码/编解码器。

Then put a test case into your program's test suite which actually verifies it can properly output Unicode strings. 然后将一个测试用例放入程序的测试套件中,该套件实际上验证了它可以正确输出Unicode字符串。 For that test, you can replace your entire program with 对于该测试,您可以将整个程序替换为

print(u"foo\xe9bar")

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM