Getting a Unicode error when printing prettified BeautifulSoup

I am currently taking a course on Python and during our unit on Beautiful Soup the instructor uses the following code:

import requests, pprint
from bs4 import BeautifulSoup

url = 'https://www.epicurious.com/search/tofu%20chili'
response = requests.get(url)
page_soup = BeautifulSoup(response.content, 'lxml')
print(page_soup.prettify())

When I run this code, I get the following error:

Traceback (most recent call last):
  File "/Users/arocklin/Documents/Python/whiteboard2.py", line 11, in <module>
    print(page_soup)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1479: ordinal not in range(128)

I was wondering why I got this since it worked for him and how I can fix it going forward. Thanks!

Your problem is not related to BeautifulSoup or to parsing HTML. Everything in your code up to and including the call to prettify() simply produces a Unicode string whose content is determined by a web server you do not control.

You then try to print that more or less arbitrary Unicode string.

If Python has determined that the terminal behind sys.stdout can only handle ASCII, and the web server has (for reasons entirely beyond your control) sent you characters outside the ASCII range, such as the '\xe9' (é) in your traceback, Python cannot encode those characters and raises a UnicodeEncodeError.
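The failure can be reproduced without any network access or HTML parsing. The sketch below is only an illustration: try_print is a hypothetical helper, and the in-memory stream stands in for a sys.stdout that Python has configured with a given encoding.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the bytes that were written, or a short failure message if
    the text could not be encoded. (Hypothetical helper for illustration.)
    """
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return "failed: " + exc.reason
    return raw.getvalue()


if __name__ == "__main__":
    # An ASCII-only stream cannot represent the character é ('\xe9').
    print(try_print("foo\xe9bar", "ascii"))
    # A UTF-8 stream encodes the same string without complaint.
    print(try_print("foo\xe9bar", "utf-8"))
```

The same string succeeds or fails depending only on the encoding of the output stream, which is why identical code can work on one machine and fail on another.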

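I suggest you find out how your version of Python chooses the encoding for sys.stdout on your platform. It is typically derived from the locale settings of the environment Python is started in, and it can be overridden with the PYTHONIOENCODING environment variable, which would explain why the same code works for your instructor and not for you.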
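As a starting point for that research, the following sketch prints the settings that most often influence the output encoding. The function name describe_output_encoding is made up for this example, and the list of environment variables is an assumption about what is commonly relevant, not an exhaustive account of how Python decides.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect settings that commonly influence how print() encodes text."""
    info = {
        # The encoding Python actually chose for standard output.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # The encoding suggested by the current locale.
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
    }
    # Environment variables that are commonly involved (assumed list).
    for name in ("PYTHONIOENCODING", "LC_ALL", "LC_CTYPE", "LANG"):
        info[name] = os.environ.get(name)
    return info


if __name__ == "__main__":
    for key, value in describe_output_encoding().items():
        print(key, "=", value)
```

Running this in the same environment that produces the error (same editor, IDE, or shell) matters, because the values can differ between a terminal and an editor that launches Python itself.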

Then add a test case to your program's test suite that verifies it can actually output non-ASCII strings. For that test, you can replace your entire program with:

print(u"foo\xe9bar")
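If that one-line test fails, two common remedies are to make the output stream use UTF-8 or to escape whatever the stream cannot encode. The sketch below assumes Python 3.7 or later for sys.stdout.reconfigure; escape_unencodable and force_utf8_stdout are hypothetical helper names, not part of BeautifulSoup or the course material.

```python
import sys


def escape_unencodable(text, encoding):
    """Replace characters the encoding cannot represent with \\x or \\u escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 if the stream supports it (Python 3.7+).

    Returns True if the stream was reconfigured, False otherwise.
    """
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


if __name__ == "__main__":
    # Option 1: keep the current encoding and escape what it cannot encode.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    print(escape_unencodable("foo\xe9bar", encoding))
    # Option 2: ask Python to emit UTF-8, then print the string unchanged.
    if force_utf8_stdout():
        print("foo\xe9bar")
```

Setting the environment variable PYTHONIOENCODING=utf-8 before starting Python has a similar effect to the second option without changing the program. Either way, the terminal or editor that displays the output still has to interpret the bytes as UTF-8.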
