
BeautifulSoup's find() can't match Chinese characters

from bs4 import BeautifulSoup
import requests

url = "http://www.paopaoche.net/psp/280873.html"
res = requests.get(url)
res.encoding="gb2312"
bsObj = BeautifulSoup(res.text)
tag1 = bsObj.find("dd", {"class":"left"}).find(class_="xq").find("em", text="游戏类型")
print(tag1)

The terminal prints "None". If I change find("em", text="游戏类型") to find("em", text="1993"), it returns the correct result. What is the problem?

Here is a slightly modified version of the code:

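from bs4 import BeautifulSoup
import requests

url = "http://www.paopaoche.net/psp/280873.html"
res = requests.get(url)
# Decode the raw bytes explicitly as gb2312, so setting res.encoding is not needed.
# The html5lib parser is a separate package and must be installed.
bsObj = BeautifulSoup(res.content.decode('gb2312'), 'html5lib')

# Match the first <em> whose text contains the label, rather than equals it.
tag1 = bsObj.select("dd.left .xq")[0].find(lambda tag: tag.name == "em" and "游戏类型" in tag.text)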
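print(tag1)

The "em" element contains not only the text being searched for but also other text and child elements. The text argument is compared against the tag's .string, which is None when a tag has more than one child, so an exact match can never succeed here. You need to find elements whose text contains the search string, not elements whose text equals it. The Chinese characters themselves are not the problem.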
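
To see the difference without hitting the network, here is a minimal sketch on a small inline document. The markup is hypothetical (I am only guessing at the shape of the real page from the selectors in the question), and it uses the built-in html.parser and the string argument, which is the newer name for text in Beautiful Soup 4.4 and later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <em> holds a label plus a child element,
# the other <em> holds nothing but a single string.
html = (
    '<dd class="left"><div class="xq">'
    '<em>游戏类型<span>角色扮演</span></em>'
    '<em>1993</em>'
    '</div></dd>'
)

soup = BeautifulSoup(html, "html.parser")
container = soup.select("dd.left .xq")[0]

# Exact match works when the tag's only child is the string itself.
exact_year = container.find("em", string="1993")

# Exact match fails when the tag also has a child element,
# because the tag's .string is None in that case.
exact_label = container.find("em", string="游戏类型")

# A predicate on the full text matches by containment instead.
contains_label = container.find(
    lambda tag: tag.name == "em" and "游戏类型" in tag.get_text()
)

print(exact_year)
print(exact_label)
print(contains_label)
```

If the real page is structured like this, the second lookup is the one that returns None in the question, and the lambda version is the one that finds the element.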


 