How can I find a specific div tag using bs4?
I am trying to parse an HTML text and find all the "<div class="topclick-list-element-game-merchant">" tags.
warpips = open('WarpipsPageText.txt', 'r')
page_text = warpips.read()
warpips.close()
bs4 = BeautifulSoup(page_text, 'html5lib')
div = bs4.find_all('div class="topclick-list-element-game-merchant"', bs4)
print(div)
When I run this code, it prints an empty list.
Below is a snippet of the HTML I am trying to isolate.
<div class="topclick-list-element-game-merchant">
CDKeys.com
<div class="platform platform-pc" title="pc"></div>
<div class="platform platform-xbox" title="xbox"></div>
</div>
</div>
<span class="topclick-list-element-price">$11.19</span>
</a>
<a href="https://cheapdigitaldownload.com/nier-replicant-ver-1-22474487139-digital-download-price-comparison/" title="NieR Replicant ver.1.22474487139 cd key best prices" class="topclick-list-element ">
<div class="topclick__image tpsprite11 tpsprite11-82-buy-nier-replicant-ver-1-22474487139-cd-key-pc-download-catalog-0" data-tp="tpsprite11"></div>
<div class="topclick-list-element-game">
<div class="topclick-list-element-game-title">NieR Replicant ver.1.22474487139</div>
<div class="topclick-list-element-game-merchant">
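You have misplaced a single quote: the tag name and the class filter are separate arguments to find_all. Because class is a reserved word in Python, the keyword argument is spelled class_:
div = bs4.find_all('div', class_="topclick-list-element-game-merchant")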
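The corrected call can be sketched as below. This is only an illustration: the HTML string is a shortened, hypothetical stand-in for the page text in the question, and the variable names are made up for the example. It shows three ways to express the same filter, which should all match the same element here.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page text in the question.
html_doc = """
<div class="topclick-list-element-game">
  <div class="topclick-list-element-game-title">NieR Replicant</div>
  <div class="topclick-list-element-game-merchant">CDKeys.com
    <div class="platform platform-pc" title="pc"></div>
  </div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Tag name as the first argument, class filter as the class_ keyword.
by_keyword = soup.find_all("div", class_="topclick-list-element-game-merchant")

# The same filter written as an attrs dictionary.
by_attrs = soup.find_all("div", attrs={"class": "topclick-list-element-game-merchant"})

# The same filter written as a CSS selector.
by_css = soup.select("div.topclick-list-element-game-merchant")

merchants = [d.get_text(strip=True) for d in by_keyword]
print(merchants)
```

The original call returned an empty list because the whole string 'div class="..."' was treated as a tag name, and no tag has that name.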
To find all <div> tags with class="topclick-list-element-game-merchant", you can use the following example:
from bs4 import BeautifulSoup
html_doc = """
<div class="topclick-list-element-game-merchant">
CDKeys.com
<div class="platform platform-pc" title="pc"></div>
<div class="platform platform-xbox" title="xbox"></div>
</div>
</div>
<span class="topclick-list-element-price">$11.19</span>
</a>
<a href="https://cheapdigitaldownload.com/nier-replicant-ver-1-22474487139-digital-download-price-comparison/" title="NieR Replicant ver.1.22474487139 cd key best prices" class="topclick-list-element ">
<div class="topclick__image tpsprite11 tpsprite11-82-buy-nier-replicant-ver-1-22474487139-cd-key-pc-download-catalog-0" data-tp="tpsprite11"></div>
<div class="topclick-list-element-game">
<div class="topclick-list-element-game-title">NieR Replicant ver.1.22474487139</div>
<div class="topclick-list-element-game-merchant">
"""
soup = BeautifulSoup(html_doc, "html.parser")
for div in soup.find_all(class_="topclick-list-element-game-merchant"):
    print(div.get_text(strip=True))
Prints:
CDKeys.com
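If the page contains merchant divs with no text (for example one that is opened but cut off, as at the end of a truncated snippet), get_text(strip=True) returns an empty string for them. A minimal sketch of a helper that skips those is below; the function name merchant_names and the sample string are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup


def merchant_names(page_text):
    """Return the non-empty text of every merchant <div> in page_text."""
    soup = BeautifulSoup(page_text, "html.parser")
    names = []
    for div in soup.find_all("div", class_="topclick-list-element-game-merchant"):
        text = div.get_text(strip=True)
        if text:  # skip merchant divs that contain no text
            names.append(text)
    return names


# Hypothetical sample: one merchant div with text, one without.
sample = """
<div class="topclick-list-element-game-merchant">CDKeys.com</div>
<div class="topclick-list-element-game-merchant"></div>
"""
print(merchant_names(sample))
```

For the file in the question, the same helper could be called on the text read from WarpipsPageText.txt.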