I am trying to parse some HTML text and find all the <div class="topclick-list-element-game-merchant"> tags.
warpips = open('WarpipsPageText.txt', 'r')
page_text = warpips.read()
warpips.close()
bs4 = BeautifulSoup(page_text, 'html5lib')
div = bs4.find_all('div class="topclick-list-element-game-merchant"', bs4)
print(div)
When I run this code it prints an empty list.
Below is a snippet from the html of what I am trying to isolate.
<div class="topclick-list-element-game-merchant">
CDKeys.com
<div class="platform platform-pc" title="pc"></div>
<div class="platform platform-xbox" title="xbox"></div>
</div>
</div>
<span class="topclick-list-element-price">$11.19</span>
</a>
<a href="https://cheapdigitaldownload.com/nier-replicant-ver-1-22474487139-digital-download-price-comparison/" title="NieR Replicant ver.1.22474487139 cd key best prices" class="topclick-list-element ">
<div class="topclick__image tpsprite11 tpsprite11-82-buy-nier-replicant-ver-1-22474487139-cd-key-pc-download-catalog-0" data-tp="tpsprite11"></div>
<div class="topclick-list-element-game">
<div class="topclick-list-element-game-title">NieR Replicant ver.1.22474487139</div>
<div class="topclick-list-element-game-merchant">
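You have misplaced a single quote: the tag name and the class filter must be separate arguments. Also, because class is a reserved word in Python, the keyword argument is spelled class_.
div = bs4.find_all('div', class_="topclick-list-element-game-merchant")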
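For reference, here is a minimal sketch of why the original call returns an empty list and of the usual ways to filter by class in BeautifulSoup. The one-line sample HTML is made up for illustration; only the class name comes from the question.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the class name is taken from the question.
html = (
    '<div class="topclick-list-element-game-merchant">CDKeys.com</div>'
    '<div class="other">ignored</div>'
)
soup = BeautifulSoup(html, "html.parser")

target = "topclick-list-element-game-merchant"

# The first positional argument is a tag name, so this whole string is
# looked up as a tag name and nothing matches.
wrong = soup.find_all('div class="topclick-list-element-game-merchant"')

# Three equivalent ways to ask for <div> tags carrying the class.
by_keyword = soup.find_all("div", class_=target)
by_attrs = soup.find_all("div", attrs={"class": target})
by_css = soup.select("div." + target)

print(wrong)
print([d.get_text(strip=True) for d in by_keyword])
```

The class_ keyword is the shortest form; the attrs dictionary is handy when filtering on several attributes at once, and select accepts CSS selectors.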
To find all <div> tags with class="topclick-list-element-game-merchant", you can use the following example:
from bs4 import BeautifulSoup
html_doc = """
<div class="topclick-list-element-game-merchant">
CDKeys.com
<div class="platform platform-pc" title="pc"></div>
<div class="platform platform-xbox" title="xbox"></div>
</div>
</div>
<span class="topclick-list-element-price">$11.19</span>
</a>
<a href="https://cheapdigitaldownload.com/nier-replicant-ver-1-22474487139-digital-download-price-comparison/" title="NieR Replicant ver.1.22474487139 cd key best prices" class="topclick-list-element ">
<div class="topclick__image tpsprite11 tpsprite11-82-buy-nier-replicant-ver-1-22474487139-cd-key-pc-download-catalog-0" data-tp="tpsprite11"></div>
<div class="topclick-list-element-game">
<div class="topclick-list-element-game-title">NieR Replicant ver.1.22474487139</div>
<div class="topclick-list-element-game-merchant">
"""
soup = BeautifulSoup(html_doc, "html.parser")
for div in soup.find_all(class_="topclick-list-element-game-merchant"):
    print(div.get_text(strip=True))
Prints:
CDKeys.com
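If the goal is a list of merchant names from the saved page, the same lookup can be wrapped in a small helper. This is only a sketch: the function names merchant_names and merchant_names_from_file are hypothetical, and the file name is the one used in the question. Empty strings are dropped because a truncated snippet can leave a matching <div> with no text in it.

```python
from bs4 import BeautifulSoup

MERCHANT_CLASS = "topclick-list-element-game-merchant"


def merchant_names(html):
    """Return the non-empty text of every tag carrying the merchant class."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for div in soup.find_all("div", class_=MERCHANT_CLASS):
        text = div.get_text(strip=True)
        if text:  # skip matching tags that contain no text
            names.append(text)
    return names


def merchant_names_from_file(path):
    """Read a saved page and extract merchant names (hypothetical helper)."""
    # The with statement closes the file even if reading fails.
    with open(path, "r", encoding="utf-8") as handle:
        return merchant_names(handle.read())


# Example call, assuming the file from the question exists:
# print(merchant_names_from_file("WarpipsPageText.txt"))
```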