
bs4 filtering with Python

I'm trying to write a script that checks the Steam store, and I'm having trouble filtering out the listings that don't have a discount in their markup. I want to keep only the listings whose markup contains a <span>-percentage</span> tag, and drop the ones without it. Here's my code:

from urllib.request import urlopen
from datetime import date
import requests as rq
from bs4 import BeautifulSoup as bsoup  # needed for the bsoup(...) call below

inp = str(input('what would you like to search up?'))
w = ('https://store.steampowered.com/search/?term=' + inp)
page = rq.get(w)
soup = bsoup(page.content, 'html.parser')
soup.prettify()
sales = soup.find_all('div', class_="responsive_search_name_combined")

for sale in sales:
    p = soup.find('div', class_="col search_price responsive_secondrow")
    d = soup.find_all('div', class_="col search_discount responsive_secondrow")
    n = soup.find('span', class_="title")

    if None in (d, n, p):
        continue
    print(d)

And the output (containing both the things I want to filter out and the things I want to keep):

<span>-16%</span>
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
<span>-19%</span>
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">
</div>, <div class="col search_discount responsive_secondrow">

And so on. I tried replacing d = soup.find_all('div', class_="col search_discount responsive_secondrow") with d = soup.find_all('span', string="-16%") to see if that would work, and it didn't. I want to keep the span tags but not the div tags. Could anyone help with this?

You can add a try-except block to the last for loop and, inside it, keep only the discount divs that actually contain a span. Here is the full code:

from urllib.request import urlopen
from datetime import date
import requests as rq
from bs4 import BeautifulSoup as bsoup
inp = str(input('what would you like to search up?'))
w = ('https://store.steampowered.com/search/?term=' + inp)
page = rq.get(w)
soup = bsoup(page.content, 'html.parser')
soup.prettify()
sales = soup.find_all('div', class_="responsive_search_name_combined")

final = []

for sale in sales:
    p = soup.find('div', class_="col search_price responsive_secondrow")
    d = soup.find_all('div', class_="col search_discount responsive_secondrow")
    n = soup.find('span', class_="title")

    try:
        for element in d:
            span = element.span
            if span:
                final.append(span.text)
    except:
        pass
print(final)

Output:

what would you like to search up?>? among us
['-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%', '-10%', '-25%']
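
The list above repeats the same two values because soup.find_all searches the whole page on every pass of the loop, so each listing adds every discount again. A minimal sketch of one way to avoid that is to search inside each listing (sale) instead of the whole soup. The SAMPLE_HTML string and the discounted_listings helper are hypothetical stand-ins for illustration; the sketch assumes the title and discount elements sit inside each responsive_search_name_combined div, which may not match the live Steam page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that reuses the class names from the question.
SAMPLE_HTML = """
<div class="responsive_search_name_combined">
  <span class="title">Game A</span>
  <div class="col search_discount responsive_secondrow"><span>-16%</span></div>
</div>
<div class="responsive_search_name_combined">
  <span class="title">Game B</span>
  <div class="col search_discount responsive_secondrow"></div>
</div>
<div class="responsive_search_name_combined">
  <span class="title">Game C</span>
  <div class="col search_discount responsive_secondrow"><span>-19%</span></div>
</div>
"""


def discounted_listings(html):
    """Return (title, discount) pairs for listings whose discount div holds a span."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for sale in soup.find_all("div", class_="responsive_search_name_combined"):
        # Search inside this listing only, so each one is visited exactly once.
        title = sale.find("span", class_="title")
        discount = sale.select_one("div.search_discount > span")
        if title is None or discount is None:
            continue  # no title or no discount span: skip this listing
        results.append((title.get_text(strip=True), discount.get_text(strip=True)))
    return results


if __name__ == "__main__":
    for name, percent in discounted_listings(SAMPLE_HTML):
        print(name, percent)
```

With the sample markup, only Game A and Game C are kept, and each appears once. To try it on a real page, pass page.content from the code above instead of SAMPLE_HTML.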


 