BeautifulSoup4: find multiple href links containing specific text

Question

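I'm trying to filter all href links that contain the string "3080". I've seen some examples, but I can't apply them to my code. Can someone tell me how to print only those links?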

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
import driver_functions

gpu = '3080'
url = f'https://www.alternate.de/listing.xhtml?q={gpu}'

options = webdriver.ChromeOptions()
options.add_argument('--headless')

if __name__ == '__main__':
    browser = webdriver.Chrome(options=options, service=Service('chromedriver.exe'))
    try:

        browser.get(url)

        time.sleep(2)

        html = browser.page_source

        soup = BeautifulSoup(html, 'html.parser')

        gpu_list = soup.select("a", class_="grid-container listing")

        for link in gpu_list:
            print(link['href'])

        browser.quit()
    except:
        driver_functions.browserstatus(browser)

Output: (not reproduced here)

Answer 1

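You can use a CSS attribute=value selector with the * ("contains") operator, i.e. [href*='...'], to target the hrefs inside the listing that contain the gpu variable. If you run into edge cases, you can extend this into a list of CSS selectors. I only looked at the given URL.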

gpu_links = [i['href'] for i in soup.select(f".listing [href*='{gpu}']")]
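A minimal, self-contained sketch of the selector from this answer, run against a small hand-written HTML snippet instead of the live page. The markup is an assumption loosely modeled on the listing container and is not copied from alternate.de; find_gpu_links is a hypothetical helper name. The sketch also hints at why the question's soup.select("a", class_=...) returns every link: select() takes the whole filter as one CSS selector string, so the class and href conditions must both be inside that string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real alternate.de listing may differ.
SAMPLE_HTML = """
<div class="grid-container listing">
  <a href="https://www.alternate.de/product/rtx-3080-a">RTX 3080 A</a>
  <a href="https://www.alternate.de/product/rtx-3070-b">RTX 3070 B</a>
  <a href="https://www.alternate.de/product/rtx-3080-ti-c">RTX 3080 Ti C</a>
</div>
<a href="https://www.alternate.de/help/3080-faq">Outside the listing</a>
<a name="anchor-without-href">No href at all</a>
"""


def find_gpu_links(html: str, gpu: str) -> list[str]:
    """Return hrefs of elements inside a ".listing" container whose href contains `gpu`."""
    soup = BeautifulSoup(html, "html.parser")
    # [href*='...'] matches when the href value contains the substring anywhere.
    # The leading ".listing " limits matches to descendants of the listing element,
    # so the FAQ link outside it and the <a> without an href are both skipped.
    return [a["href"] for a in soup.select(f".listing [href*='{gpu}']")]


if __name__ == "__main__":
    for href in find_gpu_links(SAMPLE_HTML, "3080"):
        print(href)
```

Because the substring test happens inside the selector, every matched element is guaranteed to have an href, so indexing with a["href"] is safe here.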

Answer 2

Try this as your selector:

gpu_list = soup.select('#lazyListingContainer > div > div > div.grid-container.listing > a')
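This structural selector picks every product link in the listing grid regardless of its href, so by itself it does not filter for "3080". A sketch combining it with a substring check in Python follows. The nested markup is an assumption that mirrors the selector path, not the real page, and filter_hrefs_containing is a hypothetical helper name. Using a.get("href", "") instead of a["href"] avoids the KeyError that indexing raises on an <a> tag without an href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the selector path above; the live page may differ.
SAMPLE_HTML = """
<div id="lazyListingContainer">
  <div><div>
    <div class="grid-container listing">
      <a href="/product/rtx-3080-a">RTX 3080 A</a>
      <a href="/product/rtx-3070-b">RTX 3070 B</a>
      <a>Placeholder link without href</a>
    </div>
  </div></div>
</div>
"""

SELECTOR = "#lazyListingContainer > div > div > div.grid-container.listing > a"


def filter_hrefs_containing(html: str, needle: str) -> list[str]:
    """Hypothetical helper: apply the structural selector, then keep hrefs containing `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select(SELECTOR):
        href = a.get("href", "")  # .get() returns "" instead of raising KeyError
        if needle in href:
            links.append(href)
    return links


if __name__ == "__main__":
    print(filter_hrefs_containing(SAMPLE_HTML, "3080"))
```

Splitting the work this way keeps the selector purely structural, which can make it easier to switch the search term (or filter on a.get_text() instead) without touching the CSS.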

2 answers

Answer 1
Score 2, accepted, 2021-11-11 02:21:38

Answer 2
Score 1, 2021-11-11 02:26:13