Searching for a word in hyperlinks (Python 3)

Question

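I'm using a Python 3 script to scrape a website and check whether a product is in stock. The problem is searching for the product name in the hyperlinks I get from BeautifulSoup. The product name contains a space, so it is actually two words, and I think that is what's causing the problem.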

product_name is passed in, for example "Blue Truck". Example link: <a href="https://example.com/products/">Blue Truck</a>

soup = BeautifulSoup(driver.page_source, 'html.parser')
print("Trying to find links " + threadName)
for a in soup.findAll('a'):
    if product_name in a['href']:
        email_link(a)
        print("FOUND" + threadName)
        break
    elif product_name.lower() in a['href']:
        email_link(a)
        print("FOUND" + threadName)
        break

When I run this code, it never returns a match. I also tried:

if (a.find(product_name) != -1):
    email_link(a)

This find() returned the wrong matches. Any help would be great, as would a suggestion for which approach is fastest.
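A likely reason for the false matches, shown as a small sketch (it assumes the beautifulsoup4 package is installed and reuses the example link from the question): on a BeautifulSoup Tag, find() is not str.find(). It searches for a descendant tag with the given name and returns None, not -1, when nothing is found. Since None != -1 is always True, the condition passes for every link.

```python
from bs4 import BeautifulSoup

html = '<a href="https://example.com/products/">Blue Truck</a>'
a = BeautifulSoup(html, "html.parser").find("a")

# Tag.find() looks for a child *tag* named "Blue Truck", not for a substring.
# No such tag exists, so it returns None rather than -1.
result = a.find("Blue Truck")
print(result)          # None
print(result != -1)    # True, so the check succeeds for every <a>

# A plain substring test on the link text behaves as intended.
print("Blue Truck" in a.get_text())  # True
```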

Answer 1

The a tag <a href="https://example.com/products/">Blue Truck</a> has the following properties:

href: "https://example.com/products/"
innerHTML or text: Blue Truck

Your code is searching a['href'], which is "https://example.com/products/". You want to search a.text, which is Blue Truck.
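To make the difference concrete, here is a minimal sketch that parses the example link from the question (it assumes beautifulsoup4 is installed) and compares the href attribute with the link text:

```python
from bs4 import BeautifulSoup

html = '<a href="https://example.com/products/">Blue Truck</a>'
a = BeautifulSoup(html, "html.parser").find("a")

print(a["href"])     # https://example.com/products/
print(a.get_text())  # Blue Truck

product_name = "Blue Truck"
print(product_name in a["href"])     # False: the name is not part of the URL
print(product_name in a.get_text())  # True: the name is the link text
```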

Answer 2

You should implement it like this:

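import bs4 as bs
import urllib.parse

soup = bs.BeautifulSoup(driver.page_source, 'html.parser')
print("Trying to find link for " + thread_name)
for a in soup.find_all('a'):
    name = product_name.lower()
    # Match case-insensitively on the link text, or on the URL-encoded name
    # (e.g. "blue%20truck") in the href. a.get('href', '') avoids a KeyError
    # on <a> tags that have no href attribute. A regex could also be used here.
    if name in a.text.lower() or urllib.parse.quote(name) in a.get('href', ''):
        email_link(a)
        print("FOUND" + thread_name)
        break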
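The same idea can be wrapped in a reusable function. This is a minimal sketch, not part of the original answer: find_product_link is a hypothetical helper name, and the sample HTML is made up for illustration. It returns the first matching tag instead of calling email_link, and lowercases both the href and the encoded name so that percent-escapes such as %C3%A9 compare consistently.

```python
from typing import Optional
from urllib.parse import quote

from bs4 import BeautifulSoup
from bs4.element import Tag


def find_product_link(html: str, product_name: str) -> Optional[Tag]:
    """Hypothetical helper: return the first <a> whose text or href mentions
    product_name (case-insensitive), or None if no link matches."""
    needle = product_name.lower()
    encoded = quote(needle).lower()  # "blue truck" -> "blue%20truck"
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a"):
        text = a.get_text(" ", strip=True).lower()
        href = a.get("href", "").lower()  # <a> without href yields ""
        if needle in text or encoded in href:
            return a
    return None


# Made-up sample page for illustration.
html = """
<a href="https://example.com/products/">Blue Truck</a>
<a href="https://example.com/products/red%20truck">Details</a>
<a name="anchor-without-href">No link here</a>
"""

link = find_product_link(html, "Blue Truck")
print(link["href"] if link else "not found")  # https://example.com/products/
```

Returning the tag rather than acting on it keeps the matching logic separate from the side effect (sending the email), which makes it easy to test against saved HTML without running Selenium.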


2 answers

Answer 1: score 0, posted 2020-09-21 19:54:36

Answer 2: score 0, posted 2020-09-21 20:05:18
