I'm trying to scrape a website's links, and then scrape the links within the pages I've already scraped

Question

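I'm trying to scrape a website's links. After scraping them, I also want to check whether each scraped link is just an article or contains more links, and if it does, scrape those links too. I'm trying to do this with BeautifulSoup 4, and this is the code I have so far: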

import requests
from bs4 import BeautifulSoup

# user_agent is not defined in the original snippet; this placeholder value is an assumption
user_agent = 'Mozilla/5.0'
url = 'https://www.lbbusinessjournal.com/'
try:
    # Fetch the homepage and collect links from article titles and menu items
    r = requests.get(url, headers={'User-Agent': user_agent})
    soup = BeautifulSoup(r.text, 'html.parser')
    for post in soup.find_all(['h3', 'li'], class_=['entry-title td-module-title', 'menu-item']):
        link = post.find('a').get('href')
        print(link)
        # Fetch each linked page and print the article links found on it
        r = requests.get(link, headers={'User-Agent': user_agent})
        soup1 = BeautifulSoup(r.text, 'html.parser')
        for post1 in soup1.find_all('h3', class_='entry-title td-module-title'):
            link1 = post1.find('a').get('href')
            print(link1)
except Exception as e:
    print(e)

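I want the links on the page https://www.lbbusinessjournal.com/, and then I want to search the links I get from that page for further links. For example, for https://www.lbbusinessjournal.com/news/ I want the links inside https://www.lbbusinessjournal.com/news/ as well. So far I only get the links from the homepage.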
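What the question describes is a depth-limited crawl: fetch a page, collect its links, then repeat for each new link up to some depth, skipping pages already visited. The following is a minimal, dependency-free sketch of that traversal. It assumes a hypothetical get_links(url) callback that returns the hrefs found on one page (in practice it would wrap the requests.get and BeautifulSoup calls from the code above); the crawl function, its max_depth parameter, and the same-domain filter are illustrative choices, not part of the original code.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 1) -> List[str]:
    """Breadth-first crawl from start_url, following links up to max_depth hops.

    get_links(url) is a hypothetical callback returning the hrefs on one page;
    passing it in keeps the traversal testable without network access.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse
            absolute, _ = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc != domain or absolute in seen:
                continue  # stay on the same site and skip visited pages
            seen.add(absolute)
            order.append(absolute)
            queue.append((absolute, depth + 1))
    return order
```

The seen set matters here because section pages such as /news/ typically link back to the homepage; without it the crawl would revisit pages indefinitely. Keeping the fetch step behind a callback also means a None-safe extractor like the one in the answer can be plugged in unchanged.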

Answer 1

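Try re-raising the exception (raise e) in your except clause, and you will see the error

AttributeError: 'NoneType' object has no attribute 'get'

It comes from the line link1 = post1.find('a').get('href'), where post1.find('a') returns None. This happens because at least one of the h3 elements you retrieve does not contain an a element; in fact, it looks like the link is commented out in the HTML.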
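To see where an error comes from, rather than only its message as print(e) shows, you can format the full traceback with the standard-library traceback module. In this sketch, get_href and run are hypothetical helpers: get_href stands in for post1.find('a').get('href') when find() has returned None.

```python
import traceback


def get_href(element):
    # Hypothetical stand-in for post1.find('a').get('href');
    # element is whatever find('a') returned, possibly None
    return element.get('href')


def run(element):
    try:
        return get_href(element)
    except Exception:
        # Unlike print(e), format_exc() includes the exception type and
        # the file/line where it was raised
        return traceback.format_exc()


print(run(None))
```

Inside the question's except clause, traceback.print_exc() does the same thing but prints directly to stderr.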

Instead, you should split the post1.find('a').get('href') call into two steps and check that the element returned by post1.find('a') is not None before trying to get its 'href' attribute, i.e.:

for post1 in soup1.find_all('h3', class_='entry-title td-module-title'):
    element = post1.find('a')  # None when this <h3> has no <a> child
    if element is not None:
        link1 = element.get('href')
        print(link1)

Output of running your code with this change:

https://www.lbbusinessjournal.com/
https://www.lbbusinessjournal.com/this-virus-doesnt-have-borders-port-official-warns-of-pandemics-future-economic-impact/
https://www.lbbusinessjournal.com/pharmacy-and-grocery-store-workers-call-for-increased-protections-against-covid-19/
https://www.lbbusinessjournal.com/up-close-and-personal-grooming-businesses-struggle-in-times-of-social-distancing/
https://www.lbbusinessjournal.com/light-at-the-end-of-the-tunnel-long-beach-secures-contract-for-new-major-convention/
https://www.lbbusinessjournal.com/hospitals-prepare-for-influx-of-coronavirus-patients-officials-worry-it-wont-be-enough/
https://www.lbbusinessjournal.com/portside-keeping-up-with-the-port-of-long-beach-18/
https://www.lbbusinessjournal.com/news/
...
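An alternative to the explicit None check is to let a CSS selector do the filtering. Assuming a reasonably recent BeautifulSoup 4, whose select() method accepts CSS selectors, the attribute selector a[href] only matches a elements that actually carry an href, so headings whose link is missing or commented out are skipped. The article_links helper below is hypothetical; note that h3.entry-title.td-module-title matches elements having both classes in any order, which is slightly looser than passing the exact class string to find_all.

```python
from bs4 import BeautifulSoup


def article_links(html: str) -> list:
    """Return the hrefs of <a> tags inside article-title <h3> elements.

    The selector only matches <a> elements that have an href attribute,
    so no None check is needed.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("h3.entry-title.td-module-title a[href]")]
```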

Solution 1, accepted 2020-04-09 04:44:43