Getting "not subscriptable" error when running a web scraping script

Question

I am practicing web scraping with the code below, which uses a for loop to go through the pages.

import requests
from bs4 import BeautifulSoup

name=[]
link=[]
address=[]
for i in range (1,11):
  i=str(i)
  url = "https://forum.iktva.sa/exhibitors-list?&page="+i+"&searchgroup=37D5A2A4-exhibitors"
  soup = BeautifulSoup(requests.get(url).content, "html.parser")

  for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):
      company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]

      soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")
      n=soup2.select_one(".m-exhibitor-entry__item__header__title").text

      l=soup2.select_one("h4+a")["href"]
      a=soup2.select_one(".m-exhibitor-entry__item__body__contacts__address").text
      name.append(n)
      link.append(l)
      address.append(a)

When I run the program, I get this error:

  l=soup2.select_one("h4+a")["href"]
TypeError: 'NoneType' object is not subscriptable

I am not sure how to fix this.

Answer 1

You just need to replace that line with the following code, which handles None:

l = soup2.select_one("h4+a")
if l:
    l = l["href"]
else:
    l = "Website not available"

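The error occurs because some exhibitor pages have no website link, so select_one("h4+a") returns None. For example: https://forum.iktva.sa/exhibitors/sanad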
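To see the behaviour without hitting the site, here is a minimal sketch on inline HTML. The two markup strings and the `extract_link` helper are assumptions made up for illustration; they are not taken from the real exhibitor pages.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one snippet has a link right after the <h4>, one does not.
WITH_LINK = '<div><h4>Website</h4><a href="https://example.com">example.com</a></div>'
WITHOUT_LINK = '<div><h4>Website</h4></div>'


def extract_link(html, default="Website not available"):
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns None when no <a> directly follows an <h4>
    tag = soup.select_one("h4+a")
    return tag["href"] if tag is not None else default


print(extract_link(WITH_LINK))
print(extract_link(WITHOUT_LINK))
```

Subscripting the result of select_one directly is only safe when the element is guaranteed to exist, so the None check has to come first.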

Or you can handle all of the fields with one helper function, for example:

import requests
from bs4 import BeautifulSoup


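def get_object(obj, attr=None):
    # Return obj[attr] when attr is given, otherwise obj.text.
    # Fall back to a placeholder if obj is None or the attribute is missing.
    try:
        if attr:
            return obj[attr]
        else:
            return obj.text
    except (TypeError, AttributeError, KeyError):
        return "Not available"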


name = []
link = []
address = []
for i in range(1, 11):
    i = str(i)
    url = f"https://forum.iktva.sa/exhibitors-list?&page={i}&searchgroup=37D5A2A4-exhibitors"
    soup = BeautifulSoup(requests.get(url).text, features="lxml")

    for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):

        company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]
        soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")

        # Pass the tag itself (not .text) so get_object can handle a missing element.
        n = soup2.select_one(".m-exhibitor-entry__item__header__title")
        n = get_object(n)

        l = soup2.select_one("h4+a")
        l = get_object(l, 'href')

        a = soup2.select_one(".m-exhibitor-entry__item__body__contacts__address")
        a = get_object(a)

        name.append(n)
        link.append(l)
        address.append(a)
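If you would rather avoid try/except, a possible variant is to fold the selection and the None check into one function. This is only a sketch: `select_value` and the sample HTML are hypothetical names and data, not part of the original script.

```python
from bs4 import BeautifulSoup


def select_value(soup, selector, attr=None, default="Not available"):
    """Return the text (or an attribute) of the first match, else `default`."""
    tag = soup.select_one(selector)
    if tag is None:
        return default
    if attr is None:
        return tag.get_text(strip=True)
    # Tag.get returns the fallback when the attribute is missing
    return tag.get(attr, default)


# Hypothetical exhibitor page with a title but no website link.
html = '<h1 class="title"> Sanad </h1><h4>Website</h4>'
soup = BeautifulSoup(html, "html.parser")
print(select_value(soup, ".title"))
print(select_value(soup, "h4+a", "href"))
```

With a helper like this, each field in the loop becomes a single call such as select_value(soup2, "h4+a", "href").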
