Getting a "not subscriptable" error when running a web scraping script

I am practicing web scraping with the code below, which loops over the listing pages with a for loop.

import requests
from bs4 import BeautifulSoup

name=[]
link=[]
address=[]
for i in range (1,11):
  i=str(i)
  url = "https://forum.iktva.sa/exhibitors-list?&page="+i+"&searchgroup=37D5A2A4-exhibitors"
  soup = BeautifulSoup(requests.get(url).content, "html.parser")

  for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):
      company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]

      soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")
      n=soup2.select_one(".m-exhibitor-entry__item__header__title").text

      l=soup2.select_one("h4+a")["href"]
      a=soup2.select_one(".m-exhibitor-entry__item__body__contacts__address").text
      name.append(n)
      link.append(l)
      address.append(a)

When I run the program, I get this error:

  l=soup2.select_one("h4+a")["href"]
TypeError: 'NoneType' object is not subscriptable

I am not sure how to fix this.

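select_one returns None when nothing matches the selector, and indexing None raises this TypeError. You only need to replace the failing line with code that handles None: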

l = soup2.select_one("h4+a")
if l:
    l = l["href"]
else:
    l = "Website not available"

The error occurs because some exhibitors have no website listed, for example: https://forum.iktva.sa/exhibitors/sanad
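
The behaviour can be reproduced without touching the network. The sketch below is only an illustration: the two HTML fragments and the `website_of` helper are made up for this example and are not copied from the site. Only the `h4+a` selector comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real exhibitor pages may differ.
WITH_LINK = '<div><h4>Website</h4><a href="https://example.com">example.com</a></div>'
WITHOUT_LINK = '<div><h4>Website</h4></div>'


def website_of(html):
    """Return the href of the <a> directly after an <h4>, or a fallback string."""
    tag = BeautifulSoup(html, "html.parser").select_one("h4+a")
    # select_one returns None when nothing matches, so check before indexing.
    if tag is None:
        return "Website not available"
    return tag["href"]


print(website_of(WITH_LINK))
print(website_of(WITHOUT_LINK))
```

The second call takes the fallback branch instead of raising TypeError, which is the same check as in the snippet above.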

Alternatively, you can handle every missing element with a helper function, for example:

import requests
from bs4 import BeautifulSoup


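def get_object(obj, attr=None):
    # Return obj[attr] when attr is given, otherwise obj.text.
    # Falls back to "Not available" if obj is None or the attribute is missing.
    try:
        if attr:
            return obj[attr]
        else:
            return obj.text
    except (TypeError, AttributeError, KeyError):
        return "Not available"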


name = []
link = []
address = []
for i in range(1, 11):
    # The f-string formats the page number directly, so no str() conversion is needed
    url = f"https://forum.iktva.sa/exhibitors-list?&page={i}&searchgroup=37D5A2A4-exhibitors"
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    for a in soup.select(".m-exhibitors-list__items__item__header__title__link"):

        company_url = "https://forum.iktva.sa/" + a["href"].split("'")[1]
        soup2 = BeautifulSoup(requests.get(company_url).content, "html.parser")

        # Pass the tag itself (not its .text) so get_object can handle a missing element
        n = soup2.select_one(".m-exhibitor-entry__item__header__title")
        n = get_object(n)

        l = soup2.select_one("h4+a")
        l = get_object(l, 'href')

        a = soup2.select_one(".m-exhibitor-entry__item__body__contacts__address")
        a = get_object(a)

        name.append(n)
        link.append(l)
        address.append(a)
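
One way to check this kind of helper offline is to move the parsing into its own function and feed it static HTML. The following is a minimal sketch under stated assumptions: `text_or_default`, `parse_exhibitor`, and the sample markup are hypothetical names and data introduced here, and only the CSS selectors are taken from the code above. It uses explicit None checks instead of try/except, which is a design choice rather than a requirement.

```python
from bs4 import BeautifulSoup

DEFAULT = "Not available"


def text_or_default(tag, attr=None, default=DEFAULT):
    """Return tag[attr] or the tag's stripped text; fall back if missing."""
    if tag is None:
        return default
    if attr is not None:
        # Tag.get returns the default when the attribute is absent.
        return tag.get(attr, default)
    return tag.get_text(strip=True)


def parse_exhibitor(html):
    """Extract name, link and address from one exhibitor page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "name": text_or_default(
            soup.select_one(".m-exhibitor-entry__item__header__title")),
        "link": text_or_default(soup.select_one("h4+a"), "href"),
        "address": text_or_default(
            soup.select_one(".m-exhibitor-entry__item__body__contacts__address")),
    }


# Made-up page with no website link, to exercise the fallback.
SAMPLE = """
<h1 class="m-exhibitor-entry__item__header__title"> Example Co </h1>
<div class="m-exhibitor-entry__item__body__contacts__address">123 Example Street</div>
"""

record = parse_exhibitor(SAMPLE)
print(record)
```

In the scraping loop, `parse_exhibitor(requests.get(company_url).content)` would then take the place of the three separate lookups.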
