How to use BeautifulSoup in Python to return the date and title from the following link

Question

I want to extract data from the link below using the BeautifulSoup package in Python. I am trying to collect all the links on the first page and then fetch the related data for each link.

For example: publish_date and title.

But the script crashes and displays the error below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-95-0fd35627bc48> in <module>
     52     s = BeautifulSoup(requests.get(link).content, "lxml")
     53 
---> 54     date_published = s.find("span", class_="t-mute").getText(strip=True)
     55     title = s.find("h1", class_="h3 t-break").getText(strip=True)
     56     print(f"{date_published} {title}\n\n", "-" * 80)

AttributeError: 'NoneType' object has no attribute 'getText'

================================== ====================================

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)


links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])


for link in links:
    
    s = BeautifulSoup(requests.get(link).content, "lxml")

    date_published = s.find("span", class_="t-mute").getText(strip=True)
    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)

Answer 1

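You are searching for the wrong element. In the code below, the date is read from a span with class u-none inside the last li with class t-mute, not from a span with class t-mute.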


import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)


links = []
for a in soup.select("h2.m0.t-regular a"):
    # Build the absolute URL first so the duplicate check compares like with like.
    url = "https://www.bayt.com" + a["href"]
    if url not in links:
        links.append(url)


for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")

    # The date is taken from the last <li class="t-mute">, inside its <span class="u-none">.
    date_info = s.find_all("li", class_="t-mute")[-1]
    date_published = date_info.find("span", class_="u-none").getText(strip=True)

    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)

Answer 2

Read the error message:

 AttributeError: 'NoneType' object has no attribute 'getText'

It means that s.find("span", class_="t-mute") is None, i.e. no result was found.

In other words, the page structure/tags are not what you expected.

So either:

fix your search criteria,
or test for None after your search, before calling the getText method.
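
The second option can be sketched roughly as follows. This is only an illustration: the helper name text_or_none and the inline HTML snippet are made up for the example (the real page markup may differ), and "html.parser" is used here just so the snippet runs without lxml installed.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(name, class_=class_)
    # find() returns None when nothing matches, so check before calling get_text.
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup for illustration only: it has a title but no date span.
html = '<h1 class="h3 t-break">Executive Chef</h1>'
soup = BeautifulSoup(html, "html.parser")

title = text_or_none(soup, "h1", "h3 t-break")
date_published = text_or_none(soup, "span", "t-mute")

if date_published is None:
    print(f"date not found for: {title}")
else:
    print(f"{date_published} {title}")
```

With a check like this, a page that lacks the expected tag is reported instead of raising AttributeError and stopping the whole loop.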
