
How to return date and title from the below link using BeautifulSoup in Python

I want to extract data from the link below using the BeautifulSoup package in Python. I am trying to get all the links on the first page and then get the related data for each link.

For example: publish_date and title.

But the script crashes and displays the error below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-95-0fd35627bc48> in <module>
     52     s = BeautifulSoup(requests.get(link).content, "lxml")
     53 
---> 54     date_published = s.find("span", class_="t-mute").getText(strip=True)
     55     title = s.find("h1", class_="h3 t-break").getText(strip=True)
     56     print(f"{date_published} {title}\n\n", "-" * 80)

AttributeError: 'NoneType' object has no attribute 'getText'

================================== ====================================

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)


links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])


for link in links:
    
    s = BeautifulSoup(requests.get(link).content, "lxml")

    date_published = s.find("span", class_="t-mute").getText(strip=True)
    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)

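You are searching for the wrong element. The date is not in a span with class t-mute; it is in a span with class u-none inside the last li with class t-mute: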


import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)


links = []
for a in soup.select("h2.m0.t-regular a"):
    # Build the absolute URL first so the duplicate check compares like with like
    url = "https://www.bayt.com" + a['href']
    if url not in links:
        links.append(url)


for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")

    date_info = s.find_all("li", class_="t-mute")[-1]
    date_published = date_info.find("span", class_="u-none").getText(strip=True)

    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)
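To see why the original lookup fails while this one works, here is a small sketch run against made-up markup. The HTML string below is hypothetical and only imitates the class names used in this answer; the real page will differ, and "html.parser" is used here only so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from the real page.
html = """
<ul>
  <li class="t-mute">Job Location <span class="u-none">Dubai</span></li>
  <li class="t-mute">Date Posted <span class="u-none">2021/01/01</span></li>
</ul>
"""
s = BeautifulSoup(html, "html.parser")

# The question's lookup: there is no <span class="t-mute">, so find() returns None.
missing = s.find("span", class_="t-mute")

# The lookup from this answer: take the last <li class="t-mute">,
# then read the <span class="u-none"> nested inside it.
date_info = s.find_all("li", class_="t-mute")[-1]
date_published = date_info.find("span", class_="u-none").getText(strip=True)

print(missing, date_published)
```

Calling getText on the first result would raise the same AttributeError as in the question, because the result is None.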

Read the error message:

 AttributeError: 'NoneType' object has no attribute 'getText'

This means that s.find("span", class_="t-mute") returned None, i.e. no matching element was found.

In other words: the page structure/tags are not what you expected.

So either:

  • fix your search criteria
  • or test for None after your search, before calling the getText method.
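The second option could look like the following minimal sketch. The helper name text_or_none and the sample HTML are assumptions made up for illustration, not part of the question's code or of the BeautifulSoup API.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, class_):
    """Return the stripped text of the first matching tag, or None if absent.

    Hypothetical helper: it only wraps find() with a None check.
    """
    tag = soup.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical markup; it contains the title but no <span class="t-mute">.
html = '<h1 class="h3 t-break">Executive Chef</h1>'
soup = BeautifulSoup(html, "html.parser")

title = text_or_none(soup, "h1", "h3 t-break")
date_published = text_or_none(soup, "span", "t-mute")

if date_published is None:
    print(f"no publish date found for: {title}")
else:
    print(f"{date_published} {title}")
```

With a check like this the loop keeps going when a page lacks the element, instead of stopping with an AttributeError on the first miss.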

