How to return date and title from the link below using BeautifulSoup in Python
I want to extract data from the link below using the BeautifulSoup package in Python. I am trying to get all the links on the first page and then get the related data for each link,
for example: publish_date and title.
But the script crashes and displays the error below:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-95-0fd35627bc48> in <module>
52 s = BeautifulSoup(requests.get(link).content, "lxml")
53
---> 54 date_published = s.find("span", class_="t-mute").getText(strip=True)
55 title = s.find("h1", class_="h3 t-break").getText(strip=True)
56 print(f"{date_published} {title}\n\n", "-" * 80)
AttributeError: 'NoneType' object has no attribute 'getText'
================================== ====================================
import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)
links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")
    date_published = s.find("span", class_="t-mute").getText(strip=True)
    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)
You are searching for the wrong element. Take the last li with class t-mute and read the span with class u-none inside it:
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml",
)

# Collect the absolute URL of each job posting, skipping duplicates.
# The check is done on the full URL, since that is what the list stores.
links = []
for a in soup.select("h2.m0.t-regular a"):
    url = "https://www.bayt.com" + a["href"]
    if url not in links:
        links.append(url)

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")
    # The date is read from the span.u-none inside the last li.t-mute.
    date_info = s.find_all("li", class_="t-mute")[-1]
    date_published = date_info.find("span", class_="u-none").getText(strip=True)
    title = s.find("h1", class_="h3 t-break").getText(strip=True)
    print(f"{date_published} {title}\n\n", "-" * 80)
Read the error message:
AttributeError: 'NoneType' object has no attribute 'getText'
It means that s.find("span", class_="t-mute") is None, i.e. no result was found.
In other words, the page structure/tags are not as expected.
So one option is to check for None after your search, before calling the getText method.
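The None check described above could be sketched roughly as follows. This is only an illustration: the helper name text_or_default and the SAMPLE_HTML markup (including its date value) are hypothetical and stand in for a real page, whose structure may differ. It uses the built-in "html.parser" so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; it is not the real page.
SAMPLE_HTML = """
<html><body>
  <h1 class="h3 t-break">Executive Chef</h1>
  <ul><li class="t-mute">Date posted: <span class="u-none">2021-01-01</span></li></ul>
</body></html>
"""


def text_or_default(tag, default=""):
    """Return the stripped text of a tag, or `default` if the search found nothing."""
    if tag is None:
        return default
    return tag.get_text(strip=True)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# This search matches the h1 in the sample markup.
title = text_or_default(soup.find("h1", class_="h3 t-break"), "<no title>")

# No span has the class t-mute here, so find() returns None
# and the helper falls back to the default instead of raising AttributeError.
missing = text_or_default(soup.find("span", class_="t-mute"), "<no date>")

print(title)
print(missing)
```

With a guard like this, a page that lacks the expected tag produces a placeholder value rather than stopping the whole loop, which also makes it easier to spot which links have a different structure.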