Receiving an error in BS4 while web scraping Amazon: AttributeError: 'NoneType' object has no attribute 'get_text'
!pip install requests
!pip install bs4
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.in/Apple-iPhone-Pro-Max-256GB/dp/B07XVLH744/ref=sr_1_1_sspa?crid=2VCKZNOH3H6SR&keywords=apple+iphone+11+pro+max&qid=1582043410&sprefix=apple+iphone%2Caps%2C388&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyVjdZSE83TzU4UUMmZW5jcnlwdGVkSWQ9QTAyNTI1ODZJUzZOVUwxWDNIUlAmZW5jcnlwdGVkQWRJZD1BMDkxNDg4MzFLMFpVT1M5OFM5Q0smd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl"
# Placeholder: replace the value with your own browser's User-Agent string
# (for example, the one Google shows when you search "my user agent").
headers = {"User-Agent": "YOUR_USER_AGENT_STRING"}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
print(soup.prettify())
title = soup.find(id = "productTitle").get_text()
price = soup.find(id = "priceblock_ourprice").get_text()
converted_price = price[0:8]
print(converted_price)
print(title)
I'm using Google Colab, and when I run this code I get this error:
AttributeError Traceback (most recent call last)
<ipython-input-15-14696d9dc778> in <module>()
16 print(soup.prettify())
17
---> 18 title = soup.find(id = "productTitle").get_text()
19 price = soup.find(id = "priceblock_ourprice").get_text()
20
AttributeError: 'NoneType' object has no attribute 'get_text'
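I've tried searching the internet but haven't found an answer that solves my problem. I want to get the price of the iPhone 11 Pro Max. When I run this code, I get the error above.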
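soup.find(id = "productTitle") is returning None because it cannot find an element with id = "productTitle". Make sure you are searching for the correct element.
For find statements, I recommend always writing an if condition to avoid and handle errors like this.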
title = soup.find(id = "productTitle")
if title:
    title = title.get_text()
else:
    title = "default_title"
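Do the same for price = soup.find(id = "priceblock_ourprice").get_text().
You get that error when you try to extract data from an object whose value is None. If you see it on line 18, it means your soup.find(id = "productTitle") didn't match anything and returned None.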
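To see this behaviour in isolation, here is a minimal sketch that parses a small hand-written HTML snippet (an assumption standing in for the real Amazon page) and shows that find returns None for an id that is not present, so calling get_text on that result is what raises the error.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the downloaded page: it has a title
# element but no element with id="priceblock_ourprice".
html = '<html><body><span id="productTitle"> Apple iPhone 11 Pro Max </span></body></html>'
soup = BeautifulSoup(html, "html.parser")

title_tag = soup.find(id="productTitle")         # a Tag object
price_tag = soup.find(id="priceblock_ourprice")  # None: no such id in the snippet

print(title_tag.get_text(strip=True))  # Apple iPhone 11 Pro Max
print(price_tag)                       # None

try:
    price_tag.get_text()
except AttributeError as exc:
    # Same kind of error as in the traceback above
    print(exc)
```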
You need to break the processing into multiple steps: first check the return value, then access it. So...
title_info = soup.find(id = "productTitle")
if title_info:
    title = title_info.text
else:
    title = None  # handle the situation, e.g. log it or fall back to a default
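The same check can be wrapped in a small helper so that every lookup follows the "find first, then read" pattern. The function name text_by_id and its default parameter are hypothetical, not part of BeautifulSoup; the sketch only relies on soup.find and Tag.get_text.

```python
from bs4 import BeautifulSoup

def text_by_id(soup, element_id, default=None):
    """Return the stripped text of the element with the given id, or `default` if it is missing."""
    tag = soup.find(id=element_id)
    if tag is None:
        return default
    return tag.get_text(strip=True)

# Usage with a hypothetical snippet in place of the real page.
soup = BeautifulSoup('<span id="productTitle">\n  Apple iPhone 11 Pro Max\n</span>', "html.parser")
title = text_by_id(soup, "productTitle", default="default_title")
price = text_by_id(soup, "priceblock_ourprice", default="price not found")
print(title)
print(price)
```

Comparing against None explicitly keeps the intent clear: the only case being handled is "find matched nothing".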
Well, I tested your code here and it works fine. However, when you request the same link several times in a short period, Amazon responds with a 503 code...
<html>
<head>
<title>
503 - Service Unavailable Error
</title>
</head>
<body bgcolor="#FFFFFF" text="#000000">
<!--
To discuss automated access to Amazon data please contact api-services-support@amazon.com.
For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.in/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.in/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->
<center>
<a href="https://www.amazon.in/ref=cs_503_logo/">
<img alt="Amazon.in" border="0" height="45" src="https://images-eu.ssl-images-amazon.com/images/G/31/x-locale/communities/people/logo.gif" width="200"/>
</a>
<p align="center">
<font face="Verdana,Arial,Helvetica">
<font color="#CC6600" size="+2">
<b>
Oops!
</b>
</font>
<br/>
<b>
It's rush hour and traffic is piling up on that page. Please try again in a short while.
<br/>
If you were trying to place an order, it will not have been processed at this time.
</b>
<p>
<img alt="*" border="0" height="9" src="https://images-eu.ssl-images-amazon.com/images/G/02/x-locale/common/orange-arrow.gif" width="10"/>
<b>
<a href="https://www.amazon.in/ref=cs_503_link/">
Go to the Amazon.in home page to continue shopping
</a>
</b>
</p>
</font>
</p>
</center>
</body>
</html>
Wait a while and try again, or at least leave more time between your test requests...
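A minimal sketch of that advice, assuming you want to retry automatically: check the response's status_code before parsing, and wait progressively longer after a non-200 response such as the 503 page above. The function name fetch_with_retry and its retries/delay parameters are hypothetical; the get argument exists only so the function can be exercised without network access, and it defaults to requests.get.

```python
import time

def fetch_with_retry(url, headers, retries=3, delay=5.0, get=None):
    """Fetch `url`, retrying with exponential backoff on non-200 responses.

    Returns the successful response, or None if every attempt failed.
    """
    if get is None:
        import requests  # imported here so a stub `get` can be passed in instead
        get = requests.get
    for attempt in range(retries):
        response = get(url, headers=headers)
        if response.status_code == 200:
            return response
        # e.g. 503 "Service Unavailable": back off before the next attempt
        if attempt < retries - 1:
            time.sleep(delay * (2 ** attempt))
    return None
```

In the question's code this would replace the direct requests.get call: page = fetch_with_retry(url, headers), and only if page is not None go on to BeautifulSoup(page.content, "html.parser"). Backing off does not guarantee Amazon stops blocking; the 503 page itself points to Amazon's APIs for automated access.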
Also try this code:
title = soup.find(id="productTitle")
if title:
    title = title.get_text().strip()
else:
    title = "default_title"
price = soup.find(id="priceblock_ourprice")
if price:
    price = price.get_text().strip()
else:
    price = "default_price"
print(price)
print(title)
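Slicing fixed character positions, as in price[0:8] in the question, breaks whenever the length of the price text changes. A sketch of a more tolerant approach, assuming the price text looks something like "₹ 1,17,100.00" (an illustrative value, not one taken from the page): pull out the first number with a regular expression and drop the thousands separators. parse_price is a hypothetical helper name.

```python
import re

def parse_price(text):
    """Return the first number in `text` as a float, ignoring currency symbols and commas.

    Returns None if no number is found.
    """
    # A digit, then any digits/commas, then an optional decimal part
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("₹ 1,17,100.00"))          # 117100.0
print(parse_price("Currently unavailable"))  # None
```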
Try using another IDE, for example repl.it (https://repl.it): create a new repl and run the code there.
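Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.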