Receiving an error in BS4 while scraping Amazon: AttributeError: 'NoneType' object has no attribute 'get_text'

!pip install requests
!pip install bs4


import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.in/Apple-iPhone-Pro-Max-256GB/dp/B07XVLH744/ref=sr_1_1_sspa?crid=2VCKZNOH3H6SR&keywords=apple+iphone+11+pro+max&qid=1582043410&sprefix=apple+iphone%2Caps%2C388&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEyVjdZSE83TzU4UUMmZW5jcnlwdGVkSWQ9QTAyNTI1ODZJUzZOVUwxWDNIUlAmZW5jcnlwdGVkQWRJZD1BMDkxNDg4MzFLMFpVT1M5OFM5Q0smd2lkZ2V0TmFtZT1zcF9hdGYmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl"

headers = {"User-Agent": "in this section im adding my user agent after typing my user agent in google search"}

page = requests.get(url, headers=headers)

soup = BeautifulSoup(page.content, "html.parser")

print(soup.prettify()) 

title = soup.find(id = "productTitle").get_text()
price = soup.find(id = "priceblock_ourprice").get_text()

converted_price = price[0:8]

print(converted_price)
print(title)

When I run this code in Google Colab, I get this error:

AttributeError   Traceback (most recent call last)
<ipython-input-15-14696d9dc778> in <module>()
     16 print(soup.prettify())
     17 
---> 18 title = soup.find(id = "productTitle").get_text()
     19 price = soup.find(id = "priceblock_ourprice").get_text()
     20 

AttributeError: 'NoneType' object has no attribute 'get_text'

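I have tried searching the internet but did not find an answer that solves my problem. I want to get the price of the iPhone 11 Pro Max. When I run this code, I get the error above.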

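  • soup.find(id = "productTitle") returns None because it cannot find an element with id = "productTitle". Make sure you are searching for the correct element.

  • For find calls, I suggest always writing an if condition to avoid and handle this kind of error.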

title = soup.find(id = "productTitle")
if title:
    title = title.get_text()
else:
    title = "default_title"

price = soup.find(id = "priceblock_ourprice").get_text()

  • You can do the same for price.
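
Since the same check is needed for both the title and the price, it can be wrapped in a small helper. This is only a sketch: text_by_id is a hypothetical name, not part of BeautifulSoup, and it assumes soup is any object whose find(id=...) returns either an element with get_text() or None, which is how a BeautifulSoup object behaves.

```python
def text_by_id(soup, element_id, default=None):
    """Return the stripped text of the element with this id, or default if absent."""
    # find() returns None when no element matches, so check before calling get_text().
    element = soup.find(id=element_id)
    if element is None:
        return default
    return element.get_text().strip()
```

With the soup from the question this would be called as title = text_by_id(soup, "productTitle", "default_title") and price = text_by_id(soup, "priceblock_ourprice").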

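You get that error when you try to extract data from an object whose value is None. If you see it on line 18, it means your soup.find(id = "productTitle") did not match anything and returned None.

You need to break the processing into multiple steps. Check the return value first, before accessing it. So...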

title_info = soup.find(id = "productTitle")
if title_info:
    title = title_info.text
else:
    title = None  # handle the missing element here

Well, I tested your code here and it works fine. However, when you request the same link several times within a short period, Amazon responds with a 503 code...

<html>
 <head>
  <title>
   503 - Service Unavailable Error
  </title>
 </head>
 <body bgcolor="#FFFFFF" text="#000000">
  <!--
        To discuss automated access to Amazon data please contact api-services-support@amazon.com.
        For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.in/ref=rm_5_sv, or our Product Advertising API at https://affiliate-program.amazon.in/gp/advertising/api/detail/main.html/ref=rm_5_ac for advertising use cases.
-->
  <center>
   <a href="https://www.amazon.in/ref=cs_503_logo/">
    <img alt="Amazon.in" border="0" height="45" src="https://images-eu.ssl-images-amazon.com/images/G/31/x-locale/communities/people/logo.gif" width="200"/>
   </a>
   <p align="center">
    <font face="Verdana,Arial,Helvetica">
     <font color="#CC6600" size="+2">
      <b>
       Oops!
      </b>
     </font>
     <br/>
     <b>
      It's rush hour and traffic is piling up on that page. Please try again in a short while.
      <br/>
      If you were trying to place an order, it will not have been processed at this time.
     </b>
     <p>
      <img alt="*" border="0" height="9" src="https://images-eu.ssl-images-amazon.com/images/G/02/x-locale/common/orange-arrow.gif" width="10"/>
      <b>
       <a href="https://www.amazon.in/ref=cs_503_link/">
        Go to the Amazon.in home page to continue shopping
       </a>
      </b>
     </p>
    </font>
   </p>
  </center>
 </body>
</html>

Wait a little while and try again, or at least leave more time between your test requests...
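
Waiting between attempts can be automated. The sketch below is one possible approach, not an official recipe: fetch_with_retry, its retries and delay parameters, and the growing wait are all assumptions made for illustration. It takes the request function as an argument, so it only relies on the response having a status_code attribute.

```python
import time

def fetch_with_retry(get, url, retries=3, delay=5.0, sleep=time.sleep, **kwargs):
    """Call get(url, **kwargs) again while the response is a 503, up to `retries` extra times."""
    response = get(url, **kwargs)
    for attempt in range(retries):
        if response.status_code != 503:
            break
        # Wait a little longer before each new attempt: delay, 2 * delay, 3 * delay, ...
        sleep(delay * (attempt + 1))
        response = get(url, **kwargs)
    return response
```

With the code from the question this would be called as page = fetch_with_retry(requests.get, url, headers=headers). If page.status_code is still 503 afterwards, the HTML is the error page shown above and productTitle will not be in it.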

Also try this code:

title = soup.find(id="productTitle")
if title:
    title = title.get_text()
else:
    title = "default_title"

price = soup.find(id="priceblock_ourprice")
if not price:
    price = "default_price"

# converted_price = price[0:8]
# Slice the price out of the element's string form; the offsets depend on the exact markup.
convert = str(price)
con = convert[-18:-11]

print(con)
print(title)
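
Fixed slice offsets like these break as soon as the price gets longer or the markup changes. A less brittle option, sketched here under the assumption that the price text looks something like "₹ 1,17,100.00" (the real page may differ), is to pull the first number out with a regular expression. parse_price is a hypothetical helper name.

```python
import re

def parse_price(text):
    """Return the first number found in text as a float, or None if there is none."""
    # Match digits with optional thousands separators and an optional decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))
```

It would be applied to the element's text, not to the element itself, for example parse_price(price.get_text()) once price is known not to be None.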

Try using another IDE.

Go to repl.it (https://repl.it), create a new repl, and run the code there.

