
'NoneType' object has no attribute 'text' error while scraping data

When I try to scrape data from this Amazon link, I get AttributeError: 'NoneType' object has no attribute 'text'

My code:

import requests
from bs4 import BeautifulSoup as bs

headers = ({'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0',
       'Accept-Language' : 'en-US,en;q=0.5'})
lap_site = requests.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent',headers = headers)
lap_soup = bs(lap_site.content,'lxml')
content = lap_soup.find('div',class_ = 's-desktop-width-max s-desktop-content s-opposite-dir sg-row')
lap_detail_block = content.find_all('div',class_ = 'a-section a-spacing-small a-spacing-top-small')
lap_name = lap_price = lap_rating = []
for i in lap_detail_block:

   laptop_name = i.find('h2').a.span.text
   lap_name.append(laptop_name)

   laptop_rating = i.find('span',class_ = 'a-icon-alt').text
   lap_rating.append(laptop_rating)

   laptop_price = i.find('span',class_ = 'a-price-whole').text   
   lap_price.append(laptop_price)

laptop_details = {
'Laptop':lap_name,
'Price':lap_price,
'Rating':lap_rating }

print(laptop_details)

I think the laptop_rating variable stores the content as a string even if we don't include .text. I suspected that might be the reason for the NoneType error, since we would be extracting text from text. Anyway, that's not the issue. How can I extract the price or rating from that link?

At least in my tests, that page recognizes automated access and blocks it. You need to use something like cloudscraper to get through. The following code returns the expected results (adapt it to your own circumstances):

import cloudscraper
from bs4 import BeautifulSoup

# create_scraper() returns a requests-style session
scraper = cloudscraper.create_scraper()

r = scraper.get('https://www.amazon.in/s?k=laptops&sprefix=%2Caps%2C634&ref=nb_sb_ss_recent_3_0_recent')
soup = BeautifulSoup(r.content, 'html.parser')
# print(soup)
content = soup.find('div', class_='s-desktop-width-max s-desktop-content s-opposite-dir sg-row')
lap_detail_block = content.find_all('div', class_='a-section a-spacing-small a-spacing-top-small')
# Three separate lists; `a = b = c = []` would bind all three names to one list
lap_name, lap_price, lap_rating = [], [], []
for i in lap_detail_block:
    try:
        # Read all three fields first, so a block missing one is skipped entirely
        laptop_name = i.find('h2').a.span.text
        laptop_rating = i.find('span', class_='a-icon-alt').text
        laptop_price = i.find('span', class_='a-price-whole').text

        lap_name.append(laptop_name)
        lap_rating.append(laptop_rating)
        lap_price.append(laptop_price)
        print(laptop_name, laptop_rating, laptop_price)
    except Exception as e:
        # A missing tag makes find() return None, which raises AttributeError on .text
        print(e)
    print('_____________')

laptop_details = {
    'Laptop': lap_name,
    'Price': lap_price,
    'Rating': lap_rating
}

This will print in the terminal:

HP 15s, 12th Gen Intel Core i5 8GB RAM/512GB SSD 15.6-inch(39.6 cm) FHD,Micro-Edge, Anti- Glare Display/Win 11/Intel Iris Xe Graphics/Dual Speakers/Alexa/Backlit KB/MSO/Fast Charge, 15s- fq5111TU 4.2 out of 5 stars 58,699
_____________
Acer Predator Helios 500 Gaming Laptop (11Th Gen Intel Core I9/17.3 Inches 4K Uhd Display/64Gb Ddr4 Ram/2Tb Ssd/1Tb HDD/RTX 3080 Graphics/Windows 10 Home/Per Key RGB Backlit Keyboard) Ph517-52 3.0 out of 5 stars 3,79,990
_____________
ASUS VivoBook 14 (2021), 14-inch (35.56 cm) HD, Intel Core i3-1005G1 10th Gen, Thin and Light Laptop (8GB/1TB HDD/Windows 11/Integrated Graphics/Grey/1.6 kg), X415JA-BV301W 3.8 out of 5 stars 27,990
_____________
[...]
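
For context on the error itself: Beautiful Soup's find() returns None when nothing matches, so calling .text on the result raises the AttributeError from the question. As an alternative to a blanket try/except, each lookup can be checked individually. The sketch below is only an illustration: SAMPLE_HTML, the "item" class, and the helper names text_or_none and parse_items are made up for the example and are not taken from the Amazon page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second block has no rating or price.
SAMPLE_HTML = """
<div class="item"><h2><a><span>Laptop A</span></a></h2>
  <span class="a-icon-alt">4.2 out of 5 stars</span>
  <span class="a-price-whole">58,699</span></div>
<div class="item"><h2><a><span>Laptop B</span></a></h2></div>
"""

def text_or_none(tag):
    """Return the stripped text of a tag, or None if the lookup found nothing."""
    return tag.get_text(strip=True) if tag is not None else None

def parse_items(html):
    """Build one dict per block, with None for any field that is missing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="item"):
        rows.append({
            "Laptop": text_or_none(block.find("h2")),
            "Rating": text_or_none(block.find("span", class_="a-icon-alt")),
            "Price": text_or_none(block.find("span", class_="a-price-whole")),
        })
    return rows

rows = parse_items(SAMPLE_HTML)
for row in rows:
    print(row)
```

With this approach a listing that lacks a rating or price still produces a row, and the missing field shows up as None instead of aborting the whole block.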

Cloudscraper's details and install instructions: https://pypi.org/project/cloudscraper/
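
A separate point, unrelated to the NoneType error: the question's line `lap_name = lap_price = lap_rating = []` binds all three names to the same list object, so names, prices, and ratings all end up mixed together in one list. A minimal demonstration with placeholder values (the variable names here are illustrative only):

```python
# Chained assignment: a, b and c all refer to one and the same list.
a = b = c = []
a.append("name")
b.append("price")
shared = list(c)  # c sees both appends

# Tuple unpacking creates three independent lists.
names, prices, ratings = [], [], []
names.append("name")
prices.append("price")

print(shared)                  # ['name', 'price']
print(a is b is c)             # True
print(names, prices, ratings)  # ['name'] ['price'] []
```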


