
Getting AttributeError: 'NoneType' object has no attribute 'text' (web-scraping)

This is my case study about web scraping. In the final code I ran into the error 'NoneType' object has no attribute 'text', so I tried to fix it with the getattr function, but it didn't work.

import requests
from bs4 import BeautifulSoup

url = 'https://www.birdsnest.com.au/womens/dresses'

source = requests.get(url)
soup = BeautifulSoup(source.content, 'lxml')

productlist= soup.find_all('div', id='items')

productlinks = []
for item in productlist:
  for link in item.find_all('a',href=True):
      productlinks.append(url + link['href'])
print(len(productlinks))

productlinks = []
for x in range(1,28):
  source = requests.get(f'https://www.birdsnest.com.au/womens/dresses?_lh=1&page={x}')
  soup = BeautifulSoup(source.content, 'lxml')
  for item in productlist:
      for link in item.find_all('a',href=True):
        productlinks.append(url + link['href'])
print(productlinks)
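Two things in this pagination loop are worth checking. First, the inner loop iterates over productlist, which was built from the first page's soup before the loop started, so every iteration re-adds the first page's links instead of reading the page it just downloaded. Second, url + link['href'] only gives a valid address if the hrefs are relative to that exact URL; if they are root-relative (for example a hypothetical /womens/dresses/some-dress), the concatenation repeats the path, and the product pages requested later may not contain the expected elements at all. The sketch below uses the standard library's html.parser in place of BeautifulSoup only so it runs without extra dependencies; it takes HTML strings you have already downloaded, and the names ItemLinkParser and links_from_pages are hypothetical. In BeautifulSoup terms the same idea is to call soup.find_all('div', id='items') on each new soup inside the loop and to build links with urllib.parse.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ItemLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="items">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside <div id="items">
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1          # a div nested inside the items block
            elif attrs.get("id") == "items":
                self.depth = 1           # entering the items block
        elif tag == "a" and self.depth and attrs.get("href"):
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1              # leaving a div inside (or the) items block


def links_from_pages(pages_html, base_url):
    """Parse each page's own HTML and resolve its hrefs against base_url."""
    links = []
    for html in pages_html:
        parser = ItemLinkParser()        # fresh state for every page
        parser.feed(html)
        # urljoin handles relative, root-relative and absolute hrefs correctly
        links.extend(urljoin(base_url, href) for href in parser.hrefs)
    return links
```

With urljoin, a root-relative href replaces the base URL's path instead of being appended to it, so 'https://www.birdsnest.com.au/womens/dresses' joined with '/womens/dresses/a' gives 'https://www.birdsnest.com.au/womens/dresses/a'.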

for link in productlinks:
    source = requests.get(link)
    soup = BeautifulSoup(source.content, 'lxml')

    name = soup.find('h1',class_='item-heading__name').text.strip()
    price = soup.find('p',class_='item-heading__price').text.strip()
    feature = soup.find('div',class_='tab-accordion__content active').text.strip()

    sum = {
      'name':name,
      'price':price,
      'feature':feature
          }
    print(sum)

  ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-7-d4d46558690d> in <module>()
      3     soup = BeautifulSoup(source.content, 'lxml')
      4 
----> 5     name = soup.find('h1',class_='item-heading__name').text.strip()
      6     price = soup.find('p',class_='item-heading__price').text.strip()
      7     feature = soup.find('div',class_='tab-accordion__content active').text.strip()

AttributeError: 'NoneType' object has no attribute 'text'

---------------------------------------------------------------------------
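The traceback comes from calling .text on the value returned by soup.find(), which is None when no tag matches the given name and class. A tiny offline reproduction (the variable element is a stand-in for that find() result) shows the exact message and the usual None check:

```python
# soup.find() returns None when no tag matches the name/class you asked for.
element = None  # stand-in for soup.find('h1', class_='item-heading__name')

try:
    name = element.text.strip()
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'text'

# Checking for None first turns the crash into a value you can handle.
name = element.text.strip() if element is not None else None
print(name)  # None
```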

So I tried to fix it with this method, but it didn't work.

for link in productlinks:
    source = requests.get(link)
    soup = BeautifulSoup(source.content, 'lxml')

    name = getattr(soup.find('h1',class_='item-heading__name'),'text',None)
    price = getattr(soup.find('p',class_='item-heading__price'),'text',None)
    feature = getattr(soup.find('div',class_='tab-accordion__content active'),'text',None)

    sum = {
      'name':name,
      'price':price,
      'feature':feature
          }
    print(sum)

This is the output. It shows only None:

{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
{'name': None, 'price': None, 'feature': None}
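getattr(obj, 'text', None) returns its default whenever obj has no text attribute, and None never has one, so the getattr version does not find anything: it only swaps the exception for None. Output that is None for every field of every product therefore means none of the three find() calls matched on any page. A small sketch; SimpleNamespace stands in for a bs4 Tag, and all_missing is a hypothetical helper, not part of BeautifulSoup:

```python
from types import SimpleNamespace

# getattr(obj, name, default) returns `default` whenever obj lacks `name`.
# None has no .text, so a failed find() silently becomes None instead of
# raising; the element still was not found.
not_found = None                            # what soup.find() returns on no match
found = SimpleNamespace(text=" $189.95 ")   # stand-in for a bs4 Tag

print(getattr(not_found, "text", None))          # None
print(getattr(found, "text", None).strip())      # $189.95


def all_missing(record):
    """True when every scraped field is None, i.e. no selector matched."""
    return all(value is None for value in record.values())


print(all_missing({"name": None, "price": None, "feature": None}))  # True
```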

First of all, always look at the page you're scraping with JavaScript turned off: requests does not execute JavaScript, so that is the HTML your script actually receives. You'll then see that the tag classes are different, and those are the ones you want to target.
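Besides viewing the page with JavaScript disabled, you can list the classes that are actually present in the HTML requests downloaded and check whether the ones you search for are among them. A minimal sketch using the standard library's html.parser; classes_in is a hypothetical helper, and you would feed it source.text from a requests.get call like the one in the question:

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Record every (tag, class) pair that appears in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:       # value is None for a bare attribute
                for cls in value.split():       # class="a b" holds two classes
                    self.classes.add((tag, cls))


def classes_in(html):
    parser = ClassCollector()
    parser.feed(html)
    return parser.classes


# Usage with the question's selector:
# ("h1", "item-heading__name") in classes_in(source.text)
```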

Also, when looping through the pages, remember that the stop value of Python's range() is exclusive: range(1, 28) stops at page 27.
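A quick check of the off-by-one, assuming 28 pages as in the loop below:

```python
# range(start, stop) yields start, start + 1, ..., stop - 1: stop is excluded.
pages = list(range(1, 28))
print(len(pages), pages[0], pages[-1])      # 27 1 27

# To request pages 1 through 28 inclusive, the stop value has to be 29.
page_numbers = list(range(1, 29))
print(len(page_numbers), page_numbers[-1])  # 28 28
```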

Here's how I would go about it:

import json

import requests
from bs4 import BeautifulSoup


cookies = {
    "ServerID": "1033",
    "__zlcmid": "10tjXhWpDJVkUQL",
}

headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"
}


def extract_info(bs: BeautifulSoup, tag: str, attr_value: str) -> list:
    # Stripped text of every <tag> whose itemprop attribute equals attr_value
    return [i.text.strip() for i in bs.find_all(tag, {"itemprop": attr_value})]


all_pages = []
for page in range(1, 29):  # pages 1 to 28; the stop value is excluded
    print(f"Scraping data from page {page}...")

    current_page = f"https://www.birdsnest.com.au/womens/dresses?page={page}"
    source = requests.get(current_page, headers=headers, cookies=cookies)
    soup = BeautifulSoup(source.content, 'html.parser')

    brand = extract_info(soup, tag="strong", attr_value="brand")
    name = extract_info(soup, tag="h2", attr_value="name")
    price = extract_info(soup, tag="span", attr_value="price")

    all_pages.extend(
        [
            {
                "brand": b,
                "name": n,
                "price": p,
            } for b, n, p in zip(brand, name, price)
        ]
    )

print(f"{all_pages}\nFound: {len(all_pages)} dresses.")

with open("all_the_dresses2.json", "w") as jf:
    json.dump(all_pages, jf, indent=4)
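One caveat with zip(brand, name, price): zip stops at the shortest list, so if a product card on some page lacked, say, its brand element, the lists would silently fall out of step and later brands could be paired with the wrong dress. A hedged sketch of a stricter pairing step; combine is a hypothetical helper that could replace the list comprehension passed to all_pages.extend:

```python
def combine(brands, names, prices):
    """Pair the three lists into dicts, refusing to guess when lengths differ."""
    if not (len(brands) == len(names) == len(prices)):
        raise ValueError(
            f"length mismatch: {len(brands)} brands, "
            f"{len(names)} names, {len(prices)} prices"
        )
    return [
        {"brand": b, "name": n, "price": p}
        for b, n, p in zip(brands, names, prices)
    ]


# Plain zip() just drops the unmatched tail without any error:
print(list(zip(["boho bird", "Lula Soul"], ["Prissy Dress"])))
# [('boho bird', 'Prissy Dress')]
```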

This gets you a JSON file with all the dresses:

    {
        "brand": "boho bird",
        "name": "Prissy Dress",
        "price": "$189.95"
    },
    {
        "brand": "boho bird",
        "name": "Dandelion Dress",
        "price": "$139.95"
    },
    {
        "brand": "Lula Soul",
        "name": "Dandelion Dress",
        "price": "$179.95"
    },
    {
        "brand": "Honeysuckle Beach",
        "name": "Cotton V-Neck A-Line Splice Dress",
        "price": "$149.95"
    },
    {
        "brand": "Honeysuckle Beach",
        "name": "Lenny Pinafore",
        "price": "$139.95"
    },
...and so on for the rest of the 28 pages.


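Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.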

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM