Web scraping: how to save unavailable data as null
Hi, I am trying to get data with web scraping, but my code fails as soon as "old_price" is missing. How can I skip this field when it is empty, or read it and save the unavailable value as null? This is my Python code:
import requests
import json
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
base_url = "https://www.n11.com/super-firsatlar"
r = requests.get(base_url, headers=header)
if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li', attrs={"class": "column"})
    result = []
    for book in books:
        title = book.find('h3').text.strip()
        link = base_url + book.find('a')['href']
        picture = base_url + book.find('img')['src']
        first_price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text[:10].strip() + " TL"
        old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').text.strip()
        single = {'title': title, 'link': link, 'picture': picture, 'first_price': first_price, 'old_price': old_price}
        result.append(single)
    with open('book.json', 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
else:
    print(r.status_code)
<div class="proDetail">
    <a href="https://test.com" class="oldPrice" title="Premium"> <del>69,00 TL</del></a>
    <a href="https://test.com" class="newPrice" title="Premium"> <ins>14,90</ins> </a>
    <a href="https://test.com" class="newPrice" title="Premium"> <ins>19,90</ins> </a>
</div>
<a href="https://test.com" class="oldPrice" title="Premium"> <del>79,00 TL</del></a>
<a href="https://test.com" class="newPrice" title="Premium"> <ins>34,90</ins> </a>
And this is my error:
File "C:\Users\Red\Desktop\webcrawler-tutorial-master\hepsiburada\main.py", line 22, in <module>
    old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').text.strip()
AttributeError: 'NoneType' object has no attribute 'find'
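The traceback above happens because `find()` returns `None` when no matching element exists, and calling `.find('del')` on `None` raises the `AttributeError`. A minimal reproduction, using a hypothetical HTML fragment modeled on the page structure:

```python
from bs4 import BeautifulSoup

# A product item with no <a class="oldPrice"> element (hypothetical HTML).
html = '<li class="column"><h3>Book</h3></li>'
book = BeautifulSoup(html, 'html.parser').find('li')

old_price_tag = book.find('a', attrs={'class': 'oldPrice'})
print(old_price_tag)  # None -> old_price_tag.find('del') would raise AttributeError
```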
Good scraping practice is to have error handling for each of the fields you extract (name, price, links). Something like the below:
import requests
import json
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
base_url = "https://www.n11.com/super-firsatlar"
r = requests.get(base_url, headers=header)
if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li', attrs={"class": "column"})
    result = []
    for book in books:
        title = book.find('h3').text.strip()
        link = base_url + book.find('a')['href']
        picture = base_url + book.find('img')['src']
        first_price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text[:10].strip() + " TL"
        # Default to None so products without an oldPrice are saved as null
        # instead of raising an AttributeError or reusing the previous value.
        old_price = None
        if book.find('a', attrs={'class': 'oldPrice'}):
            old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').get_text(strip=True)
        single = {'title': title, 'link': link, 'picture': picture, 'first_price': first_price, 'old_price': old_price}
        result.append(single)
    with open('book.json', 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
else:
    print(r.status_code)
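If several fields can be missing, the None check can be factored into one small helper instead of repeating it per field. A sketch of that idea (the helper name `text_or_none` and the HTML fragment are illustrative, not part of the original code):

```python
from bs4 import BeautifulSoup

def text_or_none(parent, tag, cls, child=None):
    """Return the stripped text of a (possibly nested) tag, or None if missing."""
    found = parent.find(tag, attrs={'class': cls})
    if found is not None and child is not None:
        found = found.find(child)
    return found.get_text(strip=True) if found is not None else None

# Hypothetical fragment: the first item has no oldPrice element.
html = '''
<li class="column"><a class="newPrice"><ins>14,90</ins></a></li>
<li class="column"><a class="oldPrice"><del>69,00 TL</del></a>
    <a class="newPrice"><ins>34,90</ins></a></li>
'''
soup = BeautifulSoup(html, 'html.parser')
for book in soup.find_all('li', attrs={'class': 'column'}):
    old_price = text_or_none(book, 'a', 'oldPrice', child='del')
    print(old_price)  # None for the first item, "69,00 TL" for the second
```

With `old_price` set to `None`, `json.dump` writes it out as `null`, which is exactly the "save unavailable as null" behavior the question asks for.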