Web scraping: how to save unavailable data as null
Hi, I am trying to get data with web scraping, but my code fails as soon as "old_price" is missing. How can I skip this field when it is empty, or read it and save the unavailable value as null? This is my Python code:
import requests
import json
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
base_url = "https://www.n11.com/super-firsatlar"
r = requests.get(base_url, headers=header)
if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li', attrs={"class": "column"})
    result = []
    for book in books:
        title = book.find('h3').text.strip()
        link = base_url + book.find('a')['href']
        picture = base_url + book.find('img')['src']
        first_price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text[:10].strip() + " TL"
        old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').text.strip()
        single = {'title': title, 'link': link, 'picture': picture, 'first_price': first_price, 'old_price': old_price}
        result.append(single)
    with open('book.json', 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
else:
    print(r.status_code)
<div class="proDetail">
    <a href="https://test.com" class="oldPrice" title="Premium"> <del>69,00 TL</del></a>
    <a href="https://test.com" class="newPrice" title="Premium"> <ins>14,90</ins> </a>
    <a href="https://test.com" class="newPrice" title="Premium"> <ins>19,90</ins> </a>
</div>
<a href="https://test.com" class="oldPrice" title="Premium"> <del>79,00 TL</del></a>
<a href="https://test.com" class="newPrice" title="Premium"> <ins>34,90</ins> </a>
And this is my error:
File "C:\Users\Red\Desktop\webcrawler-tutorial-master\hepsiburada\main.py", line 22, in <module>
    old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').text.strip()
AttributeError: 'NoneType' object has no attribute 'find'
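The traceback above happens because `find()` returns `None` when no matching element exists, and calling `.find('del')` on `None` raises the `AttributeError`. A minimal reproduction, using a hypothetical HTML fragment modeled on the page structure:

```python
from bs4 import BeautifulSoup

# A product item with no <a class="oldPrice"> element (hypothetical HTML).
html = '<li class="column"><h3>Book</h3></li>'
book = BeautifulSoup(html, 'html.parser').find('li')

old_price_tag = book.find('a', attrs={'class': 'oldPrice'})
print(old_price_tag)  # None -> old_price_tag.find('del') would raise AttributeError
```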
Good scraping practice is to have error handling for each of the fields you extract (name, price, links). Something like the below:
import requests
import json
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'}
base_url = "https://www.n11.com/super-firsatlar"
r = requests.get(base_url, headers=header)
if r.status_code == 200:
    soup = BeautifulSoup(r.text, 'html.parser')
    books = soup.find_all('li', attrs={"class": "column"})
    result = []
    for book in books:
        title = book.find('h3').text.strip()
        link = base_url + book.find('a')['href']
        picture = base_url + book.find('img')['src']
        first_price = book.find('a', attrs={'class': 'newPrice'}).find('ins').text[:10].strip() + " TL"
        # Default to None so products without an oldPrice are saved as null
        # instead of raising an AttributeError or reusing the previous value.
        old_price = None
        if book.find('a', attrs={'class': 'oldPrice'}):
            old_price = book.find('a', attrs={'class': 'oldPrice'}).find('del').get_text(strip=True)
        single = {'title': title, 'link': link, 'picture': picture, 'first_price': first_price, 'old_price': old_price}
        result.append(single)
    with open('book.json', 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
else:
    print(r.status_code)
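If several fields can be missing, the None check can be factored into one small helper instead of repeating it per field. A sketch of that idea (the helper name `text_or_none` and the HTML fragment are illustrative, not part of the original code):

```python
from bs4 import BeautifulSoup

def text_or_none(parent, tag, cls, child=None):
    """Return the stripped text of a (possibly nested) tag, or None if missing."""
    found = parent.find(tag, attrs={'class': cls})
    if found is not None and child is not None:
        found = found.find(child)
    return found.get_text(strip=True) if found is not None else None

# Hypothetical fragment: the first item has no oldPrice element.
html = '''
<li class="column"><a class="newPrice"><ins>14,90</ins></a></li>
<li class="column"><a class="oldPrice"><del>69,00 TL</del></a>
    <a class="newPrice"><ins>34,90</ins></a></li>
'''
soup = BeautifulSoup(html, 'html.parser')
for book in soup.find_all('li', attrs={'class': 'column'}):
    old_price = text_or_none(book, 'a', 'oldPrice', child='del')
    print(old_price)  # None for the first item, "69,00 TL" for the second
```

With `old_price` set to `None`, `json.dump` writes it out as `null`, which is exactly the "save unavailable as null" behavior the question asks for.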