Python: scraping a page to JSON and comparing the files

Here is my Python code. It fetches the webpage twice to get the product details and saves the data in .json files. It should check whether any key in the new file has changed and print what changed, but I'm getting the following error.

Error:

 Traceback (most recent call last):
    File "x.py", line 84, in <module>
    compare()
    File "x.py", line 76, in compare
    for key in b.keys():
    AttributeError: 'NoneType' object has no attribute 'keys'

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import cfscrape
import requests
from bs4 import BeautifulSoup as bs
import re
from pprint import pprint
import json

s = requests.Session()
s = cfscrape.create_scraper()

products = []
products1 = []

def x():
    r = s.get("https://www.oneblockdown.it/it/calzature-sneakers", headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
    soup = bs(r.content, "html.parser")

    js = [x.text for x in soup.find_all('script', {'type': 'text/javascript'}) if "var preloadedItems =" in x.text][0]
    js = js.replace('var preloadedItems = ', '')
    js = js[:js.find("}];")]+"}]".strip()
    data = json.loads(js)
    for product in data:
        product_id = product["id"]
        product_title = product["title"]
        product_link = product["permalink"]
        product_price = product["displayPrice"]
        product_available = product["isAvailable"]
        product_size = product["attributes"]
        products.append({
            "product_id": product_id,
            "product_title": product_title,
            "product_link": product_link,
            "product_price": product_price,
            "product_available": product_available,
            "product_size": product_size
        })

    with open('data.json', 'w') as f:
        json.dump(products, f, indent = 4)
        f.close()

def y():
    r1 = s.get("https://www.oneblockdown.it/it/calzature-sneakers", 
    headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
    soup = bs(r1.content, "html.parser")

    js = [x.text for x in soup.find_all('script', {'type': 'text/javascript'}) if "var preloadedItems =" in x.text][0]
    js = js.replace('var preloadedItems = ', '')
    js = js[:js.find("}];")]+"}]".strip()
    data1 = json.loads(js)
    for product in data1:
        product_id = product["id"]
        product_title = product["title"]
        product_link = product["permalink"]
        product_price = product["displayPrice"]
        product_available = product["isAvailable"]
        product_size = product["attributes"]
        products1.append({
            "product_id": product_id,
            "product_title": product_title,
            "product_link": product_link,
            "product_price": product_price,
            "product_available": product_available,
            "product_size": product_size
        })

    with open('data1.json', 'w') as f:
        json.dump(products, f, indent = 4)
        f.close()


def compare():
    while True:
        a = x()
        b = y()
        for key in b.keys():
            value = b[key]
            if key not in a:
                print(key, value)
            else:
                if a[key] != value:
                    print("for key {} values are different".format(key))

compare()

I've chosen this method, but I don't know if there is a better one for this purpose.

You are not returning anything from the x() and y() functions, so each call evaluates to None. Hence a and b are both None, and b.keys() raises the AttributeError shown in the traceback.
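A minimal illustration of that behaviour, separate from the scraper itself: a Python function that finishes without a return statement returns None. The function names here are hypothetical stand-ins, not the code from the question.

```python
def build_without_return(items):
    # Builds a list but never returns it, like x() and y() in the question.
    collected = [item for item in items]


def build_with_return(items):
    # Same work, but the result is handed back to the caller.
    collected = [item for item in items]
    return collected


missing = build_without_return([1, 2])
present = build_with_return([1, 2])

print(missing)  # None
print(present)  # [1, 2]

try:
    missing.keys()
except AttributeError as exc:
    # Same kind of failure as in the traceback above.
    error_message = str(exc)
    print(error_message)
```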

Most likely you want to get the products list back from x() and y() (products in x(), products1 in y()), so add a return statement at the end of each function.

Like:

return products
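Note that returning the list fixes the NoneType error, but products is a list of dicts, and a list has no keys() method either, so compare() would still fail. One possible way around this, sketched here under the assumption that product_id uniquely identifies a product, is to index both lists by product_id and compare field by field. The helper names index_by_id and diff_products and the sample data are made up for illustration.

```python
def index_by_id(products):
    """Turn a list of product dicts into {product_id: product}."""
    return {product["product_id"]: product for product in products}


def diff_products(old_products, new_products):
    """Return a list of human-readable change descriptions."""
    old = index_by_id(old_products)
    new = index_by_id(new_products)
    changes = []
    for product_id, product in new.items():
        if product_id not in old:
            changes.append("new product {}".format(product_id))
            continue
        # Compare every field of a product present in both snapshots.
        for field, value in product.items():
            previous = old[product_id].get(field)
            if previous != value:
                changes.append("product {}: {} changed from {!r} to {!r}".format(
                    product_id, field, previous, value))
    for product_id in old:
        if product_id not in new:
            changes.append("removed product {}".format(product_id))
    return changes


# Sample data invented for illustration only.
before = [
    {"product_id": 1, "product_price": "100", "product_available": True},
    {"product_id": 2, "product_price": "80", "product_available": True},
]
after = [
    {"product_id": 1, "product_price": "90", "product_available": True},
    {"product_id": 3, "product_price": "120", "product_available": False},
]

changes = diff_products(before, after)
for change in changes:
    print(change)
```

Two other details in the posted code are worth checking. y() writes products rather than products1 to data1.json. Both functions also append to module-level lists, so every pass through the while True loop adds the same products again; building the list inside each function avoids that.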
