Could not scrape some values from a JSON file in Python

I would like to scrape data from a JSON file, but I cannot get the availability (the "available" key in the JSON) of each variant. The other values are scraped successfully.

The column comes out blank.

varavailability= "" if i >= len(variants) else variants[i].get('available', '')

import asyncio
import os
import random
import time
import openpyxl
import aiohttp
from urllib import request

# path="C:/Users/pengoul/Downloads/dl" 
path = os.getcwd()
print(f"CWD is {path}")
path = os.path.join(path, "download")
if not os.path.exists(path):
        os.makedirs(path)

# picpath= os.makedirs('picture')
async def request():
    async with aiohttp.ClientSession() as session:
        async with session.get(url='https://hiutdenim.co.uk/products.json?limit=500') as resp:
            html = await resp.json()
            k = list()
            f = openpyxl.Workbook()
            sheet = f.active
            sheet.append(['Name', 'Barcode', 'Product Category', 'Image', 'Internal Reference', 'Sales Price','Product Tags'])

            products = []

            print("Saving to excel ...")
            for i in html['products']:
                title = i.get('title')
                id1 = i.get('id')
                product_type = i.get('product_type')
                images = [img.get('src', '') for img in i.get('images', [])]
                products.append((title, id1, product_type, images))
                variants = [var for var in i.get('variants')]
                for i in range(max(len(images), len(variants))):
                    imgsrc = "" if i >= len(images) else images[i]
                    varsku = "" if i >= len(variants) else variants[i].get('sku', '')
                    varprice = "" if i >= len(variants) else variants[i].get('price', '')
                    varavailability= "" if i >= len(variants) else variants[i].get('available', '')
                    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])
                f.save(f"result230102.xlsx")

            print("Downloading images ...")
            for product in products:
                title, id1, product_type, images = product
                for seq, imgurl in enumerate(images):
                    print(f"Downloading img for {id1} ({seq + 1}/{len(images)})")
                    request.urlretrieve(imgurl, os.path.join(path, f"{id1}-{seq + 1}.jpg"))

async def download(url):
    image = url[0]
    file_name = f'{url[1]}.jpg'
    print(f'picpath/{file_name}')
    async with aiohttp.ClientSession() as session:
        time.sleep(random.random())
        async with session.get(image) as resp:
            with open(path+ file_name, mode='wb') as f:
                f.write(await resp.content.read())

#     print(f'picpath/{file_name}')


async def main():
    if not os.path.exists(path):
        os.mkdir(path)
    tasks = []
    await request()
    # for url in urls:
    #     tasks.append(asyncio.create_task(download(url)))
    # await asyncio.wait(tasks)


if __name__ == '__main__':
    print(os.getpid())
    t1 = time.time()
    urls = []
    loop = asyncio.get_event_loop()  
    loop.run_until_complete(main())  
    t2 = time.time()
    print('total:', t2 - t1)


This column comes out blank.

I would like to scrape the values of "available" from the JSON.


I ran your code in my debugger with a breakpoint on the line in question. The breakpoint is hit many times during execution. In some cases, the line produces a True value for varavailability, as you expect.

At some point, the line executes when i is 1 and the length of variants is also 1. In that case the condition i >= len(variants) is true, so varavailability is set to "". i can reach 1 because images has length 5 for this product, so your loop for i in range(max(len(images), len(variants))): iterates from i == 0 to i == 4. For every i greater than 0, varavailability is set to "". I can't be sure this is the case you're asking about, but it seems likely.
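The behaviour described above can be reproduced without the network. This is only a minimal sketch: the sample data (five image names, one variant) and the helper name build_rows are made up for illustration, and the loop body is reduced to the two columns that matter here.

```python
# Hypothetical sample data: five images but only one variant,
# mirroring the case seen in the debugger.
images = [f"img{n}.jpg" for n in range(5)]
variants = [{"sku": "SKU-1", "price": "10.00", "available": True}]


def build_rows(images, variants):
    """Replicate the question's row-building loop for a single product."""
    rows = []
    for i in range(max(len(images), len(variants))):
        imgsrc = "" if i >= len(images) else images[i]
        # Once i runs past the last variant, this falls back to "".
        varavailability = "" if i >= len(variants) else variants[i].get("available", "")
        rows.append((imgsrc, varavailability))
    return rows


rows = build_rows(images, variants)
for row in rows:
    print(row)
```

Only the first row carries the variant's availability; the remaining four rows exist just to hold the extra images, so their availability cell is empty.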

UPDATE:

As to how to fix this, the question is how the contents of variants and images relate to each other, and what you are doing in your loop:

for i in range(max(len(images), len(variants))):
    imgsrc = "" if i >= len(images) else images[i]
    varsku = "" if i >= len(variants) else variants[i].get('sku', '')
    varprice = "" if i >= len(variants) else variants[i].get('price', '')
    varavailability= "" if i >= len(variants) else variants[i].get('available', '')
    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])

It seems that the code iterates over a list of products, and each product has two lists associated with it: a list of images and a list of variants. My guess is that the contents of these two lists are independent, i.e. that each value in images does not correspond to a particular entry in variants.

If what you want is a table of product variants, one possible solution is to associate all of the images for a product with each of its variants, and then iterate over the variants only. That could look something like this:

imgsrc = " ".join(images)
for variant in variants:
    # Read from the current variant dict, not from the variants list
    varsku = variant.get('sku', '')
    varprice = variant.get('price', '')
    varavailability = variant.get('available', '')
    sheet.append([title, "'" + str(id1), product_type, imgsrc, varsku, varprice, varavailability])
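To check the idea in isolation, here is a self-contained sketch of the same approach that builds plain Python lists instead of writing to a worksheet. The function name rows_for_product and the sample product are hypothetical; the keys (title, id, product_type, images, variants, sku, price, available) are the ones the question's code already reads.

```python
def rows_for_product(product):
    """Return one row per variant, with all image URLs joined into one cell."""
    title = product.get("title")
    id1 = product.get("id")
    product_type = product.get("product_type")
    images = [img.get("src", "") for img in product.get("images", [])]
    imgsrc = " ".join(images)

    rows = []
    for variant in product.get("variants", []):
        rows.append([
            title,
            "'" + str(id1),
            product_type,
            imgsrc,
            variant.get("sku", ""),
            variant.get("price", ""),
            variant.get("available", ""),
        ])
    return rows


# Hypothetical product with two images and two variants.
sample = {
    "title": "Example jeans",
    "id": 123,
    "product_type": "Jeans",
    "images": [{"src": "a.jpg"}, {"src": "b.jpg"}],
    "variants": [
        {"sku": "EX-30", "price": "100.00", "available": True},
        {"sku": "EX-32", "price": "100.00", "available": False},
    ],
}

rows = rows_for_product(sample)
for row in rows:
    print(row)
```

Each returned list could then be passed to sheet.append(...) as in the original code. With this layout every row has a variant behind it, so the availability cell is always the variant's own True or False value rather than an empty string.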
