
How do I extract a specific value from a nested dictionary as I iterate through a list of dictionaries?

I'm working on a CNN and need to grab some images from URIs in a JSON file while keeping them associated with their corresponding ids. I have a JSON file that looks something like this. I want to iterate through each product and extract the 'id' and, from 'image_uris', the 'large' URI.

[{
  "product_type": "widget",
  "id": "1744556-ghh56h-4633",
  "manufacture_id": "AAB4567",
  "store_ids": [416835, 456145],
  "name": "Best Widget",
  "origin": "US",
  "manufactured": "2018-08-26",
  "uri": "https://bobswidgets.com/best_widget",
  "image_uris": {
    "small": "https://bobswidgets.com/small/best_widget_sm.jpg",
    "normal": "https://bobswidgets.com/medium/best_widget_md.jpg",
    "large": "https://bobswidgets.com/large/best_widget_lg.jpg",
  },
  "manufacture_cost": "12.50",
},
{
  "product_type": "widget",
  "id": "0956786-dje596-3904",
  "manufacture_id": "BCD13D",
  "store_ids": [014329, 40123],
  "name": "Best Widget2",
  "origin": "US",
  "manufactured": "2018-10-03",
  "uri": "https://bobswidgets.com/best_widget_2",
  "image_uris": {
    "small": "https://bobswidgets.com/small/best_widget2_sm.jpg",
    "normal": "https://bobswidgets.com/medium/best_widget2_md.jpg",
    "large": "https://bobswidgets.com/large/best_widget2_lg.jpg",
  },
  "manufacture_cost": "13.33",
}]

I then want to put them into their own dictionary like this. At least this is what I think I want to do unless there is a better idea:

[{"1744556-ghh56h-4633": "https://bobswidgets.com/large/best_widget_lg.jpg"}, {"0956786-dje596-3904": "https://bobswidgets.com/large/best_widget2_lg.jpg"}]

My endgame would be to grab the images at those URIs and save them with the 'id' as the image name, like this:

1744556-ghh56h-4633_lg.jpg
0956786-dje596-3904_lg.jpg

Eventually these images will be used for the CNN, as I mentioned earlier. When an image is recognized, a lookup can be performed to return all the other values from the JSON file.

So far here is the code I've been using to extract the data I want. It grabs the 'id' fine but it grabs all of the image uris. I only want the 'large' uri.

import ujson as json

with open('product.json', 'r') as f:
    prod_txt = f.read()

prod_dict = json.loads(prod_txt)

id = []
uris = []

for dictionary in prod_dict:
    id.append(list(dictionary.values())[1])
    if isinstance(dictionary, dict):
        uris.append(list(dictionary.values())[8])

I've made various attempts to single out the 'large' URI without success. I'm not really sure how to do it with a nested dictionary without throwing an error. I'm sure it is something simple, but I'm still an amateur coder.

Using a list comprehension, this can be done quite simply:

In [106]: img_ids = [{d['id']: d['image_uris']['large']} for d in prod_dict]

In [107]: img_ids
Out[107]:
[{'1744556-ghh56h-4633': 'https://bobswidgets.com/large/best_widget_lg.jpg'},
 {'0956786-dje596-3904': 'https://bobswidgets.com/large/best_widget2_lg.jpg'}]

Note that this assumes every dict in the list has an id and a value for large in image_uris. If either is missing, you will get a KeyError.

If that can happen, you will have to use dict.get, like so:

# Adding new entry without 'image_uris' dict
In [110]: prod_dict.append({'id': 'new_id'})

In [111]: img_ids = [{d['id']: d.get('image_uris', {}).get('large', 'N/A')} for d in prod_dict]

In [112]: img_ids
Out[112]:
[{'1744556-ghh56h-4633': 'https://bobswidgets.com/large/best_widget_lg.jpg'},
 {'0956786-dje596-3904': 'https://bobswidgets.com/large/best_widget2_lg.jpg'},
 {'new_id': 'N/A'}]
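
If a placeholder such as 'N/A' is not useful downstream (for example, because every value will later be treated as a URL to download), another option is to skip incomplete entries instead. This is only a minimal sketch under the same structural assumptions as above; the sample list `products` and the name `large_uris` are made up for illustration.

```python
# Hypothetical sample data mirroring the structure in the question.
products = [
    {"id": "1744556-ghh56h-4633",
     "image_uris": {"large": "https://bobswidgets.com/large/best_widget_lg.jpg"}},
    {"id": "new_id"},                                   # no 'image_uris' at all
    {"id": "other_id", "image_uris": {"small": "x"}},   # no 'large' entry
]

# Keep only products that have both an 'id' and a 'large' image URI.
large_uris = [
    {d["id"]: d["image_uris"]["large"]}
    for d in products
    if "id" in d and "large" in d.get("image_uris", {})
]

print(large_uris)
```

Whether to skip or to fill in a default depends on what consumes the result; skipping keeps the list free of values that are not real URIs.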

Your edits to the product.json file still don't make it valid JSON, so I used the following instead, which is valid:

[
  {
    "product_type": "widget",
    "id": "1744556-ghh56h-4633",
    "manufacture_id": "AAB4567",
    "store_ids": [
      416835,
      456145
    ],
    "name": "Best Widget",
    "origin": "US",
    "manufactured": "2018-08-26",
    "uri": "https://bobswidgets.com/best_widget",
    "image_uris": {
      "small": "https://bobswidgets.com/small/best_widget_sm.jpg",
      "normal": "https://bobswidgets.com/medium/best_widget_md.jpg",
      "large": "https://bobswidgets.com/large/best_widget_lg.jpg"
    },
    "manufacture_cost": "12.50"
  },
  {
    "product_type": "widget",
    "id": "0956786-dje596-3904",
    "manufacture_id": "BCD13D",
    "store_ids": [
      "014329",
      "40123"
    ],
    "name": "Best Widget2",
    "origin": "US",
    "manufactured": "2018-10-03",
    "uri": "https://bobswidgets.com/best_widget_2",
    "image_uris": {
      "small": "https://bobswidgets.com/small/best_widget2_sm.jpg",
      "normal": "https://bobswidgets.com/medium/best_widget2_md.jpg",
      "large": "https://bobswidgets.com/large/best_widget2_lg.jpg"
    },
    "manufacture_cost": "13.33"
  }
]
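
To see why a file is rejected, one option is to let the parser report where it stops. The sketch below uses the standard-library json module (not ujson, as in the question) and catches json.JSONDecodeError, which carries the line and column of the first problem. The helper name `check_json` and the short sample strings are illustrative only; they imitate the trailing commas and the unquoted number with a leading zero seen in the question's file.

```python
import json

def check_json(text):
    """Return None if text is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno point at the first character the parser could not accept.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None

trailing_comma = '{"large": "best_widget_lg.jpg",}'    # comma before closing brace
leading_zero = '{"store_ids": [014329, 40123]}'        # numbers may not start with 0
valid = '{"store_ids": ["014329", "40123"]}'           # ids quoted as strings

for sample in (trailing_comma, leading_zero, valid):
    print(check_json(sample))
```

The exact wording of the message varies between Python versions, so it is safer to rely on the reported position than on the text.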

Setting the invalid-JSON issue aside and assuming you can fix the file yourself, you could create the dictionary you want using something called a dictionary display (more commonly known as a dictionary comprehension), which is very similar to a list comprehension.

import json
from pprint import pprint

filename = 'product.json'

with open(filename, 'r') as f:
    prod_txt = f.read()
    prod_list = json.loads(prod_txt)

result_dict = {product['id']: product['image_uris']['large']
                for product in prod_list}

pprint(result_dict)

Output:

{'0956786-dje596-3904': 'https://bobswidgets.com/large/best_widget2_lg.jpg',
 '1744556-ghh56h-4633': 'https://bobswidgets.com/large/best_widget_lg.jpg'}
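
From a mapping like this, the saving step described in the question could be sketched roughly as follows. This is not part of the original answer: the helper names `target_filename` and `download_images`, the `_lg` suffix argument, and the `images` output directory are all assumptions made for illustration, and the bobswidgets.com URIs are the question's placeholder data, so the actual download call is left commented out.

```python
import os
import urllib.request
from urllib.parse import urlparse

def target_filename(product_id, uri, suffix="_lg"):
    """Build a name such as '<id>_lg.jpg', keeping the URI's file extension."""
    ext = os.path.splitext(urlparse(uri).path)[1] or ".jpg"
    return f"{product_id}{suffix}{ext}"

def download_images(id_to_uri, out_dir="images"):
    """Fetch every URI and save it in out_dir under a name derived from its id."""
    os.makedirs(out_dir, exist_ok=True)
    for product_id, uri in id_to_uri.items():
        path = os.path.join(out_dir, target_filename(product_id, uri))
        # Network call; raises urllib.error.URLError if the host is unreachable.
        urllib.request.urlretrieve(uri, path)

# Same shape as result_dict above (id -> large image URI).
sample = {
    "1744556-ghh56h-4633": "https://bobswidgets.com/large/best_widget_lg.jpg",
    "0956786-dje596-3904": "https://bobswidgets.com/large/best_widget2_lg.jpg",
}

for product_id, uri in sample.items():
    print(target_filename(product_id, uri))

# download_images(sample)  # uncomment once the URIs point at real images
```

Because the file name starts with the product id, the id can be recovered from a recognized image's name and used to look up the rest of the product record.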

