Converting multiple nested JSON to CSV in Python

Question

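I have a JSON document that I want to convert to CSV, but the JSON is nested several levels deep and the inner fields do not always contain the same number of objects.

For example:

Kit 1 has 5 products and kit 2 has 3 products (along with the quantity of each product in both cases).

Kit 1: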

"kit":{
               "products":[
                  {
                     "product":"PP001",
                     "quantity":1
                  },
                  {
                     "product":"PS001",
                     "quantity":1
                  },
                  {
                     "product":"PL001",
                     "quantity":1
                  },
                  {
                     "product":"FIN1187",
                     "quantity":3
                  },
                  {
                     "product":"FSS001",
                     "quantity":4
                  }
               ],
               "kit_client":"Lumax Mannoh Allied Technologies Limited",
               "kit_name":"KIT1187",
               "kit_info":"Gear Lever TACO_FLC",
               "components_per_kit":66
            },

Kit 2:

"kit":{
               "products":[
                  {
                     "product":"CRT6423",
                     "quantity":1
                  },
                  {
                     "product":"CIN1198A",
                     "quantity":2
                  },
                  {
                     "product":"CSS001",
                     "quantity":3
                  }
               ],
               "kit_client":"Lumax Mannoh Allied Technologies Limited",
               "kit_name":"KIT1198B",
               "kit_info":"Floor Sealing Assy_Crate",
               "components_per_kit":72
            },
            "flow":"LMXMNH_Manesar_Nashik_Floor Sealing Assy W501",
            "asked_quantity":3,
            "alloted_quantity":3

I tried json_normalize, but it only flattens the outer dictionary. I want the output to look like this:

transaction_no  dispatch_date send_from_warehouse  sales_order  flow_name  kit_name  asked_quantity  alloted_quantity  product1  product1 quantity  product2  product2 quantity...( to the maximum product in all JSON)

Full JSON:

https://codebeautify.org/online-json-editor/cbd770f5

Answer 1

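json_normalize is a good tool for simple cases. When you have deeply nested JSON, it is better to process it by hand with a custom recursive function.

Here, you want to keep every key that directly holds data, except for the products, which should be numbered.

One possible approach is to build a set to hold the field names, and to recursively build a list of dicts for the data.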

import csv
import json

import pandas as pd

data = json.loads(js)     # js: the JSON document as a string

def find_keys(data, keys=None, lst= None, cur = None):
    if keys is None:
        keys = set()      # will contain the field names
        lst = []          # list of dict for the data
        cur = {}          # current data row
    if isinstance(data, list):
        for sub in data:
            cur = cur.copy()      # create a new row for each item of a list
            lst.append(cur)
            find_keys(sub, keys, lst, cur)
    elif isinstance(data, dict):
        for k,v in data.items():
            if k == 'products':   # special processing for products
                for i,p in enumerate(v, 1):
                    for (k1, v1) in p.items():
                        keys.add(k1 + str(i))
                        cur[k1 + str(i)] = v1
            elif isinstance(v, (list, dict)):
                cur = cur.copy()   # a new row for each nested json
                lst.append(cur)
                find_keys(v, keys, lst, cur)
            else:
                keys.add(k) # a plain data (number or string): feed the row
                cur[k] = v
    return lst, keys

lst, keys = find_keys(data)

# sort the products to come after the other keys, ordered as
# product1, quantity1, product2, quantity2, ...
fieldnames = sorted(keys, key=lambda k: 1 + 2*int(k[8:])
                    if k.startswith('quantity')
                    else 2*int(k[7:]) if k.startswith('product') else 0)

# and use the csv module here (the file must be opened for writing)
with open('data.csv', 'w', newline='') as fd:
    wr = csv.DictWriter(fd, fieldnames)
    wr.writeheader()
    wr.writerows(lst)

# or build a dataframe
df = pd.DataFrame(lst, columns=fieldnames)

If you only want a subset of the columns, you can use reindex:

columns = ['asked_quantity', 'freight_charges', 'driver_name', 'sales_order', 
           'id', 'transport_by', 'alloted_quantity', 'is_delivered', 'kit_info', 
           'dispatch_date', 'expected_delivery', 'vehicle_number', 'vehicle_type',
           'remarks', 'kit_name', 'lr_number', 'owner', 'transaction_no', 'kit_client', 
           'driver_number', 'send_from_warehouse', 'flow', 'model', 'components_per_kit',
           'product1', 'quantity1', 'quantity2', 'product2',
           'quantity3', 'product3', 'quantity4', 'product4', 'quantity5', 'product5',
           'product6', 'quantity6'
          ]

df = df.reindex(columns=columns)

Answer 2

The "classic" way to extract data from JSON like this is as follows:

import json

import pandas as pd

with open("my_file.json") as f:
    d = json.load(f)
df = pd.json_normalize(d, record_path=["flows", "kit", "products"], 
                  meta=["transaction_no", "dispatch_date", "send_from_warehouse", "sales_order", 
                        ["flows", "flow"], 
                        ["flows", "kit", "kit_name"], 
                        ["flows", "asked_quantity"], 
                        ["flows", "alloted_quantity"]
                       ])

The output is as follows:

    product quantity    transaction_no  dispatch_date   send_from_warehouse sales_order flows.flow  flows.kit.kit_name  flows.asked_quantity    flows.alloted_quantity
0   PP001   1   2324    2020-08-11T04:40:34.876000Z Yantraksh Logistics Private limited_GGNPC1  105 LMXMNH_Manesar_Nashik_Transmission Gear Leaver...   KIT1162A    3   3
1   PS001   1   2324    2020-08-11T04:40:34.876000Z Yantraksh Logistics Private limited_GGNPC1  105 LMXMNH_Manesar_Nashik_Transmission Gear Leaver...   KIT1162A    3   3
2   PL001   1   2324    2020-08-11T04:40:34.876000Z Yantraksh Logistics Private limited_GGNPC1  105 LMXMNH_Manesar_Nashik_Transmission Gear Leaver...   KIT1162A    3   3

Does this answer your question? To get one column for the first product, one for the second product, and so on, you can do some pivoting.
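The pivoting step could look roughly like the sketch below. It assumes a long-format frame with one row per product, shaped like the json_normalize output above. The sample rows (including transaction number 2325), the helper name widen_products, and the counter column n are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical long-format data: one row per product, two meta columns only.
long_df = pd.DataFrame({
    "transaction_no": [2324, 2324, 2324, 2325, 2325],
    "flows.kit.kit_name": ["KIT1187", "KIT1187", "KIT1187", "KIT1198B", "KIT1198B"],
    "product": ["PP001", "PS001", "PL001", "CRT6423", "CIN1198A"],
    "quantity": [1, 1, 1, 1, 2],
})

def widen_products(df, id_cols):
    """Return one row per id_cols group with product1, quantity1, product2, ... columns."""
    df = df.copy()
    # Number the products inside each group: 1, 2, 3, ...
    df["n"] = df.groupby(id_cols).cumcount() + 1
    indexed = df.set_index(id_cols + ["n"])
    parts = []
    for field in ("product", "quantity"):
        # Move the counter from the index into the columns
        part = indexed[field].unstack("n")
        part.columns = [f"{field}{n}" for n in part.columns]
        parts.append(part)
    wide = pd.concat(parts, axis=1)
    # Interleave the columns: product1, quantity1, product2, quantity2, ...
    max_n = int(df["n"].max())
    order = [f"{field}{n}"
             for n in range(1, max_n + 1)
             for field in ("product", "quantity")]
    return wide[order].reset_index()

wide_df = widen_products(long_df, ["transaction_no", "flows.kit.kit_name"])
```

Kits with fewer products than the largest kit get missing values in the trailing product and quantity columns, which to_csv writes as empty fields by default.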

Answer 3

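There is a really simple approach for your JSON:

json_normalize() to get a first pass of records (one per kit)
explode() the products
turn the result back into JSON-style records with to_dict(orient="records")
json_normalize() again to expand the dicts inside products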

import pandas as pd

kit = [{'kit': {'products': [{'product': 'PP001', 'quantity': 1},
   {'product': 'PS001', 'quantity': 1},
   {'product': 'PL001', 'quantity': 1},
   {'product': 'FIN1187', 'quantity': 3},
   {'product': 'FSS001', 'quantity': 4}],
  'kit_client': 'Lumax Mannoh Allied Technologies Limited',
  'kit_name': 'KIT1187',
  'kit_info': 'Gear Lever TACO_FLC',
  'components_per_kit': 66}},
{'kit': {'products': [{'product': 'CRT6423', 'quantity': 1},
   {'product': 'CIN1198A', 'quantity': 2},
   {'product': 'CSS001', 'quantity': 3}],
  'kit_client': 'Lumax Mannoh Allied Technologies Limited',
  'kit_name': 'KIT1198B',
  'kit_info': 'Floor Sealing Assy_Crate',
  'components_per_kit': 72},
 'flow': 'LMXMNH_Manesar_Nashik_Floor Sealing Assy W501',
 'asked_quantity': 3,
 'alloted_quantity': 3}]

df = pd.json_normalize(pd.json_normalize(kit)\
            .explode("kit.products").to_dict(orient="records"))

print(df.loc[[0,1,6,7]].to_string(index=False))

Sample output

                           kit.kit_client kit.kit_name              kit.kit_info  kit.components_per_kit                                           flow  asked_quantity  alloted_quantity kit.products.product  kit.products.quantity
 Lumax Mannoh Allied Technologies Limited      KIT1187       Gear Lever TACO_FLC                      66                                            NaN             NaN               NaN                PP001                      1
 Lumax Mannoh Allied Technologies Limited      KIT1187       Gear Lever TACO_FLC                      66                                            NaN             NaN               NaN                PS001                      1
 Lumax Mannoh Allied Technologies Limited     KIT1198B  Floor Sealing Assy_Crate                      72  LMXMNH_Manesar_Nashik_Floor Sealing Assy W501             3.0               3.0             CIN1198A                      2
 Lumax Mannoh Allied Technologies Limited     KIT1198B  Floor Sealing Assy_Crate                      72  LMXMNH_Manesar_Nashik_Floor Sealing Assy W501             3.0               3.0               CSS001                      3

Update

The JSON at the external link is three levels deep. Exactly the same pattern applies, and you end up with a dataframe.

(pd.json_normalize(pd.json_normalize(pd.json_normalize(kit)
                   .explode("flows")
                   .to_dict(orient="records"))
 .explode("flows.kit.products")
 .to_dict(orient="records"))
)
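To make the three-level pattern concrete, here is a minimal self-contained sketch. The sample data is hypothetical and only mirrors the nesting described in the question (transaction, then a list of flows, then a kit dict holding a list of products). The helper name explode_level is introduced here purely for illustration. The final to_csv call is one way to get the CSV the question asks for.

```python
import pandas as pd

# Hypothetical sample: transaction -> flows (list) -> kit (dict) -> products (list)
transactions = [
    {
        "transaction_no": 2324,
        "flows": [
            {
                "flow": "FLOW_A",
                "asked_quantity": 3,
                "kit": {
                    "kit_name": "KIT1187",
                    "products": [
                        {"product": "PP001", "quantity": 1},
                        {"product": "PS001", "quantity": 1},
                    ],
                },
            },
            {
                "flow": "FLOW_B",
                "asked_quantity": 2,
                "kit": {
                    "kit_name": "KIT1198B",
                    "products": [{"product": "CRT6423", "quantity": 1}],
                },
            },
        ],
    }
]

def explode_level(records, column):
    """Normalize records, explode one list column, and return plain dicts again."""
    return pd.json_normalize(records).explode(column).to_dict(orient="records")

level1 = explode_level(transactions, "flows")          # one record per flow
level2 = explode_level(level1, "flows.kit.products")   # one record per product
flat = pd.json_normalize(level2)                       # expand the product dicts

csv_text = flat.to_csv(index=False)
```

Each nested list costs one explode_level call, so a deeper document would only need more calls in the chain.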

3 answers

Answer 1
1 accepted 2020-08-15 08:57:39

Answer 2
0 2020-08-15 07:34:04

Answer 3
0 2020-08-15 08:29:56