JSON to Python Pandas DataFrame

Question

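I'm new to Python and have been focusing on learning Pandas and XlsxWriter to help automate some workflows. I've attached a snippet of a JSON file I have access to but have been unable to convert into a Pandas DataFrame.

If I use pd.read_json(filename), it mangles variationProducts and productAttributes by lumping their contents into a single cell.

Question: how would I take this JSON file and make it look like the Pandas DataFrame output at the bottom?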

[
  {
    "ID": "12345",
    "productName": "Product A ",
    "minPrice": "$89.00",
    "maxPrice": "$89.00",
    "variationProducts": [
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "080",
        "sellingPrice": "$89.00",
        "inventory": 3
      },
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "085",
        "sellingPrice": "$89.00",
        "inventory": 6
      }
    ],
    "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1100"
      }
    ]
  },
  {
    "ID": "23456",
    "productName": "Product B",
    "minPrice": "$29.99",
    "maxPrice": "$69.00",
    "variationProducts": [
      {
        "variantColor": "JJ169Q0",
        "variantSize": "050",
        "sellingPrice": "$69.00",
        "inventory": 55
      },
      {
        "variantColor": "JJ123Q0",
        "variantSize": "055",
        "sellingPrice": "$69.00",
        "inventory": 5
      }
    ],
   "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1101"
      }
    ]
  }
]

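I made this sample output in Excel. The variationProducts are rolled up at the variantColor level, so for Product A the inventory is the sum of the two variants even though they have different variantSizes: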

    ID     productName  maxPrice  minPrice  countryOfOrigin  csProductCode  variantColor  inventory
    12345  Product A    $89       $89       Imported         1100           JJ0BVE7       9
    23456  Product B    $69       $30       Imported         1101           JJ169Q0       55
    23456  Product B    $69       $30       Imported         1101           JJ123Q0       5

Answer 1

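You can use json_normalize (exposed as pd.json_normalize since pandas 1.0; the session below uses the older pd.io.json path). Here d is assumed to be the list parsed from the JSON file: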

In [11]: pd.io.json.json_normalize(d, "variationProducts", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[11]:
   .inventory .sellingPrice .variantColor .variantSize     ID maxPrice minPrice productName
0           3        $89.00       JJ0BVE7          080  12345   $89.00   $89.00  Product A
1           6        $89.00       JJ0BVE7          085  12345   $89.00   $89.00  Product A
2          55        $69.00       JJ169Q0          050  23456   $69.00   $29.99   Product B
3           5        $69.00       JJ123Q0          055  23456   $69.00   $29.99   Product B

In [12]: pd.io.json.json_normalize(d, "productAttributes", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[12]:
               .ID    .value     ID maxPrice minPrice productName
0  countryOfOrigin  Imported  12345   $89.00   $89.00  Product A
1    csProductCode      1100  12345   $89.00   $89.00  Product A
2  countryOfOrigin  Imported  23456   $69.00   $29.99   Product B
3    csProductCode      1101  23456   $69.00   $29.99   Product B

You can then join/merge these two together...
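One way the join/merge step might look is sketched below. It is not part of the original answer: it assumes pandas 1.0 or later (for pd.json_normalize), rebuilds a small copy of the question's data as d, and uses the hypothetical prefix attr_ to keep the attribute "ID" key from colliding with the product "ID". The variants are summed per variantColor, the attributes are pivoted into columns, and the two frames are merged on ID.

```python
import pandas as pd

# Stand-in for the parsed JSON file (same shape as the question's data).
d = [
    {
        "ID": "12345", "productName": "Product A",
        "minPrice": "$89.00", "maxPrice": "$89.00",
        "variationProducts": [
            {"variantColor": "JJ0BVE7", "variantSize": "080",
             "sellingPrice": "$89.00", "inventory": 3},
            {"variantColor": "JJ0BVE7", "variantSize": "085",
             "sellingPrice": "$89.00", "inventory": 6},
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1100"},
        ],
    },
    {
        "ID": "23456", "productName": "Product B",
        "minPrice": "$29.99", "maxPrice": "$69.00",
        "variationProducts": [
            {"variantColor": "JJ169Q0", "variantSize": "050",
             "sellingPrice": "$69.00", "inventory": 55},
            {"variantColor": "JJ123Q0", "variantSize": "055",
             "sellingPrice": "$69.00", "inventory": 5},
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1101"},
        ],
    },
]

meta = ["ID", "productName", "maxPrice", "minPrice"]

# One row per variant, with the product-level fields repeated on each row.
variants = pd.json_normalize(d, "variationProducts", meta)

# One row per attribute; the prefix avoids a clash between the two "ID" keys.
attrs = pd.json_normalize(d, "productAttributes", ["ID"], record_prefix="attr_")

# Turn attribute rows into columns: one row per product ID.
attrs_wide = attrs.pivot(index="ID", columns="attr_ID", values="attr_value").reset_index()
attrs_wide.columns.name = None

# Sum inventory per product and colour, keeping first-appearance order.
inventory = variants.groupby(
    meta + ["variantColor"], as_index=False, sort=False
)["inventory"].sum()

# Attach the attribute columns to each product/colour row.
result = inventory.merge(attrs_wide, on="ID", how="left")
print(result)
```

The prices are left as the original strings here; rounding them the way the Excel sample does would be a separate formatting step.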

Answer 2

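I think you will have to do a little parsing of the data to get it into the right shape before read_json will work properly.

First use json.load on the opened file (it takes a file object, not a file name) to read the JSON data into a Python object, which will be a list.

Now you need to transform this list so that each item is a dictionary whose keys are the column names and whose values are the values you want in those columns.

Once the list is ready, you can use pandas.DataFrame(list).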
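The steps above could be sketched roughly as follows. This is an illustration rather than the answerer's own code: the in-memory io.StringIO stands in for an opened JSON file, the data is a trimmed copy of the question's sample, and names such as rows and base are made up for the example. Each variant becomes one flat dictionary that also carries the product-level fields and the attributes as columns.

```python
import io
import json

import pandas as pd

# Stand-in for open("your_file.json"); the file name is hypothetical.
raw = io.StringIO("""
[
  {"ID": "12345", "productName": "Product A",
   "minPrice": "$89.00", "maxPrice": "$89.00",
   "variationProducts": [
     {"variantColor": "JJ0BVE7", "variantSize": "080", "sellingPrice": "$89.00", "inventory": 3},
     {"variantColor": "JJ0BVE7", "variantSize": "085", "sellingPrice": "$89.00", "inventory": 6}],
   "productAttributes": [
     {"ID": "countryOfOrigin", "value": "Imported"},
     {"ID": "csProductCode", "value": "1100"}]},
  {"ID": "23456", "productName": "Product B",
   "minPrice": "$29.99", "maxPrice": "$69.00",
   "variationProducts": [
     {"variantColor": "JJ169Q0", "variantSize": "050", "sellingPrice": "$69.00", "inventory": 55},
     {"variantColor": "JJ123Q0", "variantSize": "055", "sellingPrice": "$69.00", "inventory": 5}],
   "productAttributes": [
     {"ID": "countryOfOrigin", "value": "Imported"},
     {"ID": "csProductCode", "value": "1101"}]}
]
""")

# Step 1: json.load reads from a file object and returns a list of dicts.
products = json.load(raw)

# Step 2: build one flat dict per variant.
nested_keys = ("variationProducts", "productAttributes")
rows = []
for product in products:
    # Top-level scalar fields (ID, productName, prices).
    base = {k: v for k, v in product.items() if k not in nested_keys}
    # Each attribute becomes its own column, e.g. csProductCode -> "1100".
    base.update({a["ID"]: a["value"] for a in product["productAttributes"]})
    for variant in product["variationProducts"]:
        rows.append({**base, **variant})

# Step 3: a list of flat dicts maps directly onto DataFrame rows.
df = pd.DataFrame(rows)
print(df)
```

This gives one row per variant; rolling the rows up to the variantColor level would still need a groupby on top.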

Answer 3

l = [
  {
    "ID": "12345",
    "productName": "Product A ",
    "minPrice": "$89.00",
    "maxPrice": "$89.00",
    "variationProducts": [
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "080",
        "sellingPrice": "$89.00",
        "inventory": 3,
      },
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "085",
        "sellingPrice": "$89.00",
        "inventory": 6,
      }
    ],
    "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1100"
      }
    ]
  },
  {
    "ID": "23456",
    "productName": "Product B",
    "minPrice": "$29.99",
    "maxPrice": "$69.00",
    "variationProducts": [
      {
        "variantColor": "JJ169Q0",
        "variantSize": "050",
        "sellingPrice": "$69.00",
        "inventory": 55,
      },
      {
        "variantColor": "JJ123Q0",
        "variantSize": "055",
        "sellingPrice": "$69.00",
        "inventory": 5,
      }
    ],
   "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1101"
      }
    ]
  }
]


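import pandas as pd
from itertools import zip_longest  # Python 3 name; Python 2 called it izip_longest

final_list = []
for val in l:
    # Start each row with the top-level scalar fields (ID, productName, prices).
    d = {key: val[key] for key in val if key not in ['variationProducts', 'productAttributes']}
    # Walk both nested lists in step; the shorter one is padded with None.
    for prods, attrs in zip_longest(val['variationProducts'], val['productAttributes']):
        if prods:
            d.update(prods)
        if attrs:
            d.update({attrs['ID']: attrs['value']})
        # d carries over between iterations, so append a snapshot of it.
        final_list.append(d.copy())

pd.DataFrame(final_list)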

Answer 1
1 Accepted 2017-10-27 16:14:42

Answer 2
0 2017-10-27 16:10:29

Answer 3
0 2017-10-27 16:31:02
