簡體   English   中英

JSON 到 Python Pandas 數據框

[英]JSON to Python Pandas dataframe

我是 Python 新手,一直專注於學習 Pandas 和 xlxswriter 以幫助自動化一些工作流程。 我附上了一個我可以訪問的 JSON 文件片段,但無法轉換為 Pandas 數據幀。

如果我使用pd.read_json(filename) :它會通過將它們的內容集中在一個單元格中來pd.read_json(filename)變體產品和產品pd.read_json(filename)

問題:我將如何獲取這個 JSON 文件並使其看起來像底部的 Pandas 數據幀輸出:

[
  {
    "ID": "12345",
    "productName": "Product A ",
    "minPrice": "$89.00",
    "maxPrice": "$89.00",
    "variationProducts": [
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "080",
        "sellingPrice": "$89.00",
        "inventory": 3,
      },
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "085",
        "sellingPrice": "$89.00",
        "inventory": 6,
      }
    ],
    "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1100"
      }
    ]
  },
  {
    "ID": "23456",
    "productName": "Product B",
    "minPrice": "$29.99",
    "maxPrice": "$69.00",
    "variationProducts": [
      {
        "variantColor": "JJ169Q0",
        "variantSize": "050",
        "sellingPrice": "$69.00",
        "inventory": 55,
      },
      {
        "variantColor": "JJ123Q0",
        "variantSize": "055",
        "sellingPrice": "$69.00",
        "inventory": 5,
      }
    ],
   "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1101"
      }
    ]
  }
]

我在 excel 中制作了這個示例輸出,variationProducts 在 variantColor 級別匯總 - 因此對於產品 A,庫存是兩個變體的總和,盡管它們具有不同的 variantSizes:

     ID      productName maxPrice minPrice countryOfOrigin csProductCode variantColor inventory
    12345   Product A   $89     $89         Imported        1100    JJ0BVE7    9
    23456   Product B   $69     $30         Imported        1101    JJ169Q0    55
    23456   Product B   $69     $30         Imported        1101    JJ123Q0    5

您可以使用json_normalize

In [11]: pd.io.json.json_normalize(d, "variationProducts", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[11]:
   .inventory .sellingPrice .variantColor .variantSize     ID maxPrice minPrice productName
0           3        $89.00       JJ0BVE7          080  12345   $89.00   $89.00  Product A
1           6        $89.00       JJ0BVE7          085  12345   $89.00   $89.00  Product A
2          55        $69.00       JJ169Q0          050  23456   $69.00   $29.99   Product B
3           5        $69.00       JJ123Q0          055  23456   $69.00   $29.99   Product B

In [12]: pd.io.json.json_normalize(d, "productAttributes", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[12]:
               .ID    .value     ID maxPrice minPrice productName
0  countryOfOrigin  Imported  12345   $89.00   $89.00  Product A
1    csProductCode      1100  12345   $89.00   $89.00  Product A
2  countryOfOrigin  Imported  23456   $69.00   $29.99   Product B
3    csProductCode      1101  23456   $69.00   $29.99   Product B

然后,您可以將這兩個加入/合並在一起...

我認為您必須對數據進行少量解析才能將其轉換為正確的格式,以便 read_json 正常工作。

首先使用 json.load(file_name) 將 json 數據放入一個將成為列表的 python 對象中。

現在您需要轉換這個列表,使每個對象都是字典,每個字典都有鍵作為列名,值作為您想要在該列中的值。

一個你已經准備好了列表然后你可以使用pandas.DataFrame(list)

l = [
  {
    "ID": "12345",
    "productName": "Product A ",
    "minPrice": "$89.00",
    "maxPrice": "$89.00",
    "variationProducts": [
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "080",
        "sellingPrice": "$89.00",
        "inventory": 3,
      },
      {
        "variantColor": "JJ0BVE7",
        "variantSize": "085",
        "sellingPrice": "$89.00",
        "inventory": 6,
      }
    ],
    "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1100"
      }
    ]
  },
  {
    "ID": "23456",
    "productName": "Product B",
    "minPrice": "$29.99",
    "maxPrice": "$69.00",
    "variationProducts": [
      {
        "variantColor": "JJ169Q0",
        "variantSize": "050",
        "sellingPrice": "$69.00",
        "inventory": 55,
      },
      {
        "variantColor": "JJ123Q0",
        "variantSize": "055",
        "sellingPrice": "$69.00",
        "inventory": 5,
      }
    ],
   "productAttributes": [
        {
        "ID": "countryOfOrigin",
        "value": "Imported"
      },
      {
        "ID": "csProductCode",
        "value": "1101"
      }
    ]
  }
]


import pandas as pd
from itertools import *

final_list = []
for val in l:
    d = {}
    d.update({key:val[key] for key in val.keys() if key not in ['variationProducts','productAttributes']})
    for prods,attrs in izip_longest(val['variationProducts'],val['productAttributes']):
        if prods:
            d.update(prods)
        if attrs:
            d.update({attrs['ID']:attrs['value']})
        final_list.append(d.copy())

pd.DataFrame(final_list)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM