JSON to Python Pandas DataFrame
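I'm new to Python and have been focusing on learning Pandas and XlsxWriter to help automate some workflows. I've attached a snippet of a JSON file that I have access to but have been unable to convert into a Pandas DataFrame.
If I use pd.read_json(filename), it mangles variationProducts and productAttributes by lumping each one's contents into a single cell.
Question: how would I take this JSON file and make it look like the Pandas DataFrame output at the bottom?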
[
    {
        "ID": "12345",
        "productName": "Product A ",
        "minPrice": "$89.00",
        "maxPrice": "$89.00",
        "variationProducts": [
            {
                "variantColor": "JJ0BVE7",
                "variantSize": "080",
                "sellingPrice": "$89.00",
                "inventory": 3
            },
            {
                "variantColor": "JJ0BVE7",
                "variantSize": "085",
                "sellingPrice": "$89.00",
                "inventory": 6
            }
        ],
        "productAttributes": [
            {
                "ID": "countryOfOrigin",
                "value": "Imported"
            },
            {
                "ID": "csProductCode",
                "value": "1100"
            }
        ]
    },
    {
        "ID": "23456",
        "productName": "Product B",
        "minPrice": "$29.99",
        "maxPrice": "$69.00",
        "variationProducts": [
            {
                "variantColor": "JJ169Q0",
                "variantSize": "050",
                "sellingPrice": "$69.00",
                "inventory": 55
            },
            {
                "variantColor": "JJ123Q0",
                "variantSize": "055",
                "sellingPrice": "$69.00",
                "inventory": 5
            }
        ],
        "productAttributes": [
            {
                "ID": "countryOfOrigin",
                "value": "Imported"
            },
            {
                "ID": "csProductCode",
                "value": "1101"
            }
        ]
    }
]
I made this sample output in Excel. variationProducts is aggregated at the variantColor level, so for Product A the inventory is the sum of the two variants even though they have different variantSizes:
ID     productName  maxPrice  minPrice  countryOfOrigin  csProductCode  variantColor  inventory
12345  Product A    $89       $89       Imported         1100           JJ0BVE7       9
23456  Product B    $69       $30       Imported         1101           JJ169Q0       55
23456  Product B    $69       $30       Imported         1101           JJ123Q0       5
You can use json_normalize:
In [11]: pd.io.json.json_normalize(d, "variationProducts", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[11]:
   .inventory  .sellingPrice  .variantColor  .variantSize     ID  maxPrice  minPrice  productName
0           3         $89.00        JJ0BVE7           080  12345    $89.00    $89.00    Product A
1           6         $89.00        JJ0BVE7           085  12345    $89.00    $89.00    Product A
2          55         $69.00        JJ169Q0           050  23456    $69.00    $29.99    Product B
3           5         $69.00        JJ123Q0           055  23456    $69.00    $29.99    Product B
In [12]: pd.io.json.json_normalize(d, "productAttributes", ["ID", "maxPrice", "minPrice", "productName"], record_prefix=".")
Out[12]:
               .ID    .value     ID  maxPrice  minPrice  productName
0  countryOfOrigin  Imported  12345    $89.00    $89.00    Product A
1    csProductCode      1100  12345    $89.00    $89.00    Product A
2  countryOfOrigin  Imported  23456    $69.00    $29.99    Product B
3    csProductCode      1101  23456    $69.00    $29.99    Product B
Then you can join/merge these two together...
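The join step could be sketched roughly as follows. This is only one possible way to finish the job, not necessarily what the answer had in mind. It assumes that `d` is the parsed list from the question (trimmed here to the fields that are used), that the top-level `pd.json_normalize` is available (the pandas 1.0+ spelling of `pd.io.json.json_normalize`), and that pivoting the attribute rows into columns before merging on `ID` is acceptable. The `v.` and `a.` prefixes are arbitrary choices for this sketch.

```python
import pandas as pd

# Sample data from the question, trimmed to the fields used below.
d = [
    {
        "ID": "12345", "productName": "Product A ",
        "minPrice": "$89.00", "maxPrice": "$89.00",
        "variationProducts": [
            {"variantColor": "JJ0BVE7", "variantSize": "080", "inventory": 3},
            {"variantColor": "JJ0BVE7", "variantSize": "085", "inventory": 6},
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1100"},
        ],
    },
    {
        "ID": "23456", "productName": "Product B",
        "minPrice": "$29.99", "maxPrice": "$69.00",
        "variationProducts": [
            {"variantColor": "JJ169Q0", "variantSize": "050", "inventory": 55},
            {"variantColor": "JJ123Q0", "variantSize": "055", "inventory": 5},
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1101"},
        ],
    },
]

meta = ["ID", "productName", "maxPrice", "minPrice"]

# One row per variant; the prefix keeps record fields apart from the meta columns.
variants = pd.json_normalize(d, "variationProducts", meta, record_prefix="v.")

# One row per attribute, then pivot so each attribute ID becomes its own column.
attrs = pd.json_normalize(d, "productAttributes", ["ID"], record_prefix="a.")
attrs_wide = attrs.pivot(index="ID", columns="a.ID", values="a.value").reset_index()
attrs_wide.columns.name = None

# Attach the attribute columns to every variant row of the same product.
merged = variants.merge(attrs_wide, on="ID")

# Sum inventory per variantColor, keeping the product-level columns.
group_cols = meta + ["countryOfOrigin", "csProductCode", "v.variantColor"]
result = (
    merged.groupby(group_cols, sort=False, as_index=False)["v.inventory"].sum()
    .rename(columns={"v.variantColor": "variantColor", "v.inventory": "inventory"})
)
print(result)
```

Prices stay as the original strings here; rounding them to match the Excel sample would be a separate formatting step.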
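I think you have to do a little parsing of the data to get it into the right shape; read_json will not handle this structure on its own.
First use json.load on the opened file to read the JSON data into a Python object, which will be a list.
Now you need to transform this list so that each element is a flat dictionary, with the keys as the column names and the values as what you want in those columns.
Once you have that list ready, you can use pandas.DataFrame(list).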
l = [
    {
        "ID": "12345",
        "productName": "Product A ",
        "minPrice": "$89.00",
        "maxPrice": "$89.00",
        "variationProducts": [
            {
                "variantColor": "JJ0BVE7",
                "variantSize": "080",
                "sellingPrice": "$89.00",
                "inventory": 3,
            },
            {
                "variantColor": "JJ0BVE7",
                "variantSize": "085",
                "sellingPrice": "$89.00",
                "inventory": 6,
            }
        ],
        "productAttributes": [
            {
                "ID": "countryOfOrigin",
                "value": "Imported"
            },
            {
                "ID": "csProductCode",
                "value": "1100"
            }
        ]
    },
    {
        "ID": "23456",
        "productName": "Product B",
        "minPrice": "$29.99",
        "maxPrice": "$69.00",
        "variationProducts": [
            {
                "variantColor": "JJ169Q0",
                "variantSize": "050",
                "sellingPrice": "$69.00",
                "inventory": 55,
            },
            {
                "variantColor": "JJ123Q0",
                "variantSize": "055",
                "sellingPrice": "$69.00",
                "inventory": 5,
            }
        ],
        "productAttributes": [
            {
                "ID": "countryOfOrigin",
                "value": "Imported"
            },
            {
                "ID": "csProductCode",
                "value": "1101"
            }
        ]
    }
]
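import pandas as pd
from itertools import zip_longest  # Python 3 name; it was izip_longest in Python 2

final_list = []
for val in l:
    # Start with the product's scalar (non-nested) fields.
    d = {key: val[key] for key in val if key not in ('variationProducts', 'productAttributes')}
    # Pair variants and attributes by position; the shorter list is padded with None.
    for prods, attrs in zip_longest(val['variationProducts'], val['productAttributes']):
        if prods:
            d.update(prods)
        if attrs:
            d.update({attrs['ID']: attrs['value']})
        # d accumulates across iterations, so earlier rows lack attributes seen later.
        final_list.append(d.copy())

pd.DataFrame(final_list)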
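A variation on the loop above, sketched under the assumption that every variant row should carry all of its product's attributes rather than only those paired with it by position. The helper name `flatten_products` is hypothetical, and the inline `raw` string (a trimmed copy of the question's data) stands in for the file; with a real file you would call `json.load(f)` on the opened file object instead of `json.loads(raw)`.

```python
import json

import pandas as pd

# Inline stand-in for the JSON file, trimmed from the question's data.
raw = """
[
    {
        "ID": "12345",
        "productName": "Product A ",
        "variationProducts": [
            {"variantColor": "JJ0BVE7", "variantSize": "080", "inventory": 3},
            {"variantColor": "JJ0BVE7", "variantSize": "085", "inventory": 6}
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1100"}
        ]
    },
    {
        "ID": "23456",
        "productName": "Product B",
        "variationProducts": [
            {"variantColor": "JJ169Q0", "variantSize": "050", "inventory": 55}
        ],
        "productAttributes": [
            {"ID": "countryOfOrigin", "value": "Imported"},
            {"ID": "csProductCode", "value": "1101"}
        ]
    }
]
"""


def flatten_products(products):
    """Return one flat dict per variant, including the product's fields and attributes."""
    nested = ("variationProducts", "productAttributes")
    rows = []
    for product in products:
        # Scalar product-level fields such as ID and productName.
        base = {k: v for k, v in product.items() if k not in nested}
        # Turn the {"ID": ..., "value": ...} pairs into ordinary columns.
        base.update({a["ID"]: a["value"] for a in product.get("productAttributes", [])})
        # Emit one row per variant, each carrying the full set of product columns.
        for variant in product.get("variationProducts", []):
            rows.append({**base, **variant})
    return rows


df = pd.DataFrame(flatten_products(json.loads(raw)))
print(df)
```

From here, a `groupby` on the product columns plus `variantColor` with a sum over `inventory` would give the aggregated layout the question asks for.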