How to generate n-level hierarchical JSON from pandas DataFrame?
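Is there an efficient way to create hierarchical JSON (n levels deep) where the parent values are the keys rather than the variable labels? For example: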
{"2017-12-31":
    {"Junior":
        {"Electronics":
            {"A":
                {"sales": 0.440755},
             "B":
                {"sales": -3.230951}
            }, ...etc...
        }, ...etc...
    }, ...etc...
}
1. My test DataFrame:
import numpy as np
import pandas as pd

colIndex = pd.MultiIndex.from_product([['New York', 'Paris'],
                                       ['Electronics', 'Household'],
                                       ['A', 'B', 'C'],
                                       ['Junior', 'Senior']],
                                      names=['City', 'Department', 'Team', 'Job Role'])
rowIndex = pd.date_range('2017-12-25', periods=12, freq='D')
df1 = pd.DataFrame(np.random.randn(12, 24), index=rowIndex, columns=colIndex)
df1.index.name = 'Date'
# Monthly totals ('M' is the month-end alias; pandas >= 2.2 spells it 'ME')
df2 = df1.resample('M').sum()
# Move City into the rows, then sum across cities for each date
df3 = df2.stack(level=0).groupby('Date').sum()
2. The transformation I am making, since it seems to be the most logical structure to build the JSON from:
df4 = df3.stack(level=[0, 1, 2]).reset_index() \
         .set_index(['Date', 'Job Role', 'Department', 'Team']) \
         .sort_index()
3. My attempts so far
I came across a very helpful SO question that solves the problem for one level of nesting, using code along these lines:
j = (df.groupby(['ID', 'Location', 'Country', 'Latitude', 'Longitude'], as_index=False)
       .apply(lambda x: x[['timestamp', 'tide']].to_dict('records'))
       .reset_index()
       .rename(columns={0: 'Tide-Data'})
       .to_json(orient='records'))
...but I can't find a way to get nested .groupby() calls to work:
j = (df.groupby('date', as_index=True).apply(
        lambda x: x.groupby('Job Role', as_index=True).apply(
            lambda x: x.groupby('Department', as_index=True).apply(
                lambda x: x.groupby('Team', as_index=True).to_dict())))
     .reset_index().rename(columns={0: 'sales'}).to_json(orient='records'))
You can use itertuples to generate a nested dict and then dump that to JSON. To do this, you first need to change the date timestamps to strings:
df4 = df3.stack(level=[0, 1, 2]).reset_index()
df4['Date'] = df4['Date'].dt.strftime('%Y-%m-%d')
df4 = df4.set_index(['Date', 'Job Role', 'Department', 'Team']) \
         .sort_index()
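The string conversion matters because the json module only accepts str, int, float, bool or None as dictionary keys. A minimal sketch of the failure and the fix follows; the timestamp and the value 0.44 are made up for illustration and are not taken from df4.

```python
import json
import pandas as pd

stamp = pd.Timestamp("2017-12-31")  # illustrative value, not taken from df4

# A Timestamp used as a dict key is rejected by json.dumps.
try:
    json.dumps({stamp: 0.44})
    failed = False
except TypeError:
    failed = True

# Formatting the timestamp as a string first makes the key serializable.
encoded = json.dumps({stamp.strftime("%Y-%m-%d"): 0.44})
print(failed, encoded)
```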
Create the nested dict:
import collections

def nested_dict():
    # Every missing key yields another nested_dict, to any depth
    return collections.defaultdict(nested_dict)

result = nested_dict()
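To see why the recursive defaultdict is convenient here, the sketch below assigns one value five levels deep without creating any of the intermediate dicts by hand. The name demo and the keys are illustrative, borrowed from the shape of the desired JSON in the question.

```python
import collections
import json

def nested_dict():
    # Each missing key creates another nested_dict, to any depth.
    return collections.defaultdict(nested_dict)

demo = nested_dict()
# None of the intermediate levels exist yet; they are created on access.
demo["2017-12-31"]["Junior"]["Electronics"]["A"]["sales"] = 0.440755

demo_json = json.dumps(demo)
print(demo_json)
```

One thing to keep in mind with this design: merely reading a missing key also creates it, so lookups on the finished structure are safer after converting it to plain dicts or dumping it to JSON.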
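Populate it using itertuples:
for row in df4.itertuples():
    # row.Index is the (Date, Job Role, Department, Team) tuple; the value
    # column is named 0, which itertuples exposes as row._1
    result[row.Index[0]][row.Index[1]][row.Index[2]][row.Index[3]]['sales'] = row._1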
Then dump it using the json module.
import json
json.dumps(result)
'{"2017-12-31": {"Junior": {"Electronics": {"A": {"sales": -0.3947134370101142}, "B": {"sales": -0.9873530754403204}, "C": {"sales": -1.1182598058984508}}, "Household": {"A": {"sales": -1.1211850078098677}, "B": {"sales": 2.0330914483907847}, "C": {"sales": 3.94762379718749}}}, "Senior": {"Electronics": {"A": {"sales": 1.4528493451404196}, "B": {"sales": -2.3277322345261005}, "C": {"sales": -2.8040263791743922}}, "Household": {"A": {"sales": 3.0972591929279663}, "B": {"sales": 9.8884565742502392}, "C": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"Junior": {"Electronics": {"A": {"sales": -1.580300149125217}, "B": {"sales": 1.414665000013205}, "C": {"sales": -1.432795129108244}}, "Household": {"A": {"sales": 2.7783259569115346}, "B": {"sales": 2.717700275321333}, "C": {"sales": 1.4358377416259644}}}, "Senior": {"Electronics": {"A": {"sales": 2.8981726774941485}, "B": {"sales": 12.022897003654117}, "C": {"sales": 0.01776855733076088}}, "Household": {"A": {"sales": -3.342163776613092}, "B": {"sales": -5.283208386572307}, "C": {"sales": 2.942580121975619}}}}}'
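Since the question asks for n levels, here is a hedged sketch of one way to avoid hard-coding four index positions: walk each index tuple with dict.setdefault. The helper series_to_nested and the toy Series are hypothetical names introduced for illustration, and the sketch assumes the Series has a MultiIndex, so that every key is a tuple.

```python
import json
import pandas as pd

def series_to_nested(series, leaf_key):
    """Nest a MultiIndex Series into plain dicts, one dict level per index level."""
    root = {}
    for keys, value in series.items():
        node = root
        for key in keys:
            # Descend one level, creating the dict if it does not exist yet.
            node = node.setdefault(str(key), {})
        node[leaf_key] = float(value)
    return root

# Toy data with three index levels (hypothetical values).
index = pd.MultiIndex.from_tuples(
    [("2017-12-31", "Junior", "A"),
     ("2017-12-31", "Junior", "B"),
     ("2017-12-31", "Senior", "A")],
    names=["Date", "Job Role", "Team"],
)
toy = pd.Series([1.0, 2.0, 3.0], index=index)

nested = series_to_nested(toy, "sales")
print(json.dumps(nested))
```

Because the keys are passed through str(), this variant should not need the strftime step, although the dates would then appear in their default string form. With the df4 built earlier, the call would presumably be series_to_nested(df4[0], 'sales'), since the value column left by reset_index is named 0.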
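I came across this question and was confused by the complexity of the OP's setup. Here is a minimal example and solution (based on the answer provided by @MaartenFabré).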
import collections
import pandas as pd
# build init DF
x = ['a', 'a']
y = ['b', 'c']
z = [['d'], ['e', 'f']]
df = pd.DataFrame(list(zip(x, y, z)), columns=['x', 'y', 'z'])
#    x  y       z
# 0  a  b     [d]
# 1  a  c  [e, f]
Set a regular, flat index, and then make it a multi-index:
# set flat index
df = df.set_index(['x', 'y'])
# set up multi index
df = df.reindex(pd.MultiIndex.from_tuples(list(zip(x, y))))
#           z
# a b     [d]
#   c  [e, f]
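Then initialize a nested dict and fill it in item by item:
nested_dict = collections.defaultdict(dict)
# Series.iteritems() was removed in pandas 2.0; items() is the equivalent
for keys, value in df.z.items():
    nested_dict[keys[0]][keys[1]] = value
# defaultdict(dict, {'a': {'b': ['d'], 'c': ['e', 'f']}})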
At this point you can dump it to JSON, and so on.
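For completeness, a short sketch of that last step. The dict is rebuilt by hand here so the snippet runs on its own; json.dumps accepts a defaultdict directly because it is a dict subclass.

```python
import collections
import json

# Same structure as the nested dict from the minimal example, written out by hand.
nested = collections.defaultdict(dict)
nested["a"]["b"] = ["d"]
nested["a"]["c"] = ["e", "f"]

text = json.dumps(nested)
print(text)  # {"a": {"b": ["d"], "c": ["e", "f"]}}
```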