Pandas convert DataFrame to nested JSON
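My question is essentially the opposite of this one:
Creating a Pandas DataFrame from deeply nested JSON
I'd like to know whether the reverse is possible. Given a table such as: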
     Library   Level  School           Major  2013 Total
200  MS_AVERY  UGRAD  GENERAL STUDIES  GEST         5079
201  MS_AVERY  UGRAD  GENERAL STUDIES  HIST            5
202  MS_AVERY  UGRAD  GENERAL STUDIES  MELC            2
203  MS_AVERY  UGRAD  GENERAL STUDIES  PHIL           10
204  MS_AVERY  UGRAD  GENERAL STUDIES  PHYS            1
205  MS_AVERY  UGRAD  GENERAL STUDIES  POLS           53
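To try the answers below, the table can be rebuilt as a DataFrame. This is a minimal sketch that assumes the leftmost numbers (200 to 205) are the index and that "2013 Total" is an integer column:

```python
import pandas as pd

# Rebuild the example table from the question.
# Assumption: 200-205 are index labels, not a data column.
df = pd.DataFrame(
    {
        "Library": ["MS_AVERY"] * 6,
        "Level": ["UGRAD"] * 6,
        "School": ["GENERAL STUDIES"] * 6,
        "Major": ["GEST", "HIST", "MELC", "PHIL", "PHYS", "POLS"],
        "2013 Total": [5079, 5, 2, 10, 1, 53],
    },
    index=range(200, 206),
)

print(df)
```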
Is it possible to produce a nested dict (or JSON) such as:
{'MS_AVERY':
    {'UGRAD':
        {'GENERAL STUDIES': {'GEST': 5079,
                             'MELC': 2,
                             ...}}}}
Given your DataFrame object, it doesn't seem hard to write a function that builds the nested dictionary:
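def fdrec(df):
    """Use every column but the last as a nesting key; the last column is the leaf value."""
    drec = dict()
    ncols = df.values.shape[1]
    for line in df.values:
        d = drec
        for j, col in enumerate(line[:-1]):
            if col not in d:
                if j != ncols - 2:
                    # intermediate level: create an empty dict and descend into it
                    d[col] = {}
                    d = d[col]
                else:
                    # last key column: store the value from the final column
                    d[col] = line[-1]
            else:
                if j != ncols - 2:
                    # level already exists: just descend
                    d = d[col]
    return drec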
This produces:
{'MS_AVERY':
    {'UGRAD':
        {'GENERAL STUDIES': {'PHYS': 1L,
                             'POLS': 53L,
                             'PHIL': 10L,
                             'HIST': 5L,
                             'MELC': 2L,
                             'GEST': 5079L}}}}
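The same idea can be written more compactly with dict.setdefault, which creates each intermediate level the first time it is seen. A minimal sketch, assuming (as fdrec does) that the second-to-last column is the leaf key and the last column the leaf value; the function name nest is hypothetical:

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Library": ["MS_AVERY"] * 6,
        "Level": ["UGRAD"] * 6,
        "School": ["GENERAL STUDIES"] * 6,
        "Major": ["GEST", "HIST", "MELC", "PHIL", "PHYS", "POLS"],
        "2013 Total": [5079, 5, 2, 10, 1, 53],
    }
)


def nest(frame: pd.DataFrame) -> dict:
    """Hypothetical helper: nest all columns except the last two as keys,
    then map the second-to-last column to the last column at the leaf."""
    result: dict = {}
    # name=None yields plain tuples, one per row, without the index
    for row in frame.itertuples(index=False, name=None):
        *path, leaf_key, leaf_value = row
        node = result
        for key in path:
            # setdefault returns the existing sub-dict or inserts a new one
            node = node.setdefault(key, {})
        node[leaf_key] = leaf_value
    return result


nested = nest(df)
print(json.dumps(nested, indent=2))
```

Unlike fdrec, this does not need to know the column count, because the tuple unpacking splits off the leaf key and value on every row.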
Here is the solution I came up with while working on this problem:
def rollup_to_dict_core(x, values, columns, d_columns=None):
    if d_columns is None:
        d_columns = []
    if len(columns) == 1:
        # base case: the last key column maps directly to the value column(s)
        if len(values) == 1:
            return x.set_index(columns)[values[0]].to_dict()
        else:
            return x.set_index(columns)[values].to_dict(orient='index')
    else:
        # recurse: group on the outermost key and nest the remaining columns
        res = x.groupby([columns[0]] + d_columns).apply(
            lambda y: rollup_to_dict_core(y, values, columns[1:]))
        if len(d_columns) == 0:
            return res.to_dict()
        else:
            res.name = columns[1]
            # reset_index expects a list of levels, so wrap the range in list()
            res = res.reset_index(level=list(range(1, len(d_columns) + 1)))
            return res.to_dict(orient='index')

def rollup_to_dict(x, values, d_columns=None):
    if d_columns is None:
        d_columns = []
    # every column that is neither a value nor a d_column becomes a nesting key
    columns = [c for c in x.columns if c not in values and c not in d_columns]
    return rollup_to_dict_core(x, values, columns, d_columns)
>>> from pprint import pprint
>>> pprint(rollup_to_dict(df, ['2013 Total']))
{'MS_AVERY': {'UGRAD': {'GENERAL STUDIES': {'GEST': 5079,
                                            'HIST': 5,
                                            'MELC': 2,
                                            'PHIL': 10,
                                            'PHYS': 1,
                                            'POLS': 53}}}}
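The base case of rollup_to_dict_core relies on two pandas conversions: a Series indexed by the key column becomes {key: value}, while a DataFrame with to_dict(orient='index') becomes {key: {column: value}}. A small sketch of both on a slice of the example table, where using Level as a second value column is only for illustration:

```python
import pandas as pd

# Two rows of the example table, keeping the leaf key and two other columns.
leaf = pd.DataFrame(
    {
        "Major": ["GEST", "HIST"],
        "Level": ["UGRAD", "UGRAD"],
        "2013 Total": [5079, 5],
    }
)

# One value column: like the len(values) == 1 branch -> {key: value}
single = leaf.set_index(["Major"])["2013 Total"].to_dict()

# Several value columns: like the else branch -> {key: {column: value}}
multi = leaf.set_index(["Major"])[["Level", "2013 Total"]].to_dict(orient="index")

print(single)
print(multi)
```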
Another approach groups on the key columns; note that, as written, it only builds the nesting for the first group (series.values[0] and series.index[0]):
import json

key = ['Library', 'Level', 'School']
series = (df.groupby(key, sort=False)[df.columns.difference(key)]
            .apply(lambda x: x[['Major', '2013 Total']].to_dict('records')))
# build: {Major: Total}
major = {}
values = series.values[0]
for i in range(len(values)):
    major.update({values[i]['Major']: values[i]['2013 Total']})
# build the nested dictionary, starting from the innermost key
index = series.index[0]
d = {}
for i in reversed(range(len(index))):
    if not d:
        d = {index[i]: major}
    else:
        d = {index[i]: d}
print(json.dumps(d, indent=2))
It produces:
{
  "MS_AVERY": {
    "UGRAD": {
      "GENERAL STUDIES": {
        "GEST": 5079,
        "HIST": 5,
        "MELC": 2,
        "PHIL": 10,
        "PHYS": 1,
        "POLS": 53
      }
    }
  }
}
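The groupby approach above uses only the first group. A sketch that walks every group instead, assuming the same three key columns; nested is an illustrative name:

```python
import json

import pandas as pd

df = pd.DataFrame(
    {
        "Library": ["MS_AVERY"] * 6,
        "Level": ["UGRAD"] * 6,
        "School": ["GENERAL STUDIES"] * 6,
        "Major": ["GEST", "HIST", "MELC", "PHIL", "PHYS", "POLS"],
        "2013 Total": [5079, 5, 2, 10, 1, 53],
    }
)

key = ["Library", "Level", "School"]
nested: dict = {}
for group_keys, group in df.groupby(key, sort=False):
    # group_keys is a tuple such as ("MS_AVERY", "UGRAD", "GENERAL STUDIES")
    node = nested
    for k in group_keys:
        node = node.setdefault(k, {})
    # add {Major: Total} for this group; tolist() gives plain Python ints
    node.update(zip(group["Major"].tolist(), group["2013 Total"].tolist()))

print(json.dumps(nested, indent=2))
```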
Here is a general way to produce the following format, which may be what others are looking for. Desired format:
{"data":
  [
    {
      "NAME": [1, 2, 3]
    },
    {
      "NAME": [1, 2, 3]
    }
  ]
}
To get this:
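import json

jsonstr = '{"data":['
# df.iteritems() was removed in pandas 2.0; df.items() replaces it
for (columnName, columnData) in df.items():
    jsonstr += '{'
    jsonstr += json.dumps(str(columnName))  # quotes and escapes the column name
    jsonstr += ':'
    # tolist() gives plain Python numbers; json.dumps rejects NumPy int64
    jsonstr += json.dumps(columnData.tolist())
    jsonstr += '},'
jsonstr = jsonstr[:-1]  # drop the trailing comma
jsonstr += ']}'
jsonobject = json.loads(jsonstr)
jsonobject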
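The same structure can also be built directly as a Python object and serialized once, which avoids concatenating strings and quoting column names by hand. A sketch with a hypothetical two-column frame:

```python
import json

import pandas as pd

# Hypothetical example frame; any column names and values work the same way.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# One {column_name: [values...]} object per column, wrapped under "data".
# Series.tolist() converts NumPy scalars to plain Python numbers.
jsonobject = {"data": [{name: col.tolist()} for name, col in df.items()]}
jsonstr = json.dumps(jsonobject)
print(jsonstr)  # {"data": [{"A": [1, 2, 3]}, {"B": [4, 5, 6]}]}
```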
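Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.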