Convert list of dicts of dict into DataFrame
I have a list of dictionaries, each containing a nested dictionary, that looks like:
[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
{'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
{'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
{'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
and the result should look like:
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
while the default pd.DataFrame(data) looks like:
a b f
0 1 {'c': 1, 'd': 2, 'e': 3} 4
1 2 {'c': 2, 'd': 3, 'e': 4} 3
2 3 {'c': 3, 'd': 4, 'e': 5} 2
3 4 {'c': 4, 'd': 5, 'e': 6} 1
How can I do this with pandas? Thanks.
You need to flatten the nested JSON-like data, like this:
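import pandas as pd

data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
        {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
        {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1}]

# pd.json_normalize (pandas 1.0+) replaces the older pandas.io.json import
# and already returns a DataFrame, so no from_dict wrapper is needed.
df = pd.json_normalize(data)
# Column order can differ between pandas versions; sort to match the output below.
df = df.reindex(sorted(df.columns), axis=1)
df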
# output:
a b.c b.d b.e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
You can rename the columns once that's done.
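The renaming step mentioned above could be sketched as follows. This assumes the default `.` separator and that the nested column is named `b`, as in the question; the lambda is just one way to strip the prefix.

```python
import pandas as pd

data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}]

df = pd.json_normalize(data)

# Drop the 'b.' prefix from the flattened columns; other names are left alone.
df = df.rename(columns=lambda name: name[2:] if name.startswith('b.') else name)

# Sort the columns alphabetically, since the order can vary by pandas version.
df = df.reindex(sorted(df.columns), axis=1)
print(df)
```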
json_normalize is what you're looking for!
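import pandas as pd

x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
     {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
     {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
     {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1}]

sep = '::::'  # string that doesn't appear in column names
# pd.json_normalize is the top-level name in pandas 1.0+
frame = pd.json_normalize(x, sep=sep)
# Keep only the last path segment of each flattened column name
frame.columns = frame.columns.str.split(sep).str[-1]
# Column order can differ between pandas versions; sort to match the output below
frame = frame.reindex(sorted(frame.columns), axis=1)
print(frame)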
Output:
a c d e f
0 1 1 2 3 4
1 2 2 3 4 3
2 3 3 4 5 2
3 4 4 5 6 1
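One caveat with keeping only the last segment: if a nested key has the same name as a top-level key, the shortened names collide. Below is a minimal sketch of a check for that, using a hypothetical `records` list that is not from the question.

```python
import pandas as pd

# Hypothetical data where the nested dict reuses the top-level key 'a'.
records = [{'a': 1, 'b': {'a': 10, 'c': 1}}]

sep = '::::'
frame = pd.json_normalize(records, sep=sep)

# Shortened names, as in the answer above.
short_names = frame.columns.str.split(sep).str[-1]

# True if at least one shortened name appears more than once.
has_clash = bool(short_names.duplicated().any())
print(has_clash)
```

If the check reports a clash, it is probably safer to keep the prefixed names or rename the affected columns by hand.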
import pandas as pd

z = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4},
     {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3},
     {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2},
     {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1}]

step1 = pd.DataFrame(z)
column_with_sets = 'b'  # the column holding the nested dicts
# Expand the nested dicts into their own DataFrame
step2 = pd.DataFrame(list(step1[column_with_sets]))
# Join the remaining top-level columns with the expanded ones
step3 = pd.concat([step1.drop(columns=column_with_sets), step2], axis=1)
# Sort the columns alphabetically (reindex_axis no longer exists in current pandas)
step4 = step3.reindex(sorted(step3.columns), axis=1)