Pandas: create a DataFrame from a dict of dicts of lists
I have a data structure that is populated dynamically, so the number of keys and sub-keys is unknown. I want to convert it into a pandas DataFrame.
The structure looks like this:
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8] }
}
It is a dict of dicts with lists as values.
I want to convert it into a pandas DataFrame like this:
index users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
....
I've tried pd.DataFrame(dict) and the from_dict method, but couldn't get either to work. Any help would be appreciated.
EDIT: Sorry about the syntax error; it is fixed now.
Here's an approach:
import pandas as pd
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8]}
}
We can call pd.DataFrame() on the dict, then stack() it, then reset_index() it:
df = pd.DataFrame(datastore).stack().reset_index()
print(df)
level_0 level_1 0
0 time1 user1 [1, 2, 3, 4]
1 time1 user2 [1, 2, 3, 4]
2 time2 user1 [5, 6, 7, 8]
3 time2 user2 [5, 6, 7, 8]
4 time3 user1 [1, 2, 3, 4]
Now we 'split' the lists in column 0 by applying pd.Series, then join the result back to level_1 and level_0. After some column renaming, we're done:
df = df[['level_1', 'level_0']].join(df[0].apply(pd.Series))
df.columns = ['users', 'times', 'x', 'y', 'z', 'k']
print(df)
users times x y z k
0 user1 time1 1 2 3 4
1 user2 time1 1 2 3 4
2 user1 time2 5 6 7 8
3 user2 time2 5 6 7 8
4 user1 time3 1 2 3 4
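As a possible variation on the step above (not part of the original answer), the lists can also be expanded by passing them to the DataFrame constructor instead of applying pd.Series row by row. This is only a minimal sketch and assumes every list has exactly four elements. The explicit dropna(), the intermediate column name vals, and the final sort are additions here, so that the result does not depend on how a given pandas version treats the missing user2/time3 pair and the rows come out in the order shown in the question.

```python
import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}

# Same first step as above; dropna() removes the missing user2/time3 pair explicitly.
stacked = pd.DataFrame(datastore).stack().dropna().reset_index()
stacked.columns = ["times", "users", "vals"]  # "vals" is an illustrative name

# Expand each 4-element list into its own columns with one constructor call.
expanded = pd.DataFrame(
    stacked["vals"].tolist(), index=stacked.index, columns=list("xyzk")
)

result = (
    stacked[["users", "times"]]
    .join(expanded)
    .sort_values(["users", "times"])  # order rows by user, then time
    .reset_index(drop=True)
)
print(result)
```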
Option 1
pd.DataFrame.from_dict(datastore, 'index').stack() \
.rename_axis(['users', 'times']) \
.apply(pd.Series, index=list('xyzk')).reset_index()
users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
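To make the chain in Option 1 easier to follow, here is the same pipeline split into named steps. The variable names wide and long are illustrative, and the dropna() call is an addition (an assumption about wanting identical results across pandas versions, since newer stack() implementations may keep the empty user2/time3 cell).

```python
import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}

# Step 1: one row per user, one column per time; each cell holds a list
# (or NaN where a user has no entry for that time).
wide = pd.DataFrame.from_dict(datastore, orient="index")

# Step 2: move the time columns into a second index level and name both levels.
# dropna() is an addition that drops the missing user2/time3 cell explicitly.
long = wide.stack().dropna().rename_axis(["users", "times"])

# Step 3: expand each list into columns x, y, z, k, then turn the index into columns.
result = long.apply(pd.Series, index=list("xyzk")).reset_index()
print(result)
```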
Option 2
pd.DataFrame(
[[u, t] + l for u, td in datastore.items() for t, l in td.items()],
columns='users times x y z k'.split()
)
users times x y z k
0 user1 time1 1 2 3 4
1 user1 time2 5 6 7 8
2 user1 time3 1 2 3 4
3 user2 time1 1 2 3 4
4 user2 time2 5 6 7 8
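Since the question says the structure is populated dynamically, the comprehension in Option 2 could be wrapped in a small helper that does not hard-code the value column names. This is a hedged sketch: flatten_datastore, value_names, and the default v0, v1, ... column names are hypothetical, and it assumes all lists have the same length.

```python
import pandas as pd


def flatten_datastore(datastore, value_names=None):
    """Flatten {user: {time: [values...]}} into one row per (user, time) pair."""
    rows = [
        [user, time] + list(values)
        for user, times in datastore.items()
        for time, values in times.items()
    ]
    # Number of value columns, taken from the longest row (0 if there are no rows).
    width = max((len(row) for row in rows), default=2) - 2
    if value_names is None:
        # Hypothetical default names: v0, v1, ...
        value_names = [f"v{i}" for i in range(width)]
    return pd.DataFrame(rows, columns=["users", "times"] + list(value_names))


datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}

df = flatten_datastore(datastore, value_names=list("xyzk"))
print(df)
```

Calling flatten_datastore(datastore) without value_names would label the value columns v0 to v3 instead.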
Timing
%timeit pd.DataFrame.from_dict(datastore, 'index').stack().rename_axis(['users', 'times']).apply(pd.Series, index=list('xyzk')).reset_index()
%timeit pd.DataFrame([[u, t] + l for u, td in datastore.items() for t, l in td.items()], columns='users times x y z k'.split())
100 loops, best of 3: 2.72 ms per loop
1000 loops, best of 3: 556 µs per loop
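%timeit only works inside IPython or Jupyter. Outside of those, a similar comparison could be sketched with the standard-library timeit module. The function names option_1 and option_2 and the run count are illustrative choices, and the absolute numbers will differ by machine and pandas version, so no expected timings are shown.

```python
import timeit

import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}


def option_1():
    # Option 1 from above: from_dict + stack + apply(pd.Series).
    return (
        pd.DataFrame.from_dict(datastore, orient="index")
        .stack()
        .rename_axis(["users", "times"])
        .apply(pd.Series, index=list("xyzk"))
        .reset_index()
    )


def option_2():
    # Option 2 from above: build the rows with a list comprehension.
    return pd.DataFrame(
        [[u, t] + l for u, td in datastore.items() for t, l in td.items()],
        columns="users times x y z k".split(),
    )


runs = 100  # arbitrary number of repetitions
t1 = timeit.timeit(option_1, number=runs) / runs
t2 = timeit.timeit(option_2, number=runs) / runs
print(f"Option 1: {t1 * 1e3:.3f} ms per call")
print(f"Option 2: {t2 * 1e3:.3f} ms per call")
```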
DEBUG
If you copy and paste this code, it should run. Please try it and report back that it ran.
import pandas as pd
datastore = {
"user1":{
"time1":[1,2,3,4],
"time2":[5,6,7,8],
"time3":[1,2,3,4] },
"user2":{
"time1":[1,2,3,4],
"time2":[5,6,7,8]}
}
pd.DataFrame.from_dict(datastore, 'index').stack() \
.rename_axis(['users', 'times']) \
.apply(pd.Series, index=list('xyzk')).reset_index()