
Pandas: create a df from a dict of dicts of lists

I have a data structure that is populated dynamically, so the number of keys and sub-keys is not known in advance. I want to convert it into a pandas DataFrame. The structure looks like this:

    datastore = {
        "user1": {
            "time1": [1, 2, 3, 4],
            "time2": [5, 6, 7, 8],
            "time3": [1, 2, 3, 4]},
        "user2": {
            "time1": [1, 2, 3, 4],
            "time2": [5, 6, 7, 8]}
    }

It is a dict of dicts whose innermost values are lists.

I want to convert it into a pandas DataFrame like this:

index users times x y z k
0     user1 time1 1 2 3 4
1     user1 time2 5 6 7 8
2     user1 time3 1 2 3 4
3     user2 time1 1 2 3 4
4     user2 time2 5 6 7 8 
....

I've tried pd.DataFrame(dict) and the from_dict method, but couldn't get either to work. Any help would be appreciated.

EDIT: Sorry about the syntax error; it is fixed now.

Here's one approach, starting from the same data:

datastore = {
"user1":{
    "time1":[1,2,3,4], 
    "time2":[5,6,7,8], 
    "time3":[1,2,3,4] },
"user2":{ 
    "time1":[1,2,3,4], 
    "time2":[5,6,7,8]}
}

We can pass the dict to pd.DataFrame(), then stack() the result, then reset_index() it:

import pandas as pd

df = pd.DataFrame(datastore).stack().reset_index()
print(df)
  level_0 level_1             0
0   time1   user1  [1, 2, 3, 4]
1   time1   user2  [1, 2, 3, 4]
2   time2   user1  [5, 6, 7, 8]
3   time2   user2  [5, 6, 7, 8]
4   time3   user1  [1, 2, 3, 4]

Now we 'split' the lists in column 0 by applying pd.Series, then join the result back to level_1 and level_0. After renaming the columns, we're done:

df = df[['level_1', 'level_0']].join(df[0].apply(pd.Series))
df.columns = ['users', 'times', 'x', 'y', 'z', 'k']
print(df)
   users  times  x  y  z  k
0  user1  time1  1  2  3  4
1  user2  time1  1  2  3  4
2  user1  time2  5  6  7  8
3  user2  time2  5  6  7  8
4  user1  time3  1  2  3  4
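
The apply(pd.Series) step builds one Series per row, which can get slow on larger frames. A possible alternative, sketched below, builds the value columns directly from the plain lists with tolist(). The names stacked, values and result are illustrative, the dropna() call is a defensive assumption for pandas versions where stack() keeps missing (time, user) pairs, and the sketch assumes every list has exactly four items.

```python
import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}

# Same first step as above: one row per (time, user) pair, lists in column 0.
# dropna() is an assumption-driven safeguard: it discards pairs that do not
# exist in the data (user2 has no time3) if stack() did not already drop them.
stacked = pd.DataFrame(datastore).stack().dropna().reset_index()

# Expand the list column by building a frame from the plain Python lists,
# reusing the row index so the join below lines up row by row.
values = pd.DataFrame(stacked[0].tolist(), columns=list("xyzk"), index=stacked.index)

result = stacked[["level_1", "level_0"]].join(values)
result.columns = ["users", "times", "x", "y", "z", "k"]
```

If the lists can differ in length, the fixed column names in list("xyzk") no longer fit and would need to be generated instead.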

Option 1

pd.DataFrame.from_dict(datastore, 'index').stack() \
    .rename_axis(['users', 'times']) \
    .apply(pd.Series, index=list('xyzk')).reset_index()

   users  times  x  y  z  k
0  user1  time1  1  2  3  4
1  user1  time2  5  6  7  8
2  user1  time3  1  2  3  4
3  user2  time1  1  2  3  4
4  user2  time2  5  6  7  8

Option 2

pd.DataFrame(
    [[u, t] + l for u, td in datastore.items() for t, l in td.items()],
    columns='users times x y z k'.split()
)

   users  times  x  y  z  k
0  user1  time1  1  2  3  4
1  user1  time2  5  6  7  8
2  user1  time3  1  2  3  4
3  user2  time1  1  2  3  4
4  user2  time2  5  6  7  8

Timing

%timeit pd.DataFrame.from_dict(datastore, 'index').stack().rename_axis(['users', 'times']).apply(pd.Series, index=list('xyzk')).reset_index()
%timeit pd.DataFrame([[u, t] + l for u, td in datastore.items() for t, l in td.items()], columns='users times x y z k'.split())

100 loops, best of 3: 2.72 ms per loop
1000 loops, best of 3: 556 µs per loop
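
The %timeit magic only works inside IPython or Jupyter. To repeat the comparison in a plain Python script, something like the following sketch with the standard timeit module could be used. The function names option_1 and option_2 and the repeat counts are illustrative choices, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

datastore = {
    "user1": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8], "time3": [1, 2, 3, 4]},
    "user2": {"time1": [1, 2, 3, 4], "time2": [5, 6, 7, 8]},
}


def option_1():
    # from_dict + stack + apply(pd.Series), as in Option 1.
    return (
        pd.DataFrame.from_dict(datastore, orient="index")
        .stack()
        .rename_axis(["users", "times"])
        .apply(pd.Series, index=list("xyzk"))
        .reset_index()
    )


def option_2():
    # Plain list comprehension, as in Option 2.
    return pd.DataFrame(
        [[u, t] + l for u, td in datastore.items() for t, l in td.items()],
        columns="users times x y z k".split(),
    )


# Best of 3 runs of 50 calls each, reported as seconds per call.
best_1 = min(timeit.repeat(option_1, number=50, repeat=3)) / 50
best_2 = min(timeit.repeat(option_2, number=50, repeat=3)) / 50
print(f"option 1: {best_1 * 1e3:.3f} ms per call")
print(f"option 2: {best_2 * 1e3:.3f} ms per call")
```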

DEBUG
If you copy and paste this code, it should run. Please try it and report back that it ran.

import pandas as pd

datastore = {
    "user1":{
        "time1":[1,2,3,4], 
        "time2":[5,6,7,8], 
        "time3":[1,2,3,4] },
    "user2":{ 
        "time1":[1,2,3,4], 
        "time2":[5,6,7,8]}
}

pd.DataFrame.from_dict(datastore, 'index').stack() \
    .rename_axis(['users', 'times']) \
    .apply(pd.Series, index=list('xyzk')).reset_index()
