
dataframe from dictionary of lists

Is there a way to achieve the following with pure pandas methods, or is it more reasonable to rearrange the dictionary itself first?

Initial dictionary:

data_json = {'a':[{'aa':1,'bb':2,'cc':3},
                  {'aa':2,'bb':2,'cc':3},
                  {'aa':3,'bb':2,'cc':3}],
             'b':[{'beta':22,'alpha':23,'gamma':24},
                  {'gamma':24,'beta':25,'alpha':26},
                  {'alpha':34,'beta':35,'gamma':36}]}

I would like to get a DataFrame whose column names are the nested dictionary keys:

  aa bb cc alpha beta gamma
1  1  2  3    23   22    24
2  2  2  3    26   25    24
3  3  2  3    34   35    36

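What I tried:

import pandas as pd

aaa = pd.DataFrame(data_json)
# foo expands each dict into a Series of its values; the keys are dropped, hence the 0, 1, 2 labels
foo = lambda x: pd.Series(list(x.values()))
bbb = pd.concat([aaa['a'].apply(foo), aaa['b'].apply(foo)], axis=1)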

This gives me:

   0  1  2     0    1     2
1  1  2  3    23   22    24
2  2  2  3    26   25    24
3  3  2  3    34   35    36

But now I'm stuck, because the column names are duplicated ([0, 1, 2, 0, 1, 2]), so I cannot simply use

bbb.rename(columns={0:'a',1:'b',...})

As I said, I do not mind rearranging the initial dictionary, but I'd like the whole thing to be as clean as possible.
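If rearranging the dictionary first is acceptable, as the question allows, one plain-Python option is to merge the i-th dict from every list into a single flat record and build the DataFrame from those records. This is a minimal sketch, assuming all lists have the same length and that no nested key appears in more than one group (a later dict would silently overwrite an earlier one); `rows` is just an illustrative name.

```python
import pandas as pd

data_json = {'a': [{'aa': 1, 'bb': 2, 'cc': 3},
                   {'aa': 2, 'bb': 2, 'cc': 3},
                   {'aa': 3, 'bb': 2, 'cc': 3}],
             'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
                   {'gamma': 24, 'beta': 25, 'alpha': 26},
                   {'alpha': 34, 'beta': 35, 'gamma': 36}]}

# zip(*values) pairs the i-th dict of 'a' with the i-th dict of 'b';
# the inner comprehension merges each pair into one flat record.
# Assumption: nested keys are unique across groups, otherwise later ones win.
rows = [{k: v for d in group for k, v in d.items()}
        for group in zip(*data_json.values())]

# A list of flat dicts maps directly to rows, with the dict keys as column names.
df = pd.DataFrame(rows)
print(df)
```

The merge works for any number of top-level keys, not just 'a' and 'b'.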

I would load 'a' and 'b' separately and then join them, which aligns them on the index:

pd.DataFrame(data_json['a']).join(pd.DataFrame(data_json['b']))


   aa  bb  cc  alpha  beta  gamma
0   1   2   3     23    22     24
1   2   2   3     26    25     24
2   3   2   3     34    35     36
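Staying closer to the question's own attempt, the same idea also works starting from `aaa = pd.DataFrame(data_json)`, where every cell of columns 'a' and 'b' holds one dict. A sketch, assuming the cells are plain dicts: passing each column's list of dicts to `pd.DataFrame` keeps the dict keys as column names, so nothing has to be renamed afterwards.

```python
import pandas as pd

data_json = {'a': [{'aa': 1, 'bb': 2, 'cc': 3},
                   {'aa': 2, 'bb': 2, 'cc': 3},
                   {'aa': 3, 'bb': 2, 'cc': 3}],
             'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
                   {'gamma': 24, 'beta': 25, 'alpha': 26},
                   {'alpha': 34, 'beta': 35, 'gamma': 36}]}

# Same first step as the question: columns 'a' and 'b' hold one dict per cell.
aaa = pd.DataFrame(data_json)

# Expand each column's dicts into named columns, keeping aaa's index so the
# pieces line up row by row when concatenated side by side.
bbb = pd.concat([pd.DataFrame(aaa[col].tolist(), index=aaa.index)
                 for col in aaa.columns], axis=1)
print(bbb)
```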

Another way, in case you don't know how many keys data_json has, is to loop over them and then use pd.concat, which is more convenient with a list. Note that I'm using sorted(data_json) so that a comes before b:

list_df = []
for k in sorted(data_json):
    list_df.append(pd.DataFrame(data_json[k]))
pd.concat(list_df, axis=1)
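A related variant of the loop, sketched here as an option rather than part of the answer above: passing `keys=` to `pd.concat` labels each block with its top-level key, so the columns become a two-level MultiIndex such as `('a', 'aa')` and `('b', 'alpha')`. The labels stay unique even if two groups happened to share a nested key, and the outer level can be dropped when only the nested keys are wanted.

```python
import pandas as pd

data_json = {'a': [{'aa': 1, 'bb': 2, 'cc': 3},
                   {'aa': 2, 'bb': 2, 'cc': 3},
                   {'aa': 3, 'bb': 2, 'cc': 3}],
             'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
                   {'gamma': 24, 'beta': 25, 'alpha': 26},
                   {'alpha': 34, 'beta': 35, 'gamma': 36}]}

keys = sorted(data_json)

# keys= adds the top-level key as an outer column level, e.g. ('b', 'alpha').
nested = pd.concat([pd.DataFrame(data_json[k]) for k in keys], axis=1, keys=keys)

# Drop the outer level to get back to plain nested-key column names.
flat = nested.droplevel(0, axis=1)
print(nested)
print(flat)
```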

I would use concat. Note:

In [11]: pd.DataFrame(data_json['a'])
Out[11]: 
   aa  bb  cc
0   1   2   3
1   2   2   3
2   3   2   3

In [12]: pd.DataFrame(data_json['b'])
Out[12]: 
   alpha  beta  gamma
0     23    22     24
1     26    25     24
2     34    35     36

So simply:

In [13]: pd.concat((pd.DataFrame(v) for v in data_json.values()), axis=1)
Out[13]: 
   alpha  beta  gamma  aa  bb  cc
0     23    22     24   1   2   3
1     26    25     24   2   2   3
2     34    35     36   3   2   3
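The column order of this result follows the iteration order of `data_json.values()`. Since Python 3.7, dicts preserve insertion order, so with `data_json` as written in the question the `aa`, `bb`, `cc` columns would come first. The session above lists the `b` columns first, which suggests it was produced where the dict iterated 'b' first (for example, an older Python without ordered dicts). If a stable order matters, iterating over `sorted(...)` keys, as the loop answer does, removes that dependency. A small sketch; `reordered` is a hypothetical dict built with 'b' inserted before 'a':

```python
import pandas as pd

data_json = {'a': [{'aa': 1, 'bb': 2, 'cc': 3},
                   {'aa': 2, 'bb': 2, 'cc': 3},
                   {'aa': 3, 'bb': 2, 'cc': 3}],
             'b': [{'beta': 22, 'alpha': 23, 'gamma': 24},
                   {'gamma': 24, 'beta': 25, 'alpha': 26},
                   {'alpha': 34, 'beta': 35, 'gamma': 36}]}

# Hypothetical variant of data_json whose keys were inserted as 'b', then 'a'.
reordered = {'b': data_json['b'], 'a': data_json['a']}

# .values() follows insertion order, so the 'b' columns come first here.
by_insertion = pd.concat((pd.DataFrame(v) for v in reordered.values()), axis=1)

# Sorting the keys gives 'a' first regardless of how the dict was built.
by_key = pd.concat((pd.DataFrame(reordered[k]) for k in sorted(reordered)), axis=1)

print(list(by_insertion.columns))
print(list(by_key.columns))
```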


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM