
Convert a dictionary of nested lists to a pandas DataFrame

I have a Python dictionary like the one below:

dict1={808: [['a', 5.4, 'b'],
  ['c', 4.1 , 'b'],
  ['d', 3.7 , 'f']]} 

I want to convert it into a data frame as below:

memberid  userid score related
808       a      5.4     b
808       c      4.1     b
808       d      3.7     f

I tried the code below:

df=pd.DataFrame.from_dict(dict1,orient='index')

The result is not what I wanted.
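For reference, here is a minimal sketch of what that call appears to produce, which may explain the mismatch. The name `wide` is only illustrative. With orient='index', the single key becomes one row label, and each inner list ends up whole inside one cell instead of being spread over its own row.

```python
import pandas as pd

dict1 = {808: [['a', 5.4, 'b'],
               ['c', 4.1, 'b'],
               ['d', 3.7, 'f']]}

# orient='index' turns each dictionary key into a row label,
# so the three inner lists become three cells of a single row.
wide = pd.DataFrame.from_dict(dict1, orient='index')

print(wide.shape)       # one row, three columns
print(wide.iloc[0, 0])  # a whole inner list sits in one cell
```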

Does anybody know how to fix this? Thanks!

Let's convert each nested list value to a DataFrame, and then call pd.concat.

import pandas as pd

columns = ['userid', 'score', 'related']

df_dict = {k : pd.DataFrame(v, columns=columns) for k, v in dict1.items()}

df = (pd.concat(df_dict)
        .reset_index(level=1, drop=True)
        .rename_axis('memberid')
        .reset_index()
)

Or, in similar fashion:

import numpy as np

# Label every row of each sub-frame with its dictionary key up front
df = (pd.concat([
          pd.DataFrame(v, columns=columns, index=np.repeat(k, len(v)))
          for k, v in dict1.items()
      ])
      .rename_axis('memberid')
      .reset_index()
)

df

   memberid userid  score related
0       808      a    5.4       b
1       808      c    4.1       b
2       808      d    3.7       f 

Important note: this solution also works for multiple key-value pairs, even when the keys do not all have the same number of lists. That flexibility has a cost, though, and it may become slow for large DataFrames. In that case, the modified solution below works if dict1 contains just one entry:

k, v = list(dict1.items())[0]
pd.DataFrame(v, columns=columns, index=np.repeat(k, len(v))).reset_index()

   index userid  score related
0    808      a    5.4       b
1    808      c    4.1       b
2    808      d    3.7       f
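To illustrate the note about multiple key-value pairs, here is a minimal sketch of the pd.concat approach on a two-key dictionary. The dictionary `dict2` and its key 909 are hypothetical and not part of the original question; they only show keys with different numbers of inner lists.

```python
import pandas as pd

# Hypothetical input: key 909 is invented for illustration
# and has fewer inner lists than key 808.
dict2 = {
    808: [['a', 5.4, 'b'], ['c', 4.1, 'b'], ['d', 3.7, 'f']],
    909: [['e', 2.2, 'a']],
}
columns = ['userid', 'score', 'related']

# One DataFrame per key; pd.concat uses the dict keys as the outer index level.
frames = {k: pd.DataFrame(v, columns=columns) for k, v in dict2.items()}

multi = (pd.concat(frames)
           .reset_index(level=1, drop=True)  # drop the per-frame row numbers
           .rename_axis('memberid')
           .reset_index())

print(multi)
```

Each key contributes as many rows as it has inner lists, so the result here has four rows.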

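Using pd.Series a couple of times:

# drop(columns=...) replaces the positional axis argument, which newer pandas versions no longer accept
df = pd.Series(dict1).apply(pd.Series).stack().apply(pd.Series).reset_index().drop(columns='level_1')
df.columns = ['memberid', 'userid', 'score', 'related']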
df
Out[626]: 
   memberid userid  score related
0       808      a    5.4       b
1       808      c    4.1       b
2       808      d    3.7       f

Feeding your dictionary values into pd.DataFrame is one way.

Here we use next(iter(some_view)) syntax to extract the only key and only value.

This is an efficient solution when you can guarantee that your dictionary has only one key and that its value is a list of lists.

df = pd.DataFrame(next(iter(dict1.values())), columns=['userid', 'score', 'related'])\
       .assign(memberid=next(iter(dict1.keys())))

print(df)

  userid  score related  memberid
0      a    5.4       b       808
1      c    4.1       b       808
2      d    3.7       f       808
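If the single-key guarantee does not hold, one possible alternative (not taken from the answers above) is to flatten the dictionary into plain rows first and build the DataFrame once. This is only a sketch; the names `rows` and `flat` are illustrative.

```python
import pandas as pd

dict1 = {808: [['a', 5.4, 'b'],
               ['c', 4.1, 'b'],
               ['d', 3.7, 'f']]}

# Prepend the dictionary key to every inner list, giving one flat row per record.
rows = [[key, *inner] for key, inner_lists in dict1.items() for inner in inner_lists]

flat = pd.DataFrame(rows, columns=['memberid', 'userid', 'score', 'related'])
print(flat)
```

Because only one DataFrame is constructed, this avoids building and concatenating a separate frame for each key.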

