简体   繁体   中英

Reshaping a Python dict to a Pandas dataframe

I gave a Python dictionary that looks something like:

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

I would like to convert this to a Pandas dataframe that has the following structure:

  key val
0   B   f
1   B   g
2   B   h
3   A   a
4   A   b
5   A   c
6   A   d
7   A   e

I can't use the standard method to enter a dictionary into a Pandas dataframe but I have come up with a method that works but seems a but clunky. Basically, I create 2 lists containing the keys and the values, convert to a another dict and import that dict into a Pandas dataframe. I'm sure I could combine some of the lines using list or dictionary comprehensions but the final command would be unreadable. I've kept each line separate for the moment to make reading the code easier.

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Column of keys
keys = [[k]*len(v) for k,v in myDict.items()]

# Flatten list
keys = [item for sublist in keys for item in sublist]

# Column of values
values = [v for k,v in myDict.items()]

# Flatten list
values = [item for sublist in values for item in sublist]

key = 'key'
value = 'val'

df = pd.DataFrame({key:keys,value:values})

print(df)

Perhaps I've been staring at this too long but my question is whether there is a simpler, built-in command to achieve the same goal.

You can simply iterate through the dict and then subsequently iterate through each of the lists:

df = pd.DataFrame(
    data=[(k, v) for k, l in myDict.items() for v in l],
    columns=['key', 'value']
)

You can do this using a one-liner:

pd.DataFrame.from_dict(myDict, orient='index')
    .stack()
    .reset_index()
    .drop('level_1', axis=1)
    .rename(columns={'level_0': 'key', 0: 'value'})

  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h

Using unnesting

df=unnesting(pd.Series(myDict).to_frame(),[0]).reset_index()
df.columns=['key','value']
df
Out[402]: 
  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h

One option that operates outside pandas but simplifies the operations a bit:

my_dict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

my_vals=[list(zip(k*len(v), v)) for k,v in my_dict.items()]

my_rows=[item for sublist in my_vals for item in sublist ]

pd.DataFrame(my_rows, columns=['key','val'])

    key val
0   A   a
1   A   b
2   A   c
3   A   d
4   A   e
5   B   f
6   B   g
7   B   h

You can also try using:

import pandas as pd

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Creating dataframe from the dictionary 
df = pd.DataFrame.from_dict(myDict, orient='index')

# Transpose to change keys to column names
# stack creates level by pivoting columns
# reset index and only select two columns
df = df.T.stack().reset_index().iloc[:,1:]

# renaming columns
df.columns = ['key', 'value']

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM