Create a dataframe from a dict where values are variable-length lists

Question

I have a dict where each value is a list, for example:

my_dict = {1: [964725688, 6928857],
           ...

           22: [1667906, 35207807, 685530997, 35207807],
           ...
           }

In this example, the longest list has 4 items, but lists could be longer than that.

I would like to convert it to a dataframe like:

1  964725688
1  6928857
...
22 1667906
22 35207807
22 685530997
22 35207807

Answer 1

import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# Build one [key, element] row per list element
df = pd.DataFrame([[k, ele] for k, v in my_dict.items() for ele in v])

print(df)

    0          1
0   1  964725688
1   1    6928857
2  22    1667906
3  22   35207807
4  22  685530997
5  22   35207807

Answer 2

First Idea
pandas

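import numpy as np
import pandas as pd

# One row per key, each holding its list
s = pd.Series(my_dict)
# Flatten the lists and repeat each key once per list element
pd.Series(
    np.concatenate(s.values),
    index=s.index.repeat(s.str.len())
)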

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64

Faster!
numpy

values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())
pd.Series(np.concatenate(values), np.repeat(keys, lens))

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64

Interesting
pd.concat

pd.concat({k: pd.Series(v) for k, v in my_dict.items()}).reset_index(1, drop=True)

1     964725688
1       6928857
22      1667906
22     35207807
22    685530997
22     35207807
dtype: int64
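
The snippets above return a Series indexed by key, while the question asks for a dataframe. A minimal sketch of that last step, built on the numpy variant; the column names `key` and `value` are assumptions for illustration and do not come from the question:

```python
import numpy as np
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

values = list(my_dict.values())
lens = [len(value) for value in values]
keys = list(my_dict.keys())

# Series of flattened values, indexed by the repeated keys
s = pd.Series(np.concatenate(values), index=np.repeat(keys, lens))

# Name the index, then move it into a regular column
# ("key" and "value" are hypothetical column names)
df = s.rename_axis("key").reset_index(name="value")
print(df)
```

The same two calls should work on the result of the other two variants, since all three produce a Series with the keys in the index.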

Answer 3

Slightly on the functional side, using zip and reduce:

from functools import reduce  # if working with Python3
import pandas as pd


d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

df = pd.DataFrame(reduce(lambda x,y: x+y, [list(zip([k]*len(v), v)) for k,v in d.items()]))

print(df)

#     0          1
# 0   1  964725688
# 1   1    6928857
# 2  22    1667906
# 3  22   35207807
# 4  22  685530997
# 5  22   35207807

We zip the keys and the values to create records (extended through a reduce operation). The records are then passed to the pd.DataFrame function.

I hope this helps.
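
To make the zip and reduce steps easier to follow, here is the same one-liner split into its intermediate stages. This is only a sketch; the names `per_key` and `records` are introduced here for illustration:

```python
from functools import reduce

import pandas as pd

d = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# Step 1: for each dict entry, pair the key with every element of its list
per_key = [list(zip([k] * len(v), v)) for k, v in d.items()]

# Step 2: concatenate the per-key lists into one flat list of (key, value) records
records = reduce(lambda x, y: x + y, per_key)

# Step 3: each record becomes one row of the dataframe
df = pd.DataFrame(records)
print(df)
```

Note that `x + y` copies the accumulated list at every step, so this reads nicely for small dicts but may be slow when there are many keys.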

Answer 4

# Load the dict directly into a DataFrame without explicit loops
df = pd.DataFrame.from_dict(my_dict, orient='index')

# Unstack, drop NaN and sort if you need to
df.unstack().dropna().sort_index(level=1)

0  1     964725688.0
1  1       6928857.0
0  22      1667906.0
1  22     35207807.0
2  22    685530997.0
3  22     35207807.0
dtype: float64
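
The result is float64 because the shorter lists are padded with NaN when the dict is loaded, and NaN cannot be stored in an integer column. A possible follow-up, sketched here under the assumption that every list element is an integer, casts the values back once the NaN rows are dropped; the names `wide` and `result` are illustrative:

```python
import pandas as pd

my_dict = {1: [964725688, 6928857], 22: [1667906, 35207807, 685530997, 35207807]}

# Shorter lists are padded with NaN, which turns the padded columns into float64
wide = pd.DataFrame.from_dict(my_dict, orient="index")

# Same reshaping as in the answer, then cast back to integers once the NaN are gone
result = (
    wide.unstack()
    .dropna()
    .sort_index(level=1)
    .astype("int64")
    .reset_index(level=0, drop=True)  # drop the list-position level, keep the key
)
print(result)
```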

4 answers

solution1
3 2017-05-11 18:41:34

solution2
2 2017-05-11 18:46:13

solution3
1 2017-05-11 19:13:28

solution4
1 ACCEPTED 2017-05-11 21:36:34
