My dictionary looks like this:
{'x': {'b': 10, 'c': 20}, 'y': {'b': '33', 'c': 44}}
I want to get a dataframe that looks like this:
index col1 col2 val
0 x b 10
1 x c 20
2 y b 33
3 y c 44
I tried calling pandas.DataFrame.from_dict(), but it did not give me the desired result. What is the most elegant, practical way to achieve this?
EDIT: In reality, my dictionary is of depth 4, so I'd like to see a solution for that case, or ideally, one that would work for arbitrary depth in a general setup.
Here is an example of a deeper dictionary: {'x':{'a':{'m':1, 'n':2}, 'b':{'m':10, 'n':20}}, 'y':{'a':{'m':100, 'n':200}, 'b':{'m':111, 'n':222}} }
The appropriate dataframe should have 8 rows.
ANSWER:
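# d is the nested dictionary (five levels of keys here); avoid naming it dict, which shadows the built-in
df = pd.DataFrame([(k1, k2, k3, k4, k5, v) for k1, k2345v in d.items()
                   for k2, k345v in k2345v.items()
                   for k3, k45v in k345v.items()
                   for k4, k5v in k45v.items()
                   for k5, v in k5v.items()])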
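The nested comprehension above hard-codes one loop per level. For arbitrary depth, a minimal sketch using a recursive generator might look like the following. The helper name flatten_paths and the key1, key2, ... column names are hypothetical, introduced only for illustration, and the sketch assumes every leaf is a non-dict value and that all leaves sit at the same depth.

```python
import pandas as pd

def flatten_paths(node, path=()):
    """Yield one tuple (key1, key2, ..., value) per leaf of a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            # Descend one level, extending the key path
            yield from flatten_paths(child, path + (key,))
    else:
        # Reached a leaf: emit the accumulated keys followed by the value
        yield path + (node,)

# The deeper example dictionary from the question
nested = {'x': {'a': {'m': 1, 'n': 2}, 'b': {'m': 10, 'n': 20}},
          'y': {'a': {'m': 100, 'n': 200}, 'b': {'m': 111, 'n': 222}}}

rows = list(flatten_paths(nested))
depth = len(rows[0]) - 1  # assumes all leaves sit at the same depth
columns = [f'key{i}' for i in range(1, depth + 1)] + ['val']
df_deep = pd.DataFrame(rows, columns=columns)
```

For the question's deeper dictionary this gives one row per leaf, i.e. the 8 rows asked for.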
You can use a list comprehension to reorder your dict into a list of tuples, where each tuple is a row, and then sort the resulting dataframe:
import pandas as pd
d = {'x': {'b': 10, 'c': 20}, 'y': {'b': '33', 'c': 44}}
df = pd.DataFrame([(k,k1,v1) for k,v in d.items() for k1,v1 in v.items()], columns = ['Col1','Col2','Val'])
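# DataFrame.sort() was removed from pandas; sort_values() replaces it.
# Val is left out of the sort keys: the (Col1, Col2) pairs are already unique, and Val mixes int and str.
print(df.sort_values(['Col1', 'Col2'], ascending=[True, True]))
Col1 Col2 Val
0 x b 10
1 x c 20
2 y b 33
3 y c 44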
First create the df using from_dict, then call stack and reset_index to get the shape you desire. You then need to rename the columns, sort, and reset the index:
In [83]:
d={'x': {'b': 10, 'c': 20}, 'y': {'b': '33', 'c': 44}}
df = pd.DataFrame.from_dict(d, orient='index').stack().reset_index()
df.columns = ['col1', 'col2', 'val']
df.sort_values(['col1', 'col2'], inplace=True)
df.reset_index(drop=True, inplace=True)
df
Out[83]:
col1 col2 val
0 x b 10
1 x c 20
2 y b 33
3 y c 44
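Because the input contains the string '33', the val column produced this way holds a mix of int and str values (object dtype). If numeric values are wanted, as the desired output in the question suggests, one option is pd.to_numeric. This is a small sketch under that assumption, not part of the original answer:

```python
import pandas as pd

d = {'x': {'b': 10, 'c': 20}, 'y': {'b': '33', 'c': 44}}
df = pd.DataFrame.from_dict(d, orient='index').stack().reset_index()
df.columns = ['col1', 'col2', 'val']

# '33' is a string, so val is object dtype until it is converted
df['val'] = pd.to_numeric(df['val'])
```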
For any depth, you could use pd.json_normalize and melt. Below is an example with a slightly modified 2/3/4-deep dictionary:
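import pandas as pd

data = {'one': 1, 'two': {'a': 2}, 'four': {'a': {'b': {'c': 2}}},
        'x': {'a': {'m': 1, 'n': 2}, 'b': {'m': 10, 'n': 20}},
        'y': {'a': {'m': 100, 'n': 200}, 'b': {'m': 111, 'n': 222}}}
# Flatten to a single row whose column names are the key paths joined by '>>'
df_melt = pd.json_normalize(data, sep='>>').melt()
# Split each key path back into one column per nesting level
df_final = df_melt['variable'].str.split('>>', expand=True)
df_final.columns = [f'col{name}' for name in df_final.columns]
# Assign with a single column label; a list key such as [['value']] can raise in recent pandas versions
df_final['value'] = df_melt['value']
print(df_final)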
col0 col1 col2 col3 value
0 one None None None 1
1 two a None None 2
2 four a b c 2
3 x a m None 1
4 x a n None 2
5 x b m None 10
6 x b n None 20
7 y a m None 100
8 y a n None 200
9 y b m None 111
10 y b n None 222
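Applied to the deeper dictionary from the question, the same json_normalize and melt steps might be sketched as below. The variable names are illustrative, and the approach assumes all keys are strings that never contain the chosen separator '>>'.

```python
import pandas as pd

deep = {'x': {'a': {'m': 1, 'n': 2}, 'b': {'m': 10, 'n': 20}},
        'y': {'a': {'m': 100, 'n': 200}, 'b': {'m': 111, 'n': 222}}}

# One wide row with columns like 'x>>a>>m', then one long row per leaf
melted = pd.json_normalize(deep, sep='>>').melt()

# Split each key path into one column per level and attach the values
result = melted['variable'].str.split('>>', expand=True)
result.columns = [f'col{i}' for i in result.columns]
result['value'] = melted['value']
```

Since every leaf here sits at the same depth, no None padding appears and the result has the 8 rows the question expects.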
json_normalize is really useful, and there are some additional examples on Medium.