
Turning a nested dictionary into a pandas dataframe

I have a dictionary that looks like this:

{'136454': [{'city': 'Kabul', 'country': 'AF'}],
 '137824': [{'city': 'Kabul', 'country': 'AF'}],
 '134134': [{'city': 'Kabul', 'country': 'AF'}],
 '138322': [{'city': 'Fujairah', 'country': 'AE'},
  {'city': 'Kabul', 'country': 'AF'}],
 '137246': [{'city': 'Fujairah', 'country': 'AE'},
  {'city': 'Kabul', 'country': 'AF'}, {'city': 'New Delhi', 'country': 'IN'}],
 '133141': [{'city': 'Kabul', 'country': 'AF'}]}

What I would like is a dataframe that looks like this:

'136454' | 'Kabul'     | 'AF'
'137824' | 'Kabul'     | 'AF'
'134134' | 'Kabul'     | 'AF'
'138322' | 'Fujairah'  | 'AE'
'138322' | 'Kabul'     | 'AF'
'137246' | 'Fujairah'  | 'AE'
'137246' | 'Kabul'     | 'AF'
'137246' | 'New Delhi' | 'IN'
'133141' | 'Kabul'     | 'AF'

What I'm getting at the moment is only the first value for each key. Not very good at pandas, so a bit confused.

You can use explode. Note that this method is only available in pandas 0.25 and later. Here d is the dictionary from the question:

df = pd.Series(d).explode().apply(pd.Series)
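The one-liner above leaves the outer keys in the index. If you want them as a regular column, something like the following might work. It is only a sketch: d is a shortened copy of the dictionary from the question, and the column name "key" is my own choice, not something pandas requires.

```python
import pandas as pd

# Shortened copy of the dictionary from the question
d = {
    "138322": [{"city": "Fujairah", "country": "AE"},
               {"city": "Kabul", "country": "AF"}],
    "133141": [{"city": "Kabul", "country": "AF"}],
}

# One row per inner dict; the outer key is repeated in the index
exploded = pd.Series(d).explode()

# Expand the dicts into columns, keeping the repeated keys as the index
df = pd.DataFrame(exploded.tolist(), index=exploded.index)

# Move the index into a column named "key" (name chosen for illustration)
df = df.rename_axis("key").reset_index()
print(df)
```

Building the frame from exploded.tolist() does the same job as .apply(pd.Series) but avoids creating one Series per row.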

Iterate through the dictionary, appending the main key to the internal dict, and finally create your dataframe:

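import pandas as pd

# data is the dictionary from the question
d = []
for k, v in data.items():
    for ent in v:
        # add the outer key to the inner dictionary (this modifies it in place)
        ent.update({"key": k})
        d.append(ent)

# build the dataframe from the list of dictionaries
pd.DataFrame(d)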

    city      country   key
0   Kabul       AF     136454
1   Kabul       AF     137824
2   Kabul       AF     134134
3   Fujairah    AE     138322
4   Kabul       AF     138322
5   Fujairah    AE     137246
6   Kabul       AF     137246
7   New Delhi   IN     137246
8   Kabul       AF     133141
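One thing to be aware of: ent.update changes the inner dictionaries of the original data. If you would rather leave the input untouched, a possible variation is to build new dictionaries in a comprehension. This is a sketch on a shortened copy of the data; it also puts the key column first, which differs from the output above.

```python
import pandas as pd

# Shortened copy of the dictionary from the question
data = {
    "138322": [{"city": "Fujairah", "country": "AE"},
               {"city": "Kabul", "country": "AF"}],
    "133141": [{"city": "Kabul", "country": "AF"}],
}

# Build a new dict per row instead of updating the inner dicts in place
rows = [{"key": k, **ent} for k, v in data.items() for ent in v]

df = pd.DataFrame(rows)
print(df)
```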

Another possible solution is to flatten the dict:

import pandas as pd

data = {'136454': [{'city': 'Kabul', 'country': 'AF'}],
        '137824': [{'city': 'Kabul', 'country': 'AF'}],
        '134134': [{'city': 'Kabul', 'country': 'AF'}],
        '138322': [{'city': 'Fujairah', 'country': 'AE'},
                   {'city': 'Kabul', 'country': 'AF'}],
        '137246': [{'city': 'Fujairah', 'country': 'AE'},
                   {'city': 'Kabul', 'country': 'AF'},
                   {'city': 'New Delhi', 'country': 'IN'}],
        '133141': [{'city': 'Kabul', 'country': 'AF'}]}


new_data = []
for key, value in data.items():
    for arr_value in value:
        arr_value['id'] = key
        new_data.append(arr_value)

print(new_data)

# new_data is a list of dicts, so the plain constructor is enough
df = pd.DataFrame(new_data)

print(df.head())
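If you prefer to stay inside pandas rather than flattening by hand, pd.concat with keys is another option. This is a different technique from the loop above, shown only as a sketch on a shortened copy of the data; the level names "id" and "row" are my own. It builds one small DataFrame per key, so it will likely be slower than the loop for large dictionaries.

```python
import pandas as pd

# Shortened copy of the dictionary from the question
data = {
    "138322": [{"city": "Fujairah", "country": "AE"},
               {"city": "Kabul", "country": "AF"}],
    "133141": [{"city": "Kabul", "country": "AF"}],
}

keys = list(data)
# One small DataFrame per outer key
frames = [pd.DataFrame(data[k]) for k in keys]

# keys= labels each frame with its outer key in a MultiIndex level
df = (
    pd.concat(frames, keys=keys, names=["id", "row"])
    .reset_index(level="id")   # turn the "id" level into a column
    .reset_index(drop=True)    # discard the per-frame row numbers
)
print(df)
```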

You can use a list comprehension and then pass the result to pd.DataFrame:

import pandas as pd
d = {'136454': [{'city': 'Kabul', 'country': 'AF'}], '137824': [{'city': 'Kabul', 'country': 'AF'}], '134134': [{'city': 'Kabul', 'country': 'AF'}], '138322': [{'city': 'Fujairah', 'country': 'AE'}, {'city': 'Kabul', 'country': 'AF'}], '137246': [{'city': 'Fujairah', 'country': 'AE'}, {'city': 'Kabul', 'country': 'AF'}, {'city': 'New Delhi', 'country': 'IN'}], '133141': [{'city': 'Kabul', 'country': 'AF'}]}
data = [[a, i['city'], i['country']] for a, b in d.items() for i in b]

>>> pd.DataFrame(data)

Output:

       0          1   2
0  136454      Kabul  AF
1  137824      Kabul  AF
2  134134      Kabul  AF
3  138322   Fujairah  AE
4  138322      Kabul  AF
5  137246   Fujairah  AE
6  137246      Kabul  AF
7  137246  New Delhi  IN
8  133141      Kabul  AF
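The columns above come out labelled 0, 1 and 2. If you want readable names, you can pass columns= to the constructor. A small sketch on a shortened copy of the data; the names "id", "city" and "country" are just my suggestion.

```python
import pandas as pd

# Shortened copy of the dictionary from the question
d = {
    "138322": [{"city": "Fujairah", "country": "AE"},
               {"city": "Kabul", "country": "AF"}],
    "133141": [{"city": "Kabul", "country": "AF"}],
}

# Same list comprehension as above: one [key, city, country] list per row
rows = [[key, rec["city"], rec["country"]] for key, recs in d.items() for rec in recs]

# columns= gives the three positions explicit names
df = pd.DataFrame(rows, columns=["id", "city", "country"])
print(df)
```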

