Pandas dataframe to nested dictionary

Let's say my dataframe looks like this:

date        app_id  country  val1  val2  val3  val4
2016-01-01  123     US       50    70    80    90
2016-01-02  123     US       60    80    90    100
2016-01-03  123     US       70    88    99    11

I want to dump this into a nested dictionary, or even a JSON object, as follows:

{
    country:
    {
        app_id:
        {
            date: [val1, val2, val3, val4]
        }
    }
}

That way, if I called my_dict['US'][123]['2016-01-01'], I would get the list [50, 70, 80, 90].

Is there an elegant way to do this? I'm aware of pandas' to_dict() function, but I can't figure out how to get nested dictionaries out of it.
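For reference, the reason to_dict() alone does not nest: with a MultiIndex, DataFrame.to_dict("index") produces flat tuple keys such as ("US", 123, "2016-01-01"). A minimal sketch, assuming the column names from the question, that folds those tuple keys into nested dictionaries with dict.setdefault:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02", "2016-01-03"],
    "app_id": [123, 123, 123],
    "country": ["US", "US", "US"],
    "val1": [50, 60, 70],
    "val2": [70, 80, 88],
    "val3": [80, 90, 99],
    "val4": [90, 100, 11],
})
vals = ["val1", "val2", "val3", "val4"]

# With a MultiIndex, to_dict("index") keys are flat tuples like ("US", 123, "2016-01-01")
flat = df.set_index(["country", "app_id", "date"])[vals].to_dict("index")

nested = {}
for (country, app_id, date), row in flat.items():
    # Create each missing level with setdefault, then store the row's values as a list
    nested.setdefault(country, {}).setdefault(app_id, {})[date] = [row[v] for v in vals]

print(nested["US"][123]["2016-01-01"])
```

Note that to_dict("index") requires the (country, app_id, date) combinations to be unique.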

First, create the dataframe you need (one list of values per country, app_id and date). Then apply DSM's recur_dictify function:

# One [val1, val2, val3, val4] list per (country, app_id, date); newer pandas needs a list of columns
dd = df.groupby(['country', 'app_id', 'date'])[['val1', 'val2', 'val3', 'val4']].apply(lambda x: x.values.tolist()[0]).to_frame()

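def recur_dictify(frame):
    # Base case: only the value column is left
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]  # a single cell: return the stored list itself
        return frame.values.squeeze()
    # Group on the leftmost column and recurse on the remaining columns
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d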


recur_dictify(dd.reset_index())
Out[711]: 
{'US': {123: {'2016-01-01': [50, 70, 80, 90],
   '2016-01-02': [60, 80, 90, 100],
   '2016-01-03': [70, 88, 99, 11]}}}
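recur_dictify recurses over columns. The same idea can be applied to index levels instead: group a MultiIndex Series on its outermost level, drop that level, and recurse until a single level is left. This is a sketch of that variation, not DSM's code; the helper name nest_levels is hypothetical, and the sample data mirrors the five-row df defined further down.

```python
import pandas as pd


def nest_levels(s):
    """Hypothetical helper: turn a MultiIndex Series into nested dicts, one per level."""
    if s.index.nlevels == 1:
        # Innermost level: map each label directly to its stored value
        return s.to_dict()
    # Group on the outermost level, drop that level, and recurse on what is left
    return {key: nest_levels(group.droplevel(0)) for key, group in s.groupby(level=0)}


df = pd.DataFrame({
    "app_id": [123, 123, 123, 123, 124],
    "country": ["US", "US", "US", "FR", "US"],
    "date": ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-01", "2016-01-01"],
    "val1": [50, 60, 70, 10, 10],
    "val2": [70, 80, 88, 20, 20],
    "val3": [80, 90, 99, 30, 30],
    "val4": [90, 100, 11, 40, 40],
})
keys = ["country", "app_id", "date"]
vals = ["val1", "val2", "val3", "val4"]

# A Series holding one [val1..val4] list per (country, app_id, date)
s = pd.Series(df[vals].values.tolist(), index=pd.MultiIndex.from_frame(df[keys]))
nested = nest_levels(s)
```

The order of the levels in keys decides the nesting order, so reordering that list reorders the dictionary levels.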

Update

Actually, this also works with a simple self-nesting dictionary built from defaultdict:

import pandas as pd
from collections import defaultdict

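nested_dict = lambda: defaultdict(nested_dict)  # each missing key creates another nested level
output = nested_dict()

# Column positions assume the order app_id, country, date, val1..val4 (as in the df defined at the end)
for lst in df.values:
    output[lst[1]][lst[0]][lst[2]] = lst[3:].tolist()  # output[country][app_id][date] = [val1..val4]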
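The one-line nested_dict works because every missing key creates another defaultdict of the same kind (often called autovivification). The result prints with defaultdict wrappers at every level; a small recursive helper, hypothetically named to_plain here, converts the tree back into ordinary dicts:

```python
from collections import defaultdict

# Each missing key creates another defaultdict of the same kind (autovivification)
nested_dict = lambda: defaultdict(nested_dict)


def to_plain(d):
    """Hypothetical helper: recursively convert (default)dicts into ordinary dicts."""
    if isinstance(d, dict):
        return {k: to_plain(v) for k, v in d.items()}
    return d


output = nested_dict()
# Assigning three levels deep creates the two intermediate levels on the fly
output["US"][123]["2016-01-01"] = [50, 70, 80, 90]
output["US"][123]["2016-01-02"] = [60, 80, 90, 100]

plain = to_plain(output)
print(plain)  # {'US': {123: {'2016-01-01': [50, 70, 80, 90], '2016-01-02': [60, 80, 90, 100]}}}
```

One side effect worth knowing: with autovivification, merely reading a missing key such as output['FR'] inserts an empty level, whereas plain['FR'] raises KeyError as usual.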

Or:

output = defaultdict(dict)

for lst in df.values:
    try:
        # The app_id level already exists: add this date's values to it
        output[lst[1]][lst[0]].update({lst[2]: lst[3:].tolist()})
    except KeyError:
        # First row for this (country, app_id): create the level with this date
        output[lst[1]][lst[0]] = {lst[2]: lst[3:].tolist()}
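The try/except above handles a missing app_id level by catching KeyError. dict.setdefault does the membership check and the insertion in one call, so plain dicts are enough. A minimal sketch with one deliberate change from the answer's code: rows come from itertuples, so columns are picked by name rather than by position (the sample data mirrors the df defined at the end of this answer):

```python
import pandas as pd

df = pd.DataFrame({
    "app_id": [123, 123, 123, 123, 124],
    "country": ["US", "US", "US", "FR", "US"],
    "date": ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-01", "2016-01-01"],
    "val1": [50, 60, 70, 10, 10],
    "val2": [70, 80, 88, 20, 20],
    "val3": [80, 90, 99, 30, 30],
    "val4": [90, 100, 11, 40, 40],
})

output = {}
for row in df.itertuples(index=False):
    # setdefault returns the existing inner dict, or inserts and returns a new empty one
    by_app = output.setdefault(row.country, {}).setdefault(row.app_id, {})
    by_app[row.date] = [row.val1, row.val2, row.val3, row.val4]
```

Because the fields are accessed by name, this version does not depend on the column order of df.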

Or:

output = defaultdict(dict)

for lst in df.values:
    # Create the app_id level the first time this (country, app_id) pair is seen
    if output.get(lst[1], {}).get(lst[0]) is None:
        output[lst[1]][lst[0]] = {}
    output[lst[1]][lst[0]].update({lst[2]: lst[3:].tolist()})

output
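The question also mentions a JSON object. json.dumps accepts nested dicts like the ones built above, with two caveats: JSON object keys are always strings (app_id 123 comes back as "123" after a round trip), and json.dumps rejects numpy integer scalars, which can show up depending on how the dict was built. A minimal sketch with a hypothetical stringify_keys helper that normalizes both before serializing:

```python
import json

import numpy as np


def stringify_keys(obj):
    """Hypothetical helper: make a nested dict JSON-ready (str keys, plain Python scalars)."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    if isinstance(obj, np.generic):
        return obj.item()  # numpy scalar -> equivalent Python scalar
    return obj


# Shaped like the output above; numpy scalars are included on purpose
nested = {"US": {np.int64(123): {"2016-01-01": [np.int64(50), 70, 80, 90]}}}
text = json.dumps(stringify_keys(nested))
print(text)  # {"US": {"123": {"2016-01-01": [50, 70, 80, 90]}}}
```

After json.loads(text), the lookup becomes loaded['US']['123']['2016-01-01'], with the app_id as a string.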

Here is my old solution: we use df.groupby to group the dataframe by country and app_id. From each group we then collect the remaining data (everything except country and app_id) and use defaultdict(dict) to add it to the output dictionary in a nested way.

import pandas as pd
from collections import defaultdict

output = defaultdict(dict)

groups = ["country","app_id"]
cols = [i for i in df.columns if i not in groups]

for i, subdf in df.groupby(groups):  # i is the (country, app_id) tuple
    data = subdf[cols].set_index('date').to_dict("split")  # keep only date and the value columns
    d = dict(zip(data['index'], data['data']))  # date -> [val1, val2, val3, val4]
    output[i[0]][i[1]] = d  # country = level 1, app_id = level 2

output

Result:

{'FR': {123: {'2016-01-01': [10, 20, 30, 40]}},
 'US': {123: {'2016-01-01': [50, 70, 80, 90],
   '2016-01-02': [60, 80, 90, 100],
   '2016-01-03': [70, 88, 99, 11]},
  124: {'2016-01-01': [10, 20, 30, 40]}}}

and output['US'][123]['2016-01-01'] returns:

[50, 70, 80, 90]

where df is:

df = pd.DataFrame.from_dict({'app_id': {0: 123, 1: 123, 2: 123, 3: 123, 4: 124},
 'country': {0: 'US', 1: 'US', 2: 'US', 3: 'FR', 4: 'US'},
 'date': {0: '2016-01-01',
  1: '2016-01-02',
  2: '2016-01-03',
  3: '2016-01-01',
  4: '2016-01-01'},
 'val1': {0: 50, 1: 60, 2: 70, 3: 10, 4: 10},
 'val2': {0: 70, 1: 80, 2: 88, 3: 20, 4: 20},
 'val3': {0: 80, 1: 90, 2: 99, 3: 30, 4: 30},
 'val4': {0: 90, 1: 100, 2: 11, 3: 40, 4: 40}})
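The old solution relies on to_dict("split"), which returns a dict with 'index', 'columns' and 'data' keys; zipping data['index'] with data['data'] is what produces the date -> values mapping. A small demonstration on one hypothetical group, plus an equivalent shortcut (an alternative, not part of the answer) that transposes the frame and uses to_dict("list"):

```python
import pandas as pd

# One hypothetical (country, app_id) group, already reduced to date + value columns
subdf = pd.DataFrame({
    "date": ["2016-01-01", "2016-01-02"],
    "val1": [50, 60],
    "val2": [70, 80],
})

data = subdf.set_index("date").to_dict("split")
# data == {'index': ['2016-01-01', '2016-01-02'], 'columns': ['val1', 'val2'], 'data': [[50, 70], [60, 80]]}
d = dict(zip(data["index"], data["data"]))  # date -> row values

# Shortcut: transpose so each date becomes a column, then take each column as a list
alt = subdf.set_index("date").T.to_dict("list")
```

Both d and alt map each date to its list of values, so either can be assigned to output[country][app_id].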

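Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.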

 
© 2020-2024 STACKOOM.COM