Groupby pandas df and create a column with a nested dictionary

Question

Given this df:

      dim_date_id  closing_type  r_d  variable  value      rolling   cusum_sample  sample_type
1330  1995-10-27   low           1    low       9.699377   0.039688  1             [sh_dummy_0.5, sh_dummy_1]
1331  1995-10-27   low           1    close     10.340971  0.044784  1             [sh_dummy_0.5, sh_dummy_1]
1330  1995-10-27   high          1    high      10.529675  0.062868  1             [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331  1995-10-27   high          1    close     10.340971  0.044784  1             [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330  1995-10-27   low           5    low       9.699377   0.132976  1             [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331  1995-10-27   low           5    close     10.340971  0.188179  1             [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330  1995-10-27   high          5    high      10.529675  0.184475  1             [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]

I want to group it by variable and create a nested dictionary in the sample_type column (or in a different column, I don't really care). As output I would like a df that looks like this:

       dim_date_id  variable  value      sample_type
1330   1995-10-27   low       9.699377   {'r_d':1,'closing_type':'low','rolling':0.039688,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                         {'r_d':5,'closing_type':'low','rolling':0.132976,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1331   1995-10-27   close     10.340971  {'r_d':1,'closing_type':'low','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                         {'r_d':1,'closing_type':'high','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                         {'r_d':5,'closing_type':'low','rolling':0.188179,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1330   1995-10-27   high      10.529675  {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                         {'r_d':5,'closing_type':'high','rolling':0.184475,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

It has to be as flexible as possible, because the sample_type column can sometimes contain 'n' different variables.
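A minimal sketch of one way to get the desired shape, assuming pandas and assuming the sample_type entries are lists of strings (the question shows them unquoted). It groups on dim_date_id, variable and value, and packs every remaining column into a list of per-row dicts with DataFrame.to_dict("records"), so additional columns are picked up without code changes. The helper nest_by and its drop/rename parameters are hypothetical names for illustration; dropping cusum_sample and renaming sample_type to sample only mirror the desired output above. This is not the approach used in Answer 1.

```python
import pandas as pd

# Hand-built copy of the question's frame (index labels 1330/1331 kept).
# Assumption: the sample_type entries are lists of strings.
S2 = ["sh_dummy_0.5", "sh_dummy_1"]
S3 = ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]
df = pd.DataFrame(
    {
        "dim_date_id": ["1995-10-27"] * 7,
        "closing_type": ["low", "low", "high", "high", "low", "low", "high"],
        "r_d": [1, 1, 1, 1, 5, 5, 5],
        "variable": ["low", "close", "high", "close", "low", "close", "high"],
        "value": [9.699377, 10.340971, 10.529675, 10.340971, 9.699377, 10.340971, 10.529675],
        "rolling": [0.039688, 0.044784, 0.062868, 0.044784, 0.132976, 0.188179, 0.184475],
        "cusum_sample": [1] * 7,
        "sample_type": [S2, S2, S3, S3, S3, S3, S3],
    },
    index=[1330, 1331, 1330, 1331, 1330, 1331, 1330],
)


def nest_by(frame, keys, drop=(), rename=None):
    """Hypothetical helper: one output row per group, with all non-key
    columns packed into a list of per-row dicts in "sample_type"."""
    detail = frame.drop(columns=list(drop)).rename(columns=rename or {})
    # Every column that is not a grouping key goes into the dicts.
    cols = [c for c in detail.columns if c not in keys]
    rows = []
    # sort=False keeps groups in order of first appearance.
    for key, grp in detail.groupby(keys, sort=False):
        row = dict(zip(keys, key))  # key is a tuple because keys is a list
        row["sample_type"] = grp[cols].to_dict("records")  # one dict per input row
        rows.append(row)
    return pd.DataFrame(rows)


out = nest_by(
    df,
    ["dim_date_id", "variable", "value"],
    drop=["cusum_sample"],
    rename={"sample_type": "sample"},
)
print(out)
```

The explicit loop keeps the grouping columns out of the dicts and preserves the low, close, high order; the result gets a fresh 0..n-1 index rather than the 1330/1331 labels.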

Answer 1

Try this:

new_df = df.groupby(['dim_date_id','variable','value']).apply(lambda x: x.to_dict()).reset_index(name='sample_type')

Output:

>>> new_df
  dim_date_id variable      value                                        sample_type
0  1995-10-27    close  10.340971  {'dim_date_id': {1331: '1995-10-27'}, 'closing...
1  1995-10-27     high  10.529675  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
2  1995-10-27      low   9.699377  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
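Answer 1 calls DataFrame.to_dict() with its default orient="dict", which returns {column: {index_label: value}}. Each group passed to the lambda keeps its original index labels, and in the question's data all rows of a group share one label (the three close rows are all 1331), so later rows overwrite earlier ones and each inner dict ends up holding a single row. A small check on a hand-built copy of the close rows (only three columns kept, for illustration), compared with orient="records":

```python
import pandas as pd

# The three "close" rows from the question, all labelled 1331.
close = pd.DataFrame(
    {
        "closing_type": ["low", "high", "low"],
        "r_d": [1, 1, 5],
        "rolling": [0.044784, 0.044784, 0.188179],
    },
    index=[1331, 1331, 1331],
)

# Default orient="dict": {column: {index_label: value}}.
# Repeated labels overwrite each other, so only the last row survives.
col_keyed = close.to_dict()
print(col_keyed)

# orient="records": a list with one dict per row, so no row is lost.
row_records = close.to_dict("records")
print(row_records)
```

Swapping x.to_dict() for x.to_dict("records") in the answer's lambda would keep every row, although, depending on the pandas version, the dicts may still contain the grouping columns.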

Answer 1 (score: 1), posted 2021-12-20 19:20:25
