
Groupby pandas df and create a column with nested dictionary

Given this df:

      dim_date_id  closing_type  r_d  variable      value   rolling  cusum_sample  sample_type
1330   1995-10-27           low    1       low   9.699377  0.039688             1  [sh_dummy_0.5, sh_dummy_1]
1331   1995-10-27           low    1     close  10.340971  0.044784             1  [sh_dummy_0.5, sh_dummy_1]
1330   1995-10-27          high    1      high  10.529675  0.062868             1  [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331   1995-10-27          high    1     close  10.340971  0.044784             1  [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330   1995-10-27           low    5       low   9.699377  0.132976             1  [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331   1995-10-27           low    5     close  10.340971  0.188179             1  [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330   1995-10-27          high    5      high  10.529675  0.184475             1  [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]

I want to group it by variable and create a nested dictionary in the sample_type column (or in a different column, I don't really mind which). As output I would like a df that looks like this:

      dim_date_id  variable      value  sample_type
1330   1995-10-27       low   9.699377  {'r_d':1,'closing_type':'low','rolling':0.039688,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                        {'r_d':5,'closing_type':'low','rolling':0.132976,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1331   1995-10-27     close  10.340971  {'r_d':1,'closing_type':'low','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1]},
                                        {'r_d':1,'closing_type':'high','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                        {'r_d':5,'closing_type':'low','rolling':0.188179,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

1330   1995-10-27      high  10.529675  {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
                                        {'r_d':5,'closing_type':'high','rolling':0.184475,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}

It has to be as flexible as possible, because the sample_type column can sometimes contain 'n' different variables.

Try this:

new_df = df.groupby(['dim_date_id','variable','value']).apply(lambda x: x.to_dict()).reset_index(name='sample_type')

Output:

>>> new_df
  dim_date_id variable      value                                        sample_type
0  1995-10-27    close  10.340971  {'dim_date_id': {1331: '1995-10-27'}, 'closing...
1  1995-10-27     high  10.529675  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
2  1995-10-27      low   9.699377  {'dim_date_id': {1330: '1995-10-27'}, 'closing...
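
Note that x.to_dict() with no arguments returns a column-keyed dictionary of {index: value} mappings for each group, which is why the cells above start with {'dim_date_id': {1331: ...}. If what you want is closer to the layout in the question (one list of per-row dictionaries per group), the following is a minimal sketch of one way to get it. The rows literal is retyped from the question, and the names keys and detail_cols, as well as renaming sample_type to sample inside the nested dictionaries, are my own assumptions about what should be grouped on and what should be nested.

```python
import pandas as pd

# Sample data retyped from the question (original index labels omitted).
rows = [
    ("1995-10-27", "low", 1, "low", 9.699377, 0.039688, 1, ["sh_dummy_0.5", "sh_dummy_1"]),
    ("1995-10-27", "low", 1, "close", 10.340971, 0.044784, 1, ["sh_dummy_0.5", "sh_dummy_1"]),
    ("1995-10-27", "high", 1, "high", 10.529675, 0.062868, 1, ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]),
    ("1995-10-27", "high", 1, "close", 10.340971, 0.044784, 1, ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]),
    ("1995-10-27", "low", 5, "low", 9.699377, 0.132976, 1, ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]),
    ("1995-10-27", "low", 5, "close", 10.340971, 0.188179, 1, ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]),
    ("1995-10-27", "high", 5, "high", 10.529675, 0.184475, 1, ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]),
]
df = pd.DataFrame(
    rows,
    columns=["dim_date_id", "closing_type", "r_d", "variable",
             "value", "rolling", "cusum_sample", "sample_type"],
)

# Assumption: group on these columns, nest the ones in detail_cols.
keys = ["dim_date_id", "variable", "value"]
detail_cols = ["r_d", "closing_type", "rolling", "sample"]

# Rename so the nested key is 'sample', as in the desired output.
nested = df.rename(columns={"sample_type": "sample"})

records = []
# sort=False keeps the groups in order of first appearance.
for key, group in nested.groupby(keys, sort=False):
    row = dict(zip(keys, key))
    # orient="records" gives one dictionary per row of the group.
    row["sample_type"] = group[detail_cols].to_dict("records")
    records.append(row)

new_df = pd.DataFrame(records)
print(new_df)
```

Each cell of new_df['sample_type'] is then a plain Python list of dictionaries, so it grows with however many rows fall into a group. To nest more (or fewer) fields, change detail_cols; it could also be derived as every column not in keys.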

