Groupby pandas df and create a column with nested dictionary
Given this df:
dim_date_id closing_type r_d variable value rolling cusum_sample sample_type
1330 1995-10-27 low 1 low 9.699377 0.039688 1 [sh_dummy_0.5, sh_dummy_1]
1331 1995-10-27 low 1 close 10.340971 0.044784 1 [sh_dummy_0.5, sh_dummy_1]
1330 1995-10-27 high 1 high 10.529675 0.062868 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331 1995-10-27 high 1 close 10.340971 0.044784 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330 1995-10-27 low 5 low 9.699377 0.132976 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1331 1995-10-27 low 5 close 10.340971 0.188179 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
1330 1995-10-27 high 5 high 10.529675 0.184475 1 [sh_dummy_0.5, sh_dummy_1, sh_dummy_2]
I want to group it by variable and create a nested dictionary in the sample_type column (or in a different column, I don't really mind).
As output I would like a df that looks like this:
dim_date_id variable value sample_type
1330 1995-10-27 low 9.699377 {'r_d':1,'closing_type':'low','rolling':0.039688,'sample':[sh_dummy_0.5, sh_dummy_1]},
{'r_d':5,'closing_type':'low','rolling':0.132976,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
1331 1995-10-27 close 10.340971 {'r_d':1,'closing_type':'low','rolling':0.044784,'sample':[sh_dummy_0.5, sh_dummy_1]},
{'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
{'r_d':5,'closing_type':'low','rolling':0.188179,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
1330 1995-10-27 high 10.529675 {'r_d':1,'closing_type':'high','rolling':0.062868,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]},
{'r_d':5,'closing_type':'high','rolling':0.184475,'sample':[sh_dummy_0.5, sh_dummy_1, sh_dummy_2]}
It has to be as flexible as possible, because the sample_type column can sometimes contain 'n' different variables.
Try this:
new_df = df.groupby(['dim_date_id','variable','value']).apply(lambda x: x.to_dict()).reset_index(name='sample_type')
Output:
>>> new_df
dim_date_id variable value sample_type
0 1995-10-27 close 10.340971 {'dim_date_id': {1331: '1995-10-27'}, 'closing...
1 1995-10-27 high 10.529675 {'dim_date_id': {1330: '1995-10-27'}, 'closing...
2 1995-10-27 low 9.699377 {'dim_date_id': {1330: '1995-10-27'}, 'closing...
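Note that to_dict() with no arguments maps each column name to an {index: value} dictionary, which is why every cell above starts with 'dim_date_id'. If the goal is instead one dictionary per original row, as in the desired output of the question, a variant along the following lines may be closer. This is only a sketch: it assumes the sample_type entries are lists of strings, the names group_keys and nested_cols are made up for illustration, and the rename to 'sample' simply mirrors the key used in the question.

```python
import pandas as pd

# Sample rows copied from the question (sample_type values assumed to be strings).
two = ["sh_dummy_0.5", "sh_dummy_1"]
three = ["sh_dummy_0.5", "sh_dummy_1", "sh_dummy_2"]
df = pd.DataFrame(
    {
        "dim_date_id": ["1995-10-27"] * 7,
        "closing_type": ["low", "low", "high", "high", "low", "low", "high"],
        "r_d": [1, 1, 1, 1, 5, 5, 5],
        "variable": ["low", "close", "high", "close", "low", "close", "high"],
        "value": [9.699377, 10.340971, 10.529675, 10.340971,
                  9.699377, 10.340971, 10.529675],
        "rolling": [0.039688, 0.044784, 0.062868, 0.044784,
                    0.132976, 0.188179, 0.184475],
        "cusum_sample": [1] * 7,
        "sample_type": [two, two, three, three, three, three, three],
    },
    index=[1330, 1331, 1330, 1331, 1330, 1331, 1330],
)

# Hypothetical helper names: columns that identify a group, and columns
# that should end up inside each nested dictionary.
group_keys = ["dim_date_id", "variable", "value"]
nested_cols = ["r_d", "closing_type", "rolling", "sample_type"]

new_df = (
    df.groupby(group_keys, sort=False)[nested_cols]
      # orient="records" yields one dict per row of the group
      .apply(lambda g: g.rename(columns={"sample_type": "sample"}).to_dict("records"))
      .reset_index(name="sample_type")
)

print(new_df)
```

Each cell of sample_type then holds a list with one dictionary per source row, so a group with 'n' rows simply produces a list of length 'n'. Adding or removing names in nested_cols changes which fields are nested without touching the rest.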