Pandas: Flag groups and then change the data structure
Here is my raw data:
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})
I need to generate three group columns, each indicating whether an ID belongs to that group. The pivot function only changes the data structure, so it achieves just part of my goal. This is the output I want:
out_data = pd.DataFrame({'Year': [1991, 2000],
                         'Group a': ['Yes', 'Yes'],
                         'Group b': ['Yes', 'Yes'],
                         'Group c': ['Yes', 'No'],
                         'ID': ['A', 'B'],
                         'score': [6252, 2342]})
This is a variant on a pivot_table:
# df here refers to the question's raw_data
(df
 .pivot_table(index=['Year', 'ID'], columns='Group', values='score', aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)
Or with crosstab:
(pd
 .crosstab([df['Year'], df['ID']], df['Group'], values=df['score'], aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)
output:
   Year ID Group_a Group_b Group_c
0  1991  A     Yes     Yes     Yes
1  2000  B     Yes     Yes      No
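The output above drops the score column that the question's out_data keeps. A minimal sketch of one way to retain it, assuming score is constant within each (Year, ID) pair as in the sample data: put score into the crosstab row keys and turn the resulting counts into Yes/No flags. The names counts and result are only illustrative.

```python
import numpy as np
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# Count rows for each (Year, ID, score) and Group combination.
# Assumption: score does not vary within a (Year, ID) pair.
counts = pd.crosstab([raw_data['Year'], raw_data['ID'], raw_data['score']],
                     raw_data['Group'])

# A positive count means the ID belongs to that group.
result = (
    pd.DataFrame(np.where(counts > 0, 'Yes', 'No'),
                 index=counts.index, columns=counts.columns)
    .add_prefix('Group ')
    .reset_index()
    .rename_axis(columns=None)
)
print(result)
```

If score could differ within a (Year, ID) pair, each distinct score would get its own row, so this only matches the requested layout under the stated assumption.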
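def function1(dd: pd.DataFrame):
    # Mark every existing (Year, ID, score, Group) combination with 1, then pivot.
    return (dd.assign(col1=1)
              .pivot_table(index=['Year', 'ID', 'score'], columns='Group', values='col1')
              .add_prefix('Group '))

# Note: applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
(raw_data.groupby(['Year', 'ID']).apply(function1)
         .applymap(lambda x: 'Yes' if pd.notna(x) else 'No')
         .droplevel([0, 1]).reset_index())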
out:
Group  Year ID  score Group a Group b Group c
0      1991  A   6252     Yes     Yes     Yes
1      2000  B   2342     Yes     Yes      No
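For comparison, here is a rough sketch of a different technique that avoids groupby.apply and applymap altogether: build indicator columns with pd.get_dummies, then collapse to one row per (Year, ID, score) with max. This is not taken from either answer above; the names dummies and flags are illustrative, and it again assumes score is constant within each (Year, ID) pair.

```python
import numpy as np
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# One indicator column per group value, aligned with raw_data's rows.
dummies = pd.get_dummies(raw_data['Group'], prefix='Group', prefix_sep=' ')

# Collapse to one row per (Year, ID, score); max() is truthy if any
# row of that key carried the group.
flags = (pd.concat([raw_data[['Year', 'ID', 'score']], dummies], axis=1)
         .groupby(['Year', 'ID', 'score'], as_index=False)
         .max())

# Turn the indicators into Yes/No labels, one column at a time.
for col in dummies.columns:
    flags[col] = np.where(flags[col], 'Yes', 'No')

print(flags)
```

Because the indicator columns come straight from the values in Group, a new group value in the data produces a new column without any code changes.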