

Pandas: Flag groups and then change the data structure

Here is my raw data:

import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

I need to generate three group columns indicating whether each ID belongs to that group. The pivot function can only change the data structure, so it achieves only part of my goal. This is the output I want:

out_data = pd.DataFrame({'Year': [1991, 2000],
             'Group a':['Yes','Yes'],
             'Group b':['Yes','Yes'],
             'Group c':['Yes','No'],
             'ID': ['A', 'B'],
             'score': [6252, 2342]})
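
To illustrate the part of the goal that pivot does reach, here is a minimal sketch (assuming pandas 1.1 or later, where pivot accepts a list for index; the names reshaped and flags are illustrative only). pivot yields the scores with NaN for missing groups, and one extra step turns "has a value" into Yes/No.

```python
import numpy as np
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# pivot only reshapes: each cell holds the score, or NaN where the ID has
# no row for that group (ID B has no group c).
reshaped = raw_data.pivot(index=['Year', 'ID'], columns='Group', values='score')

# The missing step: a non-null cell means "this ID belongs to the group".
flags = pd.DataFrame(np.where(reshaped.notna(), 'Yes', 'No'),
                     index=reshaped.index, columns=reshaped.columns)
print(flags)
```

Note that pivot raises a ValueError if a (Year, ID, Group) combination occurs more than once; pivot_table and crosstab aggregate such duplicates instead.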

One answer uses a variant of pivot_table:

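# aggfunc=any gives True for each (Year, ID, Group) combination with a nonzero
# score; combinations that never occur come out as NaN and are filled with 'No'.
(raw_data
 .pivot_table(index=['Year', 'ID'], columns='Group', values='score', aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)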

or, equivalently, crosstab:

(pd
 .crosstab([raw_data['Year'], raw_data['ID']], raw_data['Group'],
           values=raw_data['score'], aggfunc=any)
 .replace({True: 'Yes'}).fillna('No')
 .add_prefix('Group_')
 .reset_index().rename_axis(columns=None)
)

Output:

   Year ID Group_a Group_b Group_c
0  1991  A     Yes     Yes     Yes
1  2000  B     Yes     Yes      No
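
Two details of the aggfunc=any versions are worth noting: the desired output also carries score, and any() tests truthiness, so a score of 0 would come out as False rather than 'Yes' (replace only maps True, and fillna only fills NaN). The sketch below avoids both by counting rows with crosstab instead of aggregating scores. It assumes, as in the sample, that each (Year, ID) has a single score; flag_groups is a hypothetical helper name, not part of either answer.

```python
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})


def flag_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (Year, ID, score), Yes/No per group."""
    # Without values/aggfunc, crosstab counts rows for each combination and
    # fills combinations that never occur with 0.
    counts = pd.crosstab([df['Year'], df['ID'], df['score']], df['Group'])
    # Presence depends only on the count, never on the score's value.
    yes_no = counts.gt(0).apply(lambda col: col.map({True: 'Yes', False: 'No'}))
    return (yes_no.add_prefix('Group_')
                  .reset_index()
                  .rename_axis(columns=None))


print(flag_groups(raw_data))
```

If an ID could carry different scores on different rows, score would split it into several output rows; in that case it would need to be aggregated separately rather than used as a key.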
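
Another answer pivots each (Year, ID) group separately:

def function1(dd: pd.DataFrame) -> pd.DataFrame:
    # Pivot a constant helper column: each group this ID has gets 1; groups
    # it lacks become NaN once the per-ID results are concatenated.
    return (dd.assign(col1=1)
              .pivot_table(index=['Year', 'ID', 'score'], columns='Group', values='col1')
              .add_prefix('Group '))

(raw_data.groupby(['Year', 'ID']).apply(function1)
    # applymap is deprecated in pandas >= 2.1 in favor of DataFrame.map.
    .applymap(lambda x: 'Yes' if pd.notna(x) else 'No')
    # Drop the Year/ID levels that groupby().apply() prepends to the index.
    .droplevel([0, 1]).reset_index())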

Output:

Group  Year ID  score Group a Group b Group c
0      1991  A   6252     Yes     Yes     Yes
1      2000  B   2342     Yes     Yes      No
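
The groupby().apply() answer calls a Python function once per (Year, ID) group. For comparison, here is a sketch that swaps in a different technique, pd.get_dummies followed by a single groupby max, assuming only the raw_data frame from the question. Like that answer, it keeps score by grouping on it. get_dummies returns bool indicator columns in pandas 2.0 and later and uint8 before, so the result is cast to bool before mapping.

```python
import pandas as pd

raw_data = pd.DataFrame({'Year': [1991, 1991, 1991, 2000, 2000],
                         'ID': ['A', 'A', 'A', 'B', 'B'],
                         'Group': ['a', 'b', 'c', 'a', 'b'],
                         'score': [6252, 6252, 6252, 2342, 2342]})

# One indicator column per group, named 'Group a', 'Group b', 'Group c';
# the other columns (Year, ID, score) are kept unchanged.
dummies = pd.get_dummies(raw_data, columns=['Group'], prefix='Group', prefix_sep=' ')

# A group is present for an ID if any of its rows has the indicator set.
present = dummies.groupby(['Year', 'ID', 'score']).max().astype(bool)

out = present.apply(lambda col: col.map({True: 'Yes', False: 'No'})).reset_index()
print(out)
```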

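Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.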

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM