Pivot pandas dataframe using group by

I have a dataframe like this:

id  sub_id  count
0   94  1
1   94  9
1   315 7
2   94  4
2   265 1


import pandas as pd

data = {'id': [0, 1, 1, 2, 2],
        'sub_id': [94, 94, 315, 94, 265],
        'count': [1, 9, 7, 4, 1]}
df = pd.DataFrame(data)

And I want it in the following form:
id sub_id1 count_sub_id1 sub_id2 count_sub_id2
0  94      1             NaN     NaN
1  94      9             315     7
2  94      4             265     1

Note: Here, every id can have a maximum of two rows, each with a different sub_id and its count.

I tried df.pivot(index='id', columns='sub_id', values='count'), but this expands every distinct value of sub_id into its own column, whereas I only need two columns with custom names, i.e. only the (at most) two rows that exist for each group of ids.
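
To make the problem concrete, here is a minimal sketch of that attempt on the sample data. The variable name `wide` is only for illustration. The pivot produces one column per distinct sub_id value rather than one column per position within the id group:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Each distinct sub_id value becomes its own column
wide = df.pivot(index='id', columns='sub_id', values='count')
print(list(wide.columns))  # [94, 265, 315]
```

With many distinct sub_id values the result would grow correspondingly wide, which is why the answers below number the rows within each id first.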

Try using:

df_out = (df.set_index(['id', df.groupby('id').cumcount()+1])
            .unstack().sort_index(level=1, axis=1))

df_out.columns = [f'{i}{j}' if i == "sub_id" else f'{i}_sub_id{j}' 
                          for i, j in df_out.columns]

print(df_out.reset_index())

Output:

   id  count_sub_id1  sub_id1  count_sub_id2  sub_id2
0   0            1.0     94.0            NaN      NaN
1   1            9.0     94.0            7.0    315.0
2   2            4.0     94.0            1.0    265.0
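
The key step in this answer is `df.groupby('id').cumcount()`, which numbers the rows within each id group starting from 0; adding 1 yields the 1/2 suffix used in the column names. A small sketch of that intermediate result on the question's data (the name `position` is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Position of each row within its id group, starting at 1
position = df.groupby('id').cumcount() + 1
print(position.tolist())  # [1, 1, 2, 1, 2]
```

Using this counter as a second index level is what lets `unstack()` spread the rows into a fixed number of columns regardless of how many distinct sub_id values exist.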

Another option is to collect each group's values into lists and expand the lists into columns:

output_df = pd.concat([df.groupby('id')['sub_id'].apply(list).apply(pd.Series),
                       df.groupby('id')['count'].apply(list).apply(pd.Series)], axis=1)

output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']

>>> output_df

        sub_id1 sub_id2 count_sub_id1   count_sub_id2
0       94.0    NaN     1.0             NaN
1       94.0    315.0   9.0             7.0
2       94.0    265.0   4.0             1.0

Here's another way:

df_out = (df.groupby('id')
   .apply(lambda x: x.reset_index(drop=True).head(2))
   .drop('id', axis=1)
   .unstack()
)

Output:

   sub_id        count     
        0      1     0    1
id                         
0    94.0    NaN   1.0  NaN
1    94.0  315.0   9.0  7.0
2    94.0  265.0   4.0  1.0

To rename:

df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
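
For reference, here is one possible end-to-end sketch that combines the ideas above and also returns the columns in the order asked for in the question. It is not taken from any of the answers: the helper column `n`, the variable names `wide` and `order`, and the cast to the nullable `Int64` dtype (to avoid the float display caused by NaN) are all assumptions made for illustration. The hard-coded `order` list relies on the question's statement that an id has at most two rows, and on at least one id actually having two.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Number the rows within each id, then spread them into columns
wide = (df.assign(n=df.groupby('id').cumcount() + 1)
          .pivot(index='id', columns='n', values=['sub_id', 'count']))

# Flatten the (name, n) column MultiIndex into the requested names
wide.columns = [f'sub_id{n}' if name == 'sub_id' else f'count_sub_id{n}'
                for name, n in wide.columns]

# Reorder the columns and use a nullable integer dtype instead of float
order = ['sub_id1', 'count_sub_id1', 'sub_id2', 'count_sub_id2']
wide = wide[order].astype('Int64').reset_index()
print(wide)
```

Missing second entries show up as `<NA>` rather than `NaN` with this dtype; drop the `astype` call if plain floats are acceptable.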
