Pivot a pandas DataFrame using groupby

Question

I have a DataFrame like this:

id  sub_id  count
0   94  1
1   94  9
1   315 7
2   94  4
2   265 1


import pandas as pd

data = {'id': [0, 1, 1, 2, 2],
        'sub_id': [94, 94, 315, 94, 265],
        'count': [1, 9, 7, 4, 1]}
df = pd.DataFrame(data)

And I want it in the following form:

id  sub_id1  count_sub_id1  sub_id2  count_sub_id2
0   94       1              NaN      NaN
1   94       9              315      7
2   94       4              265      1

Note: each id has at most two rows, each with a different sub_id and its count.

I tried df.pivot(index='id', columns='sub_id', values='count'), but this expands every distinct value of sub_id into a separate column. I only need two columns with custom names, covering just the (at most two) rows that exist for each id.
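A minimal reproduction of that pivot attempt on the sample data, to show why it widens too much: pivot uses the *values* of sub_id as column labels, so the result has one column per distinct sub_id rather than one per position within an id. The variable name `wide` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Each distinct sub_id value (94, 265, 315) becomes its own column label,
# so the width grows with the number of distinct sub_ids, not with 2.
wide = df.pivot(index='id', columns='sub_id', values='count')
print(wide)
print(wide.columns.tolist())  # [94, 265, 315]
```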

Answer 1

Try using:

df_out = (df.set_index(['id', df.groupby('id').cumcount()+1])
            .unstack().sort_index(level=1, axis=1))

df_out.columns = [f'{i}{j}' if i == "sub_id" else f'{i}_sub_id{j}' 
                          for i, j in df_out.columns]

print(df_out.reset_index())

Output:

   id  count_sub_id1  sub_id1  count_sub_id2  sub_id2
0   0            1.0     94.0            NaN      NaN
1   1            9.0     94.0            7.0    315.0
2   2            4.0     94.0            1.0    265.0
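The key step in this answer is `groupby('id').cumcount() + 1`, which numbers the rows inside each id as 1, 2, ... so that the *position* rather than the sub_id value becomes the column key. A sketch of the same idea written with `pivot`, the method the question tried; the helper column name `n` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# cumcount() gives 0, 0, 1, 0, 1 (position within each id); +1 makes it 1-based.
# 'n' is a hypothetical helper column used only as the pivot key.
out = (df.assign(n=df.groupby('id').cumcount() + 1)
         .pivot(index='id', columns='n', values=['sub_id', 'count']))

# Flatten the (value, position) column MultiIndex into the requested names.
out.columns = [f'sub_id{n}' if col == 'sub_id' else f'count_sub_id{n}'
               for col, n in out.columns]

# Put each sub_id next to its count, matching the layout in the question.
out = out[['sub_id1', 'count_sub_id1', 'sub_id2', 'count_sub_id2']].reset_index()
print(out)
```

Because the pivot key is the position, ids with a single row simply get NaN in the second pair of columns.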

Answer 2

output_df = pd.concat([df.groupby('id')['sub_id'].apply(list).apply(pd.Series),
                   df.groupby('id')['count'].apply(list).apply(pd.Series)], axis =1)

output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']

>>> output_df

   sub_id1  sub_id2  count_sub_id1  count_sub_id2
0     94.0      NaN            1.0            NaN
1     94.0    315.0            9.0            7.0
2     94.0    265.0            4.0            1.0
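`.apply(pd.Series)` builds one Series per group, which tends to be slow on large frames. A sketch of the same approach (collect each id's values into a list, then spread the list across columns) that instead passes all the lists to a single `pd.DataFrame` constructor call; this is a substituted variant, not part of the original answer. The constructor pads shorter lists with missing values, which is what produces NaN for ids with only one row.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

grouped = df.groupby('id')
sub = grouped['sub_id'].agg(list)   # id -> [94], [94, 315], [94, 265]
cnt = grouped['count'].agg(list)    # id -> [1], [9, 7], [4, 1]

# One constructor call per column; ragged lists are padded with missing values.
output_df = pd.concat([pd.DataFrame(sub.tolist(), index=sub.index),
                       pd.DataFrame(cnt.tolist(), index=cnt.index)], axis=1)
output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']
print(output_df)
```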

Answer 3

Here's another way:

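# Select the value columns up front so the grouping column 'id' is not passed
# into apply (newer pandas versions deprecate or drop that behavior).
df_out = (df.groupby('id')[['sub_id', 'count']]
   .apply(lambda x: x.reset_index(drop=True).head(2))
   .unstack()
)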

Output:

   sub_id        count     
        0      1     0    1
id                         
0    94.0    NaN   1.0  NaN
1    94.0  315.0   9.0  7.0
2    94.0  265.0   4.0  1.0

To rename:

df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
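That rename produces sub_id1, sub_id2, count1, count2. A sketch that instead yields exactly the names and column order asked for in the question; the groupby step here selects ['sub_id', 'count'] before apply so that 'id' never reaches the lambda:

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Within each id, renumber rows 0, 1, keep at most two, then move the
# inner position level into the columns.
df_out = (df.groupby('id')[['sub_id', 'count']]
            .apply(lambda x: x.reset_index(drop=True).head(2))
            .unstack())

# ('sub_id', 0) -> 'sub_id1', ('count', 0) -> 'count_sub_id1', and so on.
df_out.columns = [f'sub_id{j + 1}' if i == 'sub_id' else f'count_sub_id{j + 1}'
                  for i, j in df_out.columns]

# Interleave so each sub_id sits next to its count, as in the question.
df_out = df_out[['sub_id1', 'count_sub_id1', 'sub_id2', 'count_sub_id2']]
print(df_out)
```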

3 answers

Answer 1 (score 3, accepted) 2019-09-16 16:01:35

Answer 2 (score 1) 2019-09-16 16:10:19

Answer 3 (score 1) 2019-09-16 16:12:24