Pivot pandas dataframe using group by
I have a dataframe like this:
id sub_id count
0 94 1
1 94 9
1 315 7
2 94 4
2 265 1
import pandas as pd

data = {'id': [0, 1, 1, 2, 2],
        'sub_id': [94, 94, 315, 94, 265],
        'count': [1, 9, 7, 4, 1]}
df = pd.DataFrame(data)
And I want it in the following form:
id sub_id1 count_sub_id1 sub_id2 count_sub_id2
0 94 1 NaN NaN
1 94 9 315 7
2 94 4 265 1
Note: every id can have a maximum of two rows, each with a different sub_id and its count.
I tried df.pivot(index='id', columns='sub_id', values='count'), but this expands every distinct value of sub_id into its own column, whereas I only need two sets of columns with custom names, i.e. only the (at most) two rows that exist for each id.
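For reference, here is a small sketch of what the pivot attempt produces on the sample data. The name `wide` is only for illustration; the point is that pivot creates one column per distinct sub_id value rather than one per position within the id group.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# pivot() turns each distinct sub_id value into its own column,
# so three different sub_id values give three columns.
wide = df.pivot(index='id', columns='sub_id', values='count')
print(wide.columns.tolist())  # [94, 265, 315]
print(wide.shape)             # (3, 3)
```

With more distinct sub_id values the frame keeps growing wider, even though no id has more than two rows.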
Try using:
df_out = (df.set_index(['id', df.groupby('id').cumcount()+1])
.unstack().sort_index(level=1, axis=1))
df_out.columns = [f'{i}{j}' if i == "sub_id" else f'{i}_sub_id{j}'
for i, j in df_out.columns]
print(df_out.reset_index())
Output:
id count_sub_id1 sub_id1 count_sub_id2 sub_id2
0 0 1.0 94.0 NaN NaN
1 1 9.0 94.0 7.0 315.0
2 2 4.0 94.0 1.0 265.0
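To see why this works, it may help to look at the intermediate steps separately. This is just a sketch on the sample data from the question; the names `pos` and `wide` are mine and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Number the rows inside each id group: 1 for the first row, 2 for the second.
pos = df.groupby('id').cumcount() + 1
print(pos.tolist())  # [1, 1, 2, 1, 2]

# (id, pos) is unique per row, so unstack() can move pos into the columns.
# The columns depend on the position within the group, not on the sub_id values.
wide = df.set_index(['id', pos]).unstack()
print(wide.columns.tolist())
```

The resulting columns form a two-level index of (original column, position) pairs, which is what the list comprehension in the answer flattens into single names.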
output_df = pd.concat([df.groupby('id')['sub_id'].apply(list).apply(pd.Series),
df.groupby('id')['count'].apply(list).apply(pd.Series)], axis =1)
output_df.columns = ['sub_id1', 'sub_id2', 'count_sub_id1', 'count_sub_id2']
>>> output_df
sub_id1 sub_id2 count_sub_id1 count_sub_id2
0 94.0 NaN 1.0 NaN
1 94.0 315.0 9.0 7.0
2 94.0 265.0 4.0 1.0
Here's another way:
df_out = (df.groupby('id')
.apply(lambda x: x.reset_index(drop=True).head(2))
.drop('id', axis=1)
.unstack()
)
Output:
sub_id count
0 1 0 1
id
0 94.0 NaN 1.0 NaN
1 94.0 315.0 9.0 7.0
2 94.0 265.0 4.0 1.0
To rename:
df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
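If you want the exact column names and order from the question, and integers instead of floats, something along these lines might work. It is a sketch rather than a definitive solution: it reuses the cumcount idea, the names `pos`, `order`, `wide` and `result` are made up for illustration, and it assumes the nullable 'Int64' dtype is acceptable for the missing values.

```python
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 1, 2, 2],
                   'sub_id': [94, 94, 315, 94, 265],
                   'count': [1, 9, 7, 4, 1]})

# Position of each row within its id group (1, 2, ...).
pos = df.groupby('id').cumcount() + 1
wide = df.set_index(['id', pos]).unstack()

# Interleave the columns: sub_id, count for position 1, then for position 2.
order = [(col, n) for n in sorted(pos.unique()) for col in ['sub_id', 'count']]
wide = wide[order]

# Flatten the two-level column index into the names used in the question.
wide.columns = [f'sub_id{n}' if col == 'sub_id' else f'count_sub_id{n}'
                for col, n in wide.columns]

# 'Int64' (capital I) is the nullable integer dtype, so missing values stay <NA>.
result = wide.astype('Int64').reset_index()
print(result)
```

Because `order` is built from the positions that actually occur, the same code should also cope if some id later has a third row.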