Dataframe - convert rows to columns - grouped by another column
I am looking to convert a data frame as shown below.
Original dataset:
Group | Miles
---|---
A | 23
A | 20
A | 24
A | 25
B | 12
B | 17
B | 16
B | 19
I want to convert from the above format to this:
Col_A | Col_B
---|---
23 | 12
20 | 17
24 | 16
25 | 19
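To try the answers below, the sample table above can be rebuilt as a DataFrame. This is a minimal sketch: the column names Group and Miles come from the question, and expected simply restates the desired output.

```python
import pandas as pd

# Long format: one row per (Group, Miles) observation, as in the question
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

# Desired wide format: one column per group, rows aligned by position
expected = pd.DataFrame({
    "Col_A": [23, 20, 24, 25],
    "Col_B": [12, 17, 16, 19],
})

print(df)
print(expected)
```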
Try pivot:
df = (
    df.assign(t=df.groupby('Group').cumcount())
      .pivot(index='t', columns='Group', values='Miles')
      .add_prefix('Col_')
      .rename_axis(columns=None)
      .reset_index(drop=True)
)
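The helper column t is what makes pivot work here. A minimal sketch on the sample data from the question, contrasting pivot with and without that per-group counter (the names naive and aligned are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

# Without a per-group counter, pivot keeps the original 0..7 row index,
# so each row fills only its own group's column and the other cell is NaN.
naive = df.pivot(columns="Group", values="Miles")

# With the counter t (0, 1, 2, 3 within each group), rows of A and B
# that share a position end up on the same output row.
aligned = (
    df.assign(t=df.groupby("Group").cumcount())
      .pivot(index="t", columns="Group", values="Miles")
)

print(naive)
print(aligned)
```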
With pd.concat:
k = pd.concat([g.reset_index(drop=True)['Miles'] for _, g in df.groupby('Group')], axis=1)
k.columns = ['Col_A', 'Col_B']
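Hard-coding the column list only works when the groups are known in advance. One possible variant (our own, not part of the answer) passes a dict to pd.concat; when Series are concatenated along axis=1, the dict keys become the column labels:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

# Each group's Miles values are renumbered 0..n-1 so they align by position;
# the dict key (e.g. "Col_A") becomes that column's label.
k = pd.concat(
    {f"Col_{name}": g["Miles"].reset_index(drop=True)
     for name, g in df.groupby("Group")},
    axis=1,
)

print(k)
```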
Alternative with set_index / unstack:
k = (
    df.set_index(['Group', df.groupby('Group').cumcount()])
      .unstack(0)
      .add_prefix('Col_')
      .rename_axis(columns=[None, None])
)
k.columns = k.columns.droplevel()
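The droplevel step is needed because unstacking the whole frame produces MultiIndex columns such as ('Miles', 'A'). A sketch of a variant that selects the Miles column before unstacking, so the columns come out flat; this reordering is our suggestion, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

# Selecting "Miles" first gives a Series indexed by (Group, position);
# unstack(0) then moves Group to the columns as a single, flat level.
k = (
    df.set_index(["Group", df.groupby("Group").cumcount()])["Miles"]
      .unstack(0)
      .add_prefix("Col_")
      .rename_axis(columns=None)
)

print(k)
```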
With groupby / explode:
k = df.groupby('Group').agg(list).T.apply(pd.Series.explode).add_prefix('Col_')
k = k.reset_index(drop=True).rename_axis(columns=None)
Col_A Col_B
0 23 12
1 20 17
2 24 16
3 25 19
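Because explode unpacks the Python lists produced by agg(list), the resulting columns typically have object dtype rather than an integer dtype. A simpler variant that skips explode builds the frame directly from the per-group lists. It is a sketch that assumes every group has the same number of rows, since the DataFrame constructor raises a ValueError for lists of unequal length:

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

# agg(list) collapses each group into a list of its Miles values
lists = df.groupby("Group")["Miles"].agg(list)

# Each (group, list) pair becomes one column of the wide frame
k = pd.DataFrame({f"Col_{name}": values for name, values in lists.items()})

print(k)
print(k.dtypes)
```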
A pivot_table option:
df = (
df.pivot_table(index=df.groupby('Group').cumcount(),
columns='Group',
values='Miles')
.add_prefix('Col_')
.rename_axis(columns=None)
)
df:
Col_A Col_B
0 23 12
1 20 17
2 24 16
3 25 19
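Unlike pivot, pivot_table aggregates whenever several rows map to the same cell, using the mean by default. Because the cumcount index gives each (position, group) pair exactly one row, nothing is actually averaged. A sketch that makes this explicit with aggfunc='first' (our choice; the answer relies on the default):

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})

wide = (
    df.pivot_table(index=df.groupby("Group").cumcount(),
                   columns="Group",
                   values="Miles",
                   aggfunc="first")  # each cell has exactly one source row
      .add_prefix("Col_")
      .rename_axis(columns=None)
)

# One output row per position, so the row count equals the largest group size
print(wide)
print(len(wide), df["Group"].value_counts().max())
```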
Explanation:
Create a new index based on the relative position of each row within its group with groupby cumcount:
df.groupby('Group').cumcount()
Group new_index
A 0
A 1
A 2
A 3
B 0
B 1
B 2
B 3
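cumcount numbers rows by their order of appearance within each group, starting at 0, so the groups do not need to be contiguous. A small sketch on a hypothetical interleaved frame (mixed is made-up data for illustration only):

```python
import pandas as pd

# Hypothetical input where A and B rows alternate instead of being grouped
mixed = pd.DataFrame({
    "Group": ["A", "B", "A", "B", "B"],
    "Miles": [23, 12, 20, 17, 16],
})

# Position of each row inside its own group, aligned with the original index
pos = mixed.groupby("Group").cumcount()

print(pd.concat([mixed["Group"], pos.rename("new_index")], axis=1))
```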
Then Group can become the new columns in the wide-format DataFrame:
df.pivot_table(index=df.groupby('Group').cumcount(),
columns='Group',
values='Miles')
Group A B
0 23 12
1 20 17
2 24 16
3 25 19
Then do some cleanup with add_prefix + rename_axis:
(
    df.pivot_table(index=df.groupby('Group').cumcount(),
                   columns='Group',
                   values='Miles')
    .add_prefix('Col_')
    .rename_axis(columns=None)
)
Col_A Col_B
0 23 12
1 20 17
2 24 16
3 25 19
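The three steps (cumcount, pivot_table, cleanup) can be packaged into a reusable function. rows_to_columns and its parameters are hypothetical names, and the body is a sketch of the same pipeline rather than a pandas API:

```python
import pandas as pd


def rows_to_columns(df, group_col, value_col, prefix="Col_"):
    """Hypothetical helper: long to wide, one column per group, rows by position."""
    # Step 1: position of each row within its group (0, 1, 2, ...)
    position = df.groupby(group_col).cumcount()
    # Step 2: positions become rows, groups become columns
    wide = df.pivot_table(index=position, columns=group_col, values=value_col)
    # Step 3: prefix the column labels and drop the leftover axis name
    return wide.add_prefix(prefix).rename_axis(columns=None)


df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Miles": [23, 20, 24, 25, 12, 17, 16, 19],
})
out = rows_to_columns(df, "Group", "Miles")
print(out)
```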
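Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.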