Pivot - Transpose columns by duplicates pandas dataframe
I have a DataFrame with an 'ID' column that contains duplicate values. Each row has one or more 'Article' columns. I want to reshape the DataFrame so that each unique 'ID' occupies a single row, with all of its articles spread across new columns in that row.
What I have:
ID Article_1 Article_2
1 Banana Coconut
2 Apple Strawberry
1 Apple
3 Tomatoe
1 Pineapple
2 Banana
4 Apple
5 Apple Strawberry
3 Apple
What I want:
ID Article_1 Article_2 Article_3 Article_4
0001 Banana Coconut Apple Pineapple
0002 Apple Strawberry Banana NaN
0003 Tomatoe Apple NaN NaN
0004 Apple NaN NaN NaN
0005 Apple Strawberry NaN NaN
NEW EDIT:
I ran into some cases where the order matters.
My DF:
ID Article Article_2
1 Banana NaN
2 Apple NaN
1 Apple Coconut
3 Tomatoe Coconut
1 Pineapple Tropical
2 Banana Coconut
4 Apple Coconut
5 Apple Coconut
3 Apple Pineapple
Output with first @Erfan solution:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple NaN Coconut Tropical
0002 Apple Banana NaN Coconut NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
What I need:
Article_1 Article_2 Article_3 Article_4 Article_5 Article_6
0001 Banana Apple Pineapple Coconut Tropical NaN
0002 Apple Banana Coconut NaN NaN NaN
0003 Tomatoe Apple Coconut Pineapple NaN NaN
0004 Apple Coconut NaN NaN NaN NaN
0005 Apple Coconut NaN NaN NaN NaN
I can't have Article_5 with a NaN value and Article_6 with a value in the same row.
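A minimal sketch of one way to avoid that gap, under stated assumptions rather than as the original answer's method: the gap appears when NaN cells are still present at the moment the articles are numbered, so dropping them after reshaping to long form with DataFrame.melt and before numbering with GroupBy.cumcount keeps each ID's values contiguous. The column names `Article` and `Article_2` follow the table above; the helper column `n` and the names `long_df` and `wide` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the "My DF" table above.
df = pd.DataFrame({
    "ID": [1, 2, 1, 3, 1, 2, 4, 5, 3],
    "Article": ["Banana", "Apple", "Apple", "Tomatoe", "Pineapple",
                "Banana", "Apple", "Apple", "Apple"],
    "Article_2": [np.nan, np.nan, "Coconut", "Coconut", "Tropical",
                  "Coconut", "Coconut", "Coconut", "Pineapple"],
})

long_df = (
    df.melt(id_vars="ID", value_vars=["Article", "Article_2"])
      .dropna(subset=["value"])  # remove NaN cells BEFORE numbering
      # "n" is a hypothetical helper column: 1, 2, 3, ... within each ID
      .assign(n=lambda d: d.groupby("ID").cumcount() + 1)
)

# Each (ID, n) pair is now unique, so a plain pivot is enough.
wide = (
    long_df.pivot(index="ID", columns="n", values="value")
           .add_prefix("Article_")
           .rename_axis(columns=None)
)
print(wide)
```

Because the longest ID here has five values, this sketch produces Article_1 through Article_5 only; an all-NaN Article_6 column is not created.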
If the order of the articles is not important, we can use DataFrame.melt to unpivot your articles to rows. Then we use DataFrame.pivot_table to aggregate to each ID, while using GroupBy.cumcount to give a unique identifier to each article within an ID:
dfn = df.melt(id_vars='ID', value_vars=['Article_1', 'Article_2'])
dfn = dfn.pivot_table(index='ID',
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')
Article_1 Article_2 Article_3 Article_4
0001 Banana Apple Pineapple Coconut
0002 Apple Banana Strawberry NaN
0003 Tomatoe Apple NaN NaN
0004 Apple NaN NaN NaN
0005 Apple Strawberry NaN NaN
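If the articles should instead appear in row order (everything from an ID's first row, then its second row, and so on), one possible variation is to remember each row's position before melting and sort on it afterwards. This is only a sketch under assumptions: the helper columns `row` and `n` and the names `long_df` and `wide` are invented for illustration, and blank cells in the sample table are assumed to be NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the "What I have" table; blanks assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 2, 1, 3, 1, 2, 4, 5, 3],
    "Article_1": ["Banana", "Apple", "Apple", "Tomatoe", "Pineapple",
                  "Banana", "Apple", "Apple", "Apple"],
    "Article_2": ["Coconut", "Strawberry", np.nan, np.nan, np.nan,
                  np.nan, np.nan, "Strawberry", np.nan],
})

long_df = (
    df.assign(row=np.arange(len(df)))  # hypothetical helper: original row position
      .melt(id_vars=["ID", "row"], value_vars=["Article_1", "Article_2"])
      .dropna(subset=["value"])
      # A stable sort keeps Article_1 ahead of Article_2 within the same row.
      .sort_values("row", kind="mergesort")
      .assign(n=lambda d: d.groupby("ID").cumcount() + 1)
)

wide = (
    long_df.pivot(index="ID", columns="n", values="value")
           .add_prefix("Article_")
           .rename_axis(columns=None)
)
print(wide)
```

For ID 1 this gives Banana, Coconut, Apple, Pineapple, which is the ordering shown in the "What I want" table.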