How to reshape a DataFrame based on column and row values?
I have a DataFrame as follows:
import pandas as pd

data = {
'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M']
}
df = pd.DataFrame(data)
df
Title Items
0 001C A
1 001C B
2 004C D
3 001C A
4 004C A
5 004C K
6 007C L
7 010C M
I want to get the Items under each Title without any duplicates, with one column per Title. The expected output is:
   001C 004C 007C 010C
0     A    D    L    M
1     B    A
2          K
You can use drop_duplicates, assign a helper column that numbers the items within each Title group (0, 1, 2, ...), and then pivot:
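(df.drop_duplicates(subset=['Title', 'Items'])
   # number the remaining items within each Title: 0, 1, 2, ...
   # (a callable, so cumcount runs on the deduplicated frame rather than the original df)
   .assign(index=lambda d: d.groupby('Title').cumcount())
   # one column per Title, one row per helper index
   .pivot(index='index', columns='Title', values='Items')
   .rename_axis(index=None, columns=None)
   #.fillna('') # uncomment if you want empty strings in place of NaNs
)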
Output:
001C 004C 007C 010C
0 A D L M
1 B A NaN NaN
2 NaN K NaN NaN
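A minimal sketch of why the helper index should be computed on the deduplicated frame. The input below is hypothetical (not from the question): the duplicate 'A' sits between two distinct items of group '001C'. A cumcount taken on the original df then leaves a gap in the numbering (positions 0 and 2 instead of 0 and 1), so that Title's items would not be packed contiguously at the top of its column. Passing a callable to assign makes cumcount run on the already-deduplicated rows.

```python
import pandas as pd

# Hypothetical input (not from the question): the duplicate 'A'
# sits between two distinct items of group '001C'.
df = pd.DataFrame({
    'Title': ['001C', '001C', '001C', '004C'],
    'Items': ['A', 'A', 'B', 'D'],
})

dedup = df.drop_duplicates(subset=['Title', 'Items'])

# cumcount on the ORIGINAL frame leaves a gap: 'B' keeps position 2
stale = df.groupby('Title').cumcount().loc[dedup.index]
print(stale.tolist())  # [0, 2, 0]

# cumcount on the DEDUPLICATED frame gives contiguous positions
fresh = dedup.groupby('Title').cumcount()
print(fresh.tolist())  # [0, 1, 0]

# The full chain, with the callable form of assign
out = (dedup
       .assign(index=lambda d: d.groupby('Title').cumcount())
       .pivot(index='index', columns='Title', values='Items')
       .rename_axis(index=None, columns=None))
print(out)
```

On the question's own data both forms happen to agree, because the only duplicate ('001C', 'A') is the last row of its group.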
You can also use .drop_duplicates() + .pivot(). Then move the non-NaN values of each column to the top with .dropna(), as follows:
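(df.drop_duplicates()
   # one column per Title, rows keyed by the original index (mostly NaN)
   .pivot(columns='Title', values='Items')
   # per column: drop the NaNs and renumber the remaining items from 0
   .apply(lambda x: pd.Series(x.dropna().values))
   .rename_axis(columns=None)
)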
Result:
001C 004C 007C 010C
0 A D L M
1 B A NaN NaN
2 NaN K NaN NaN
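A sketch of what the apply step does, on the question's data. After the pivot, each column holds its Title's items at their original row positions with NaN elsewhere; dropna() removes the gaps and .values discards the old row labels, so pd.Series renumbers the survivors from 0, and apply pads the shorter columns with NaN. The last lines rebuild the helper-column result (with cumcount computed on the deduplicated frame) to check that both approaches agree. The names wide, packed, result and helper are only illustrative.

```python
import pandas as pd

data = {
    'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
    'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M'],
}
df = pd.DataFrame(data)

# Wide frame keyed by the original row index; mostly NaN
wide = df.drop_duplicates().pivot(columns='Title', values='Items')

# One column by hand: drop the NaNs, then renumber from 0
packed = pd.Series(wide['004C'].dropna().values)
print(packed.tolist())  # ['D', 'A', 'K']

# The same function applied to every column
result = (wide
          .apply(lambda x: pd.Series(x.dropna().values))
          .rename_axis(columns=None))

# Helper-column approach, for comparison
helper = (df.drop_duplicates(subset=['Title', 'Items'])
          .assign(index=lambda d: d.groupby('Title').cumcount())
          .pivot(index='index', columns='Title', values='Items')
          .rename_axis(index=None, columns=None))

print(result.equals(helper))
```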