pandas merge rows based on grouping
Let's say I have a dataframe that looks like this:
col1 col2 col3
a 1 a
a 98 xx
a 99 xy
b 1 a
b 2 b
b 3 c
b 8 xx
b 9 xy
I need to merge the rows where col3 is xx or xy within each col1 group into a single row labelled xz, so the resulting dataframe looks like:
col1 col2 col3
a 1 a
a 98 xz
b 1 a
b 2 b
b 3 c
b 8 xz
Is there a simple way of doing this in pandas?
IIUC, you can group by col1 and a relabelled col3, then take the first col2 of each group:
df.groupby([df.col1,df.col3.replace({'xx':'xz','xy':'xz'})]).col2.first().reset_index()
Out[29]:
col1 col3 col2
0 a a 1
1 a xz 98
2 b a 1
3 b b 2
4 b c 3
5 b xz 8
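For anyone who wants to run the groupby answer end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question; the `merged_key` name, `sort=False`, and the final column reordering are my additions (assumptions, not part of the original answer) to keep the original row and column order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "col2": [1, 98, 99, 1, 2, 3, 8, 9],
    "col3": ["a", "xx", "xy", "a", "b", "c", "xx", "xy"],
})

# Relabel xx and xy with a shared key so they fall into the same group.
merged_key = df["col3"].replace({"xx": "xz", "xy": "xz"})

result = (
    df.groupby(["col1", merged_key], sort=False)["col2"]
      .first()                               # keep the first col2 of each group
      .reset_index()[["col1", "col2", "col3"]]  # restore the original column order
)
print(result)
```

Note that this merges every xx/xy row of a col1 group, whether or not the rows are adjacent.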
Here's my approach with drop_duplicates:
import numpy as np

# rows where col3 is xx or xy
s = df.col3.isin(['xx', 'xy'])

(df.assign(col3=lambda x: np.where(s, 'xz', x['col3']),  # replace xx and xy with xz
           mask=s,                                        # flag the xx/xy rows
           block=(~s).cumsum())                           # id shared by each consecutive xx/xy run
   .drop_duplicates(['col1', 'mask', 'block'])            # keep the first row of each run
   .drop(['mask', 'block'], axis=1)
)
Output:
col1 col2 col3
0 a 1 a
1 a 98 xz
3 b 1 a
4 b 2 b
5 b 3 c
6 b 8 xz
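The drop_duplicates approach can be wrapped into a reusable function, sketched below. The function name `merge_marked_rows`, its parameters, and the underscore-prefixed helper columns are hypothetical names chosen for illustration. The second call shows one behaviour worth knowing: only consecutive xx/xy rows are collapsed, so marked rows separated by another row stay separate.

```python
import numpy as np
import pandas as pd

def merge_marked_rows(df, labels=("xx", "xy"), merged="xz"):
    """Collapse each consecutive run of `labels` rows within a col1 group into its first row."""
    s = df["col3"].isin(labels)
    out = df.assign(
        col3=np.where(s, merged, df["col3"]),  # relabel the marked rows
        _mask=s,                               # flag the marked rows
        _block=(~s).cumsum(),                  # constant across a consecutive marked run
    )
    return (out.drop_duplicates(["col1", "_mask", "_block"])
               .drop(columns=["_mask", "_block"]))

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "col2": [1, 98, 99, 1, 2, 3, 8, 9],
    "col3": ["a", "xx", "xy", "a", "b", "c", "xx", "xy"],
})
result = merge_marked_rows(df)
print(result)

# Marked rows that are not adjacent are kept as separate rows.
df_split = pd.DataFrame({
    "col1": ["a", "a", "a"],
    "col2": [1, 2, 3],
    "col3": ["xx", "a", "xy"],
})
result_split = merge_marked_rows(df_split)
print(result_split)
```

If non-adjacent xx/xy rows should also be merged, the groupby approach is the one to use, since it ignores row position.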