
pandas merge rows based on grouping

Let's say I have a dataframe that looks like this:

col1    col2    col3
a       1       a
a       98      xx
a       99      xy
b       1       a
b       2       b
b       3       c
b       8       xx
b       9       xy
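
For anyone who wants to reproduce the question, the sample frame can be built roughly as follows. The values are copied from the table above; the variable name `df` is the one the answers use, and treating col2 as integers is an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: col2 holds integers and col1/col3 hold strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "col2": [1, 98, 99, 1, 2, 3, 8, 9],
    "col3": ["a", "xx", "xy", "a", "b", "c", "xx", "xy"],
})
print(df)
```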

I need to merge the rows where col3 is xx or xy, grouped by col1, so that the resulting dataframe looks like this:

col1    col2    col3
a       1       a
a       98      xz
b       1       a
b       2       b
b       3       c
b       8       xz

Is there a simple way of doing this in pandas?

IIUC (if I understand correctly):

df.groupby([df.col1,df.col3.replace({'xx':'xz','xy':'xz'})]).col2.first().reset_index()
Out[29]: 
  col1 col3  col2
0    a    a     1
1    a   xz    98
2    b    a     1
3    b    b     2
4    b    c     3
5    b   xz     8
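
The one-liner above returns the columns as col1, col3, col2 and relies on groupby's default key sorting. A self-contained variant, offered only as a sketch, keeps the original row order with `sort=False` and restores the column order; the names `merged_col3` and `result` are introduced here for illustration, and the frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "col2": [1, 98, 99, 1, 2, 3, 8, 9],
    "col3": ["a", "xx", "xy", "a", "b", "c", "xx", "xy"],
})

# Map both labels to the merged label 'xz'; all other values are unchanged.
merged_col3 = df["col3"].replace({"xx": "xz", "xy": "xz"})

result = (
    df.groupby([df["col1"], merged_col3], sort=False)["col2"]
      .first()                               # keep the first col2 of each group
      .reset_index()[["col1", "col2", "col3"]]  # restore the original column order
)
print(result)
```

Note that this merges every xx/xy row within a col1 group, whether or not the rows are adjacent, and that `first()` simply keeps the first col2 value it sees in each group.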

Here's my approach with drop_duplicates:

import numpy as np

# True for rows whose col3 is xx or xy
s = df.col3.isin(['xx','xy'])

(df.assign(col3=lambda x: np.where(s, 'xz', x['col3']), # replace xx and xy with xz
           mask=s,                                      # flags the xx/xy rows
           block=(~s).cumsum())                         # run id: increments at every non-xx/xy row
   .drop_duplicates(['col1','mask','block'])            # keep the first row of each xx/xy run
   .drop(['mask','block'], axis=1)
)

Output:

  col1  col2 col3
0    a     1    a
1    a    98   xz
3    b     1    a
4    b     2    b
5    b     3    c
6    b     8   xz
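
To see why the (col1, mask, block) key collapses each run of xx/xy rows, it can help to look at the helper columns before any rows are dropped. The following is a small sketch on the sample frame from the question (rebuilt here so it runs on its own); `helpers` is a name used only for illustration.

```python
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b", "b", "b"],
    "col2": [1, 98, 99, 1, 2, 3, 8, 9],
    "col3": ["a", "xx", "xy", "a", "b", "c", "xx", "xy"],
})

s = df["col3"].isin(["xx", "xy"])   # True for the rows to be merged

# block increments on every non-xx/xy row, so an xx/xy row inherits
# the block number of the last ordinary row before it.
helpers = df.assign(mask=s, block=(~s).cumsum())
print(helpers)
```

Every ordinary row gets its own block number and therefore a unique key, while consecutive xx/xy rows share both mask=True and the same block, so drop_duplicates keeps only the first row of each run.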

