简体   繁体   中英

how to groupby in complicated condition in pandas

I have dataframe like this

     A    B    C
0    1    7    a
1    2    8    b
2    3    9    c
3    4   10    a
4    5   11    b
5    6   12    c

I would like to get groupby result (key=column C) below;

    A    B
d  12   36

"d" means a or b ,

so I would like to groupby only with "a" and "b".

and then put together as "d".

when I sum up with all the key elements then drop, it consume much time....

One option is to use pandas where to transform the C column so that where it was a or b becomes d and then you can groupby the transformed column and do the normal summary on it, and if rows with c is not desired, you can simply drop it after the summary:

df_sum = df.groupby(df.C.where(~df.C.isin(['a', 'b']), "d")).sum().reset_index()

df_sum
#   C   A   B
#0  c   9   21
#1  d   12  36

df_sum.loc[df_sum.C == "d"]

#   C   A   B
#1  d   12  36

To see more clearly how the where clause works:

df.C.where(~df.C.isin(['a','b']), 'd')

# 0    d
# 1    d
# 2    c
# 3    d
# 4    d
# 5    c
# Name: C, dtype: object

It acts like a replace method and replace a and b with d which will be grouped together when passed to groupby function.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM