简体   繁体   中英

Pandas: conditional, groupby, cumsum

I have dataframe where i'm trying to create a new column showing the value with conditional groupby.

conditions:

  1. tag == 1, profit - cost
  2. tag > 1, -(cost)
  3. net is summed after every iteration

original df:

╔══════╦═════╦══════╦════════╗
║ rep  ║ tag ║ cost ║ profit ║
╠══════╬═════╬══════╬════════╣
║ john ║   1 ║   1  ║      5 ║
║ pete ║   2 ║   1  ║      3 ║
║ pete ║   3 ║   1  ║      4 ║
║ pete ║   4 ║   1  ║      5 ║
║ john ║   5 ║   1  ║      7 ║
║ john ║   1 ║   1  ║      9 ║
║ pete ║   1 ║   1  ║      3 ║
║ john ║   3 ║   1  ║      5 ║
╚══════╩═════╩══════╩════════╝

output hope to get:

╔══════╦═════╦══════╦════════╦═════╗
║ rep  ║ tag ║ cost ║ profit ║ net ║
╠══════╬═════╬══════╬════════╬═════╣
║ john ║   1 ║   1  ║      5 ║   4 ║
║ pete ║   2 ║   1  ║      3 ║  -1 ║
║ pete ║   3 ║   1  ║      4 ║  -2 ║
║ pete ║   4 ║   1  ║      5 ║  -3 ║
║ john ║   5 ║   1  ║      7 ║   3 ║
║ john ║   1 ║   1  ║      9 ║  11 ║
║ pete ║   1 ║   1  ║      3 ║  -1 ║
║ john ║   3 ║   1  ║      5 ║  15 ║
╚══════╩═════╩══════╩════════╩═════╝

I'm able to use loc to derives 'if' conditions, however, i'm not able to figure out or find examples of groupby/if/cumsum for this.

sample df code:

data = {
    "rep": ['john','pete','pete','pete','john','john','pete','john'],
    "tag": [1,2,3,4,5,1,1,3],
    "cost": [1,1,1,1,1,1,1,1],
    "profit": [5,3,4,5,7,9,3,5]}

df = pd.DataFrame(data)

kindly advise

cond1 = df['tag'].eq(1)
s = df['cost'].mul(-1).mask(cond1, df['profit'] - df['cost'])

s

0    4
1   -1
2   -1
3   -1
4   -1
5    8
6    2
7   -1
Name: cost, dtype: int64

df['net'] = s.groupby(df['rep']).cumsum()

df

    rep    tag  cost    profit  net
0   john    1   1       5       4
1   pete    2   1       3       -1
2   pete    3   1       4       -2
3   pete    4   1       5       -3
4   john    5   1       7       3
5   john    1   1       9       11
6   pete    1   1       3       -1
7   john    3   1       5       10

Other Way

if you want 1-line code use following

out = (df.assign(net=df['cost'].mul(-1)
                 .mask(df['tag'].eq(1), df['profit'].sub(df['cost']))
                 .groupby(df['rep']).cumsum()))

same result

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM