
Sum a DataFrame column only if a condition matches, grouped by another column

Suppose I have the following DataFrame:

id    createdId   updatedId   ownerId   value
1     50          50          10        105 
2     51          50          10        240
3     52          50          10        420
4     53          53          10        470
5     40          40          11        320
6     41          40          11        18
7     55          55          12        50
8     57          55          12        412
9     59          55          12        398

I am trying to sum the column 'value' into a new column 'output' ONLY if ownerId is the same AND updatedId is less than or equal to createdId.

In my example, the output should be the below dataframe:

id    createdId   updatedId   ownerId   value    output
1     50          50          10        105      105
2     51          50          10        240      345  # Add to the previous
3     52          50          10        420      765  # Add to the previous
4     53          53          10        470      1235 # Add to the previous
5     40          40          11        320      320  # Reset because Owner is different
6     41          40          11        18       338
7     55          55          12        50       50
8     57          55          12        412      462
9     59          55          12        398      860

I tried to do:

df['output'] = df[['value']].sum(axis=1).where(df['createdId'] > df['updatedId'], 0)

But this does not include the owner check and it seems not to be summing anything...

I am new to pandas. Could you please show me how you would do this?


EDIT 1:

I am trying to sum the 'value' column into a new column 'output' over the range [updatedId, createdId], and only where ownerId is the same.

Expected output:

id    createdId   updatedId   ownerId   value    output
1     50          50          10        105      105
2     51          50          10        240      345  # Add to the previous
3     52          50          10        420      765  # Add to the previous
4     53          53          10        470      470  # Reset because no other value between 53 and 53
5     40          40          11        320      320  # Reset because Owner is different
6     41          40          11        18       338
7     55          55          12        50       50
8     57          55          12        412      462
9     59          55          12        398      860

Use GroupBy.cumsum, but first set the values that fail the condition to 0:

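# Keep value where createdId >= updatedId, otherwise use 0
s = df['value'].where(df['createdId'] >= df['updatedId'], 0)

# Running total of the kept values within each ownerId
df['output'] = s.groupby(df['ownerId']).cumsum()
print(df)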
   id  createdId  updatedId  ownerId  value  output
0   1         50         50       10    105     105
1   2         51         50       10    240     345
2   3         52         50       10    420     765
3   4         53         53       10    470    1235
4   5         40         40       11    320     320
5   6         41         40       11     18     338
6   7         55         55       12     50      50
7   8         57         55       12    412     462
8   9         59         55       12    398     860
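
Note that the output above matches the first expected table (1235 for id 4), not the table in EDIT 1 (470 for id 4): every row in the sample satisfies createdId >= updatedId, so nothing is ever zeroed out and the running total never resets within an owner. If EDIT 1 is read as "for each row, sum value over the rows of the same ownerId whose createdId lies in [updatedId, createdId]", which is my assumption about the intended rule rather than something the question spells out, a minimal sketch could look like the following. The helper name range_sum is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "createdId": [50, 51, 52, 53, 40, 41, 55, 57, 59],
    "updatedId": [50, 50, 50, 53, 40, 40, 55, 55, 55],
    "ownerId":   [10, 10, 10, 10, 11, 11, 12, 12, 12],
    "value":     [105, 240, 420, 470, 320, 18, 50, 412, 398],
})


def range_sum(group):
    """Hypothetical helper: for each row of one owner's rows, sum 'value'
    over the rows whose createdId falls in [updatedId, createdId]."""
    totals = [
        # Series.between is inclusive on both ends by default
        group.loc[group["createdId"].between(lo, hi), "value"].sum()
        for lo, hi in zip(group["updatedId"], group["createdId"])
    ]
    return pd.Series(totals, index=group.index)


# Compute the sums per owner, then align back to df by index
parts = [range_sum(g) for _, g in df.groupby("ownerId")]
df["output"] = pd.concat(parts)
print(df)
```

This loops over every row within each owner group, so it is quadratic per group; that should be fine for small frames but may be slow on large ones. On this particular sample, df.groupby(['ownerId', 'updatedId'])['value'].cumsum() happens to give the same numbers, but that shortcut depends on rows that share an updatedId already being ordered by createdId.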
