From the table below, I would like to create two columns that aggregate 'amount' depending on the value of 'number' and 'type'.
number | type | amount |
---|---|---|
1 | A | 10 |
1 | A | 20 |
2 | A | 10 |
3 | B | 20 |
2 | B | 10 |
1 | B | 20 |
Here's the table I would like to get. The first column I want to create is 'amount A', which is the aggregation of the rows with 'A' in 'type' grouped by 'number'. The other one 'amount A+B' is the aggregation of all the rows grouped by 'number' regardless the value of 'type'.
number | amount A | amount A+B |
---|---|---|
1 | 30 | 50 |
2 | 10 | 20 |
3 | 0 | 20 |
I only came up with the way to create subsets and create two columns separately. But I wonder if there is more efficient way.
You can try this:
out = (
df.astype({'number': 'category'})
.query('type == "A"')
.groupby(['number'])['amount'].sum()
.to_frame('amount A')
)
out['amount A+B'] = df.groupby('number')['amount'].sum()
print(out)
amount A amount A+B
number
1 30 50
2 10 20
3 0 20
One of the tricks is to convert the 'number'
column to a categorical so that we have a resultant sum
for all numbers even if a number doesn't appear with 'type A'
.
Once we do that, we can very easily perform a groupby across the numbers with an without the rows where type == "A"
.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.