Create two columns from the same columns but in different ways

Question

From the table below, I would like to create two columns that aggregate 'amount' depending on the value of 'number' and 'type'.

number	type	amount
1	A	10
1	A	20
2	A	10
3	B	20
2	B	10
1	B	20

Here's the table I would like to get. The first column I want to create is 'amount A', which is the aggregation of the rows with 'A' in 'type', grouped by 'number'. The other one, 'amount A+B', is the aggregation of all the rows grouped by 'number', regardless of the value of 'type'.

number	amount A	amount A+B
1	30	50
2	10	20
3	0	20

The only approach I came up with is to create subsets and build the two columns separately, but I wonder if there is a more efficient way.

Answer 1

You can try this:

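out = (
    # Make 'number' categorical before filtering so every number stays a known group
    df.astype({'number': 'category'})
    .query('type == "A"')
    # observed=False keeps categories with no 'A' rows (their sum is 0);
    # it is passed explicitly because the default differs between pandas versions
    .groupby(['number'], observed=False)['amount'].sum()
    .to_frame('amount A')
)

# Sum over all rows per number, regardless of 'type'
out['amount A+B'] = df.groupby('number')['amount'].sum()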

print(out)
        amount A  amount A+B
number                      
1             30          50
2             10          20
3              0          20

One of the tricks is to convert the 'number' column to a categorical, so that the result has a sum for every number even if that number never appears with type 'A'.
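To see what the categorical conversion buys, here is a small side-by-side sketch. It assumes the question's table is rebuilt as `df`; the names `only_a`, `plain` and `categorical` are made up for illustration.

```python
import pandas as pd

# The question's table, rebuilt by hand (assumed to match the original df)
df = pd.DataFrame({
    'number': [1, 1, 2, 3, 2, 1],
    'type': ['A', 'A', 'A', 'B', 'B', 'B'],
    'amount': [10, 20, 10, 20, 10, 20],
})

only_a = df.query('type == "A"')

# Plain integer key: number 3 has no 'A' rows, so it is missing from the result
plain = only_a.groupby('number')['amount'].sum()

# Categorical key: the categories 1, 2, 3 are fixed before filtering, so with
# observed=False number 3 is kept and its empty group sums to 0
categorical = (
    df.astype({'number': 'category'})
    .query('type == "A"')
    .groupby('number', observed=False)['amount'].sum()
)

print(plain)
print(categorical)
```

The filter removes rows but not categories, which is why number 3 survives in the second result.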

Once we do that, we can easily perform a groupby across the numbers, both with and without restricting to the rows where type == "A".
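If you would rather avoid the categorical conversion, one possible alternative (a sketch, not the approach above) is to zero out the non-'A' amounts in a helper column and aggregate both columns in a single groupby. The table is again rebuilt by hand as `df`, and the helper column simply reuses the name 'amount A'.

```python
import pandas as pd

# The question's table, rebuilt by hand (assumed to match the original df)
df = pd.DataFrame({
    'number': [1, 1, 2, 3, 2, 1],
    'type': ['A', 'A', 'A', 'B', 'B', 'B'],
    'amount': [10, 20, 10, 20, 10, 20],
})

out = (
    # Keep 'amount' where type is 'A', replace it with 0 elsewhere
    df.assign(**{'amount A': df['amount'].where(df['type'] == 'A', 0)})
    .groupby('number')
    # Named aggregation; the dict unpacking allows column names with spaces
    .agg(**{
        'amount A': ('amount A', 'sum'),
        'amount A+B': ('amount', 'sum'),
    })
)

print(out)
```

Because every row takes part in the groupby, numbers without any 'A' rows still show up with a sum of 0, and no index alignment between two separate results is needed.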
