Create two columns from the same columns but in different ways
From the table below, I would like to create two columns that aggregate 'amount' depending on the values of 'number' and 'type'.
| number | type | amount |
|---|---|---|
| 1 | A | 10 |
| 1 | A | 20 |
| 2 | A | 10 |
| 3 | B | 20 |
| 2 | B | 10 |
| 1 | B | 20 |
Here's the table I would like to get. The first column I want to create, 'amount A', is the sum of 'amount' over the rows with 'A' in 'type', grouped by 'number'. The other one, 'amount A+B', is the sum of 'amount' over all rows grouped by 'number', regardless of the value of 'type'.
| number | amount A | amount A+B |
|---|---|---|
| 1 | 30 | 50 |
| 2 | 10 | 20 |
| 3 | 0 | 20 |
The only approach I came up with is to create subsets and build the two columns separately, but I wonder if there is a more efficient way.
You can try this:
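    import pandas as pd

    # df is the DataFrame from the question (columns: number, type, amount)
    out = (
        df.astype({'number': 'category'})   # remember every number as a category
          .query('type == "A"')             # keep only the type-A rows
          # observed=False keeps categories that have no 'A' rows (their sum is 0);
          # passed explicitly because the default differs between pandas versions
          .groupby('number', observed=False)['amount'].sum()
          .to_frame('amount A')
    )
    out['amount A+B'] = df.groupby('number')['amount'].sum()
    print(out)

Output:

            amount A  amount A+B
    number
    1             30          50
    2             10          20
    3              0          20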
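One of the tricks is to convert the `number` column to a categorical, so that we get a sum for every number even if that number never appears with type `A`. This relies on the groupby keeping unobserved categories (`observed=False`), which was the default in older pandas releases; since the default differs between pandas versions, it is safer to pass it explicitly.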
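To make the effect of the categorical conversion concrete, here is a small sketch that runs the type-A aggregation both ways on the question's data. The variable names `plain` and `categorical` are only illustrative, and `observed=False` is passed explicitly on the assumption that your pandas version may not default to it.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "number": [1, 1, 2, 3, 2, 1],
    "type": ["A", "A", "A", "B", "B", "B"],
    "amount": [10, 20, 10, 20, 10, 20],
})

# Plain integer keys: number 3 has no 'A' rows, so it is missing from the result.
plain = df.query('type == "A"').groupby("number")["amount"].sum()

# Categorical keys: the filtered frame still carries categories 1, 2 and 3,
# and observed=False asks groupby to emit a row for each of them.
categorical = (
    df.astype({"number": "category"})
    .query('type == "A"')
    .groupby("number", observed=False)["amount"]
    .sum()
)

print(plain)
print(categorical)
```

In the first result number 3 is absent, which would turn into a missing value once combined with the A+B totals; in the second it is present with a sum of 0.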
Once we do that, we can easily perform a groupby across the numbers both with and without restricting to the rows where `type == "A"`.
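If you would rather not rely on categoricals, a different technique (not the one used in this answer) is to zero out the non-A amounts first and then aggregate both columns in a single groupby. This is only a sketch; the helper column name `amount_a` is made up for illustration.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "number": [1, 1, 2, 3, 2, 1],
    "type": ["A", "A", "A", "B", "B", "B"],
    "amount": [10, 20, 10, 20, 10, 20],
})

out = (
    # amount_a (hypothetical helper column): the amount for type-A rows, 0 otherwise
    df.assign(amount_a=df["amount"].where(df["type"].eq("A"), 0))
    .groupby("number")
    # named aggregation; the dict unpacking allows column names containing spaces
    .agg(**{"amount A": ("amount_a", "sum"), "amount A+B": ("amount", "sum")})
)
print(out)
```

Because every row stays in the frame, number 3 still gets a row, with 0 in 'amount A', and only one pass over the groups is needed.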