Group by one column and get unique values and sum of other columns in pandas
I have a dataframe like this:
id product department price
1 x a 5
2 y b 10
1 z b 15
3 z a 2
2 x a 1
1 x a 1
4 w b 10
Now I want to group by id and get all unique values of product and department as lists associated with each id, along with the sum of price.
Expected output:
id product department price
1 [x, z] [a, b] 21
2 [x, y] [a, b] 11
3 [z] [a] 2
4 [w] [b] 10
I can do a groupby and get one of the three columns, but I can't figure out how to get all three.
df.groupby(['id'])['product'].unique()
This is a simple case of using agg() with a dict definition:
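import io
import pandas as pd

# Rebuild the sample frame from the whitespace-separated text in the question
df = pd.read_csv(io.StringIO("""id product department price
1 x a 5
2 y b 10
1 z b 15
3 z a 2
2 x a 1
1 x a 1
4 w b 10"""), sep=r"\s+")
# Sum price; collect unique products and departments in order of first appearance
df.groupby("id").agg({"price": "sum", "product": lambda s: s.unique().tolist(), "department": lambda s: s.unique().tolist()})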
id | price | product | department
---|---|---|---
1 | 21 | ['x', 'z'] | ['a', 'b']
2 | 11 | ['y', 'x'] | ['b', 'a']
3 | 2 | ['z'] | ['a']
4 | 10 | ['w'] | ['b']
Group by id and apply the required aggregate to each column. For unique values, one option is list(set(<sequence>)) if the order does not need to be preserved. If you need the order of first appearance, use x.unique().tolist() instead of list(set(x)).
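out = (df.groupby('id')
         .agg({'product': lambda x: list(set(x)),
               'department': lambda x: list(set(x)),
               'price': 'sum'  # string alias; passing the builtin sum is deprecated in recent pandas
              })
      )
Output (the order inside each list may vary, because sets are unordered):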
product department price
id
1 [z, x] [a, b] 21
2 [x, y] [a, b] 11
3 [z] [a] 2
4 [w] [b] 10
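To make the ordering difference concrete, here is a minimal sketch on a small, made-up Series (the values are illustrative only and not taken from the question). It compares unique(), which keeps first-appearance order, with sorting a set to get a deterministic result.

```python
import pandas as pd

# Illustrative values only: "y" appears before "x"
s = pd.Series(["y", "x", "y", "x"])

# unique() keeps values in order of first appearance
first_seen = s.unique().tolist()

# set() makes no ordering promise, so sort it when a stable order matters
sorted_unique = sorted(set(s))

print(first_seen)
print(sorted_unique)
```

Plain list(set(x)) is fine when the order inside each list does not matter; otherwise one of the two forms above gives a reproducible result.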
To get a sorted list of the unique values of product and department (as shown in your expected result), you can use np.unique() together with GroupBy.agg(), as follows:
import numpy as np
df.groupby('id', as_index=False).agg(
{'product': lambda x: np.unique(x).tolist(),
'department': lambda x: np.unique(x).tolist(),
'price': 'sum'})
Result:
id product department price
0 1 [x, z] [a, b] 21
1 2 [x, y] [a, b] 11
2 3 [z] [a] 2
3 4 [w] [b] 10
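As a possible variation, the repeated lambda can be pulled into a small named function. The sketch below is self-contained and builds the question's data directly; sorted_unique is a hypothetical helper name introduced here, and it sorts with Python's sorted() rather than np.unique(), which should give the same lists for this data.

```python
import pandas as pd

# Sample data from the question, built directly instead of parsed from text
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})

def sorted_unique(values):
    """Hypothetical helper: unique values of a group as a sorted list."""
    return sorted(values.unique())

# One aggregate per column: sorted unique lists for the labels, a sum for price
result = df.groupby("id", as_index=False).agg(
    {"product": sorted_unique, "department": sorted_unique, "price": "sum"}
)
print(result)
```

Naming the helper keeps the agg() dict short and makes it easy to change the ordering rule in one place.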