
Group by one column and get unique values and sum of other columns in pandas

I have a dataframe like this:

id   product   department   price
1      x           a          5
2      y           b         10
1      z           b         15
3      z           a         2
2      x           a         1
1      x           a         1
4      w           b         10

Now I want to group by id and, for each id, get all unique values of product and department as lists, together with the sum of price.

Expected Output:

id   product   department   price
1    [x, z]      [a, b]      21
2    [x, y]      [a, b]      11
3    [z]         [a]         2
4    [w]         [b]         10

I can do the groupby and get one of the three columns, but I can't figure out how to get all three.

df.groupby(['id'])['product'].unique()

This is a simple case of using agg() with a dict definition:

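import io

import pandas as pd

# Rebuild the question's dataframe from its whitespace-separated text
df = pd.read_csv(io.StringIO("""id   product   department   price
1      x           a          5
2      y           b         10
1      z           b         15
3      z           a         2
2      x           a         1
1      x           a         1
4      w           b         10"""), sep=r"\s+")

# One aggregation per column: sum the prices, collect unique values as lists
df.groupby("id").agg({
    "price": "sum",
    "product": lambda s: s.unique().tolist(),
    "department": lambda s: s.unique().tolist(),
})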

id   price   product      department
1    21      ['x', 'z']   ['a', 'b']
2    11      ['y', 'x']   ['b', 'a']
3    2       ['z']        ['a']
4    10      ['w']        ['b']

Group by id and apply the required aggregate to each column. For unique values, one option is list(set(<sequence>)) if the order does not need to be preserved. If you need the values in order of first appearance, use x.unique().tolist() instead of list(set(x)).

out = (df.groupby('id')
         .agg({'product': lambda x: list(set(x)),
               'department': lambda x: list(set(x)),
               'price': 'sum'  # string alias instead of the builtin sum
               })
       )

OUTPUT:

   product department  price
id                          
1   [z, x]     [a, b]     21
2   [x, y]     [a, b]     11
3      [z]        [a]      2
4      [w]        [b]     10
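To make the ordering difference concrete, here is a minimal sketch that runs both variants on the question's data. The dataframe is rebuilt inline so the snippet stands alone, and the names ordered and unordered are only illustrative.

```python
import pandas as pd

# The question's data, rebuilt inline so the snippet is self-contained
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})

# Series.unique() keeps values in order of first appearance within each group
ordered = df.groupby("id").agg({"product": lambda s: s.unique().tolist()})

# set() also drops duplicates, but its iteration order is not guaranteed
unordered = df.groupby("id").agg({"product": lambda s: list(set(s))})

print(ordered.loc[2, "product"])            # ['y', 'x']
print(sorted(unordered.loc[2, "product"]))  # ['x', 'y']
```

For id 2 the rows appear as y then x, so the unique()-based list is always ['y', 'x'], while the set-based list may come out in either order.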

To get a sorted list of the unique values of product and department (as shown in your expected result), you can use np.unique() together with GroupBy.agg(), as follows:

import numpy as np

df.groupby('id', as_index=False).agg(
    {'product': lambda x: np.unique(x).tolist(), 
     'department': lambda x: np.unique(x).tolist(), 
     'price': 'sum'})

Result:

   id product department  price
0   1  [x, z]     [a, b]     21
1   2  [x, y]     [a, b]     11
2   3     [z]        [a]      2
3   4     [w]        [b]     10
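If you would rather not depend on NumPy, the same sorted result can probably be obtained with pandas named aggregation and plain Python. This is only a sketch: sorted_unique is a hypothetical helper introduced here, not something from the answers above, and the dataframe is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# The question's data, rebuilt inline so the snippet is self-contained
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})


def sorted_unique(values):
    """Hypothetical helper: sorted list of the distinct values in a group."""
    return sorted(set(values))


# Named aggregation: output_column=(input_column, aggregation)
out = df.groupby("id", as_index=False).agg(
    product=("product", sorted_unique),
    department=("department", sorted_unique),
    price=("price", "sum"),
)
print(out)
```

Named aggregation also lets you rename the output columns in the same call, which the dict form does not.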

