
Group by one column and get the unique values and sum of other columns in pandas

I have a DataFrame like this:

id   product   department   price
1      x           a          5
2      y           b         10
1      z           b         15
3      z           a         2
2      x           a         1
1      x           a         1
4      w           b         10

Now I want to group by id and, for each id, get a list of all unique values of product and department, plus the sum of price.

Expected Output:

id   product   department   price
1    [x, z]      [a, b]      21
2    [x, y]      [a, b]      11
3    [z]         [a]         2
4    [w]         [b]         10

I can do the groupby and get one of the three columns, but I can't figure out how to get all three at once.

df.groupby(['id'])['product'].unique()

This is a simple case of using agg() with a dict definition:

import io

import pandas as pd

df = pd.read_csv(io.StringIO("""id   product   department   price
1      x           a          5
2      y           b         10
1      z           b         15
3      z           a         2
2      x           a         1
1      x           a         1
4      w           b         10"""), sep=r"\s+")

df.groupby("id").agg({"price":"sum","product":lambda s: s.unique().tolist(), "department":lambda s: s.unique().tolist()})

id   price   product      department
1    21      ['x', 'z']   ['a', 'b']
2    11      ['y', 'x']   ['b', 'a']
3    2       ['z']        ['a']
4    10      ['w']        ['b']
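
The order inside each list comes from Series.unique(), which returns values in order of first appearance rather than sorted. That is why id 2 shows ['y', 'x']. A minimal, self-contained sketch of this, assuming the question's sample data rebuilt inline (the helper name unique_list is hypothetical, used only to avoid repeating the lambda):

```python
import pandas as pd

# Rebuild the question's sample data inline so the sketch runs on its own.
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})


def unique_list(s):
    """Hypothetical helper: unique values in order of first appearance."""
    return s.unique().tolist()


out = df.groupby("id").agg(
    {"price": "sum", "product": unique_list, "department": unique_list}
)

# For id 2 the row with 'y' comes before the row with 'x',
# so the list keeps that order instead of being sorted.
print(out.loc[2, "product"])
```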

Group by id and apply the required aggregate to each column. For the unique values, one option is list(set(<sequence>)) if the order does not need to be preserved. If you need to keep the order of first appearance, use x.unique().tolist() instead of list(set(x)).

out = (df.groupby('id')
         .agg({'product': lambda x: list(set(x)),
               'department': lambda x: list(set(x)),
               'price': 'sum'
               })
       )

OUTPUT:

   product department  price
id                          
1   [z, x]     [a, b]     21
2   [x, y]     [a, b]     11
3      [z]        [a]      2
4      [w]        [b]     10
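
Because a set has no guaranteed iteration order, the lists above can come out in a different order from one run to the next. If a stable order is wanted while staying with plain Python sets, one option is to wrap the set in sorted(). This is only a sketch under that assumption, with the question's data rebuilt inline:

```python
import pandas as pd

# Same sample data as in the question, rebuilt inline.
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})

# sorted(set(x)) de-duplicates and returns a sorted list,
# so the result no longer depends on set iteration order.
out = df.groupby("id").agg(
    {
        "product": lambda x: sorted(set(x)),
        "department": lambda x: sorted(set(x)),
        "price": "sum",
    }
)
print(out)
```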

To get a sorted list of the unique values of product and department (as shown in your expected result), you can use np.unique() together with GroupBy.agg(), as follows:

import numpy as np

df.groupby('id', as_index=False).agg(
    {'product': lambda x: np.unique(x).tolist(), 
     'department': lambda x: np.unique(x).tolist(), 
     'price': 'sum'})

Result:

   id product department  price
0   1  [x, z]     [a, b]     21
1   2  [x, y]     [a, b]     11
2   3     [z]        [a]      2
3   4     [w]        [b]     10
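
The as_index=False argument is what keeps id as a regular column with a default 0..3 index in the result above, instead of making it the index as in the other answers. A small sketch, assuming the question's data rebuilt inline, comparing it with calling reset_index() after a default groupby:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, rebuilt inline.
df = pd.DataFrame({
    "id": [1, 2, 1, 3, 2, 1, 4],
    "product": ["x", "y", "z", "z", "x", "x", "w"],
    "department": ["a", "b", "b", "a", "a", "a", "b"],
    "price": [5, 10, 15, 2, 1, 1, 10],
})

aggs = {
    "product": lambda x: np.unique(x).tolist(),  # sorted unique values
    "department": lambda x: np.unique(x).tolist(),
    "price": "sum",
}

# Variant 1: id stays a column because of as_index=False.
flat = df.groupby("id", as_index=False).agg(aggs)

# Variant 2: id becomes the index first, then is moved back to a column.
via_reset = df.groupby("id").agg(aggs).reset_index()

print(flat)
print(via_reset)
```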

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 