Group by two columns and count unique values from a third column

I have the following df1:

id period color size rate
1    01    red   12   30
1    02    red   12   30
2    01    blue  12   35
3    03    blue  12   35
4    01    blue  12   35
4    02    blue  12   35
5    01    pink  10   40
6    01    pink  10   40

I need to create a new df2 whose index is a concatenation of the three columns color-size-rate, then group by 'period' and count the unique ids. My final df should have the following structure:

index       period   count
red-12-30    01        1
red-12-30    02        1
blue-12-35   01        2
blue-12-35   03        1
blue-12-35   02        1
pink-10-40   01        2

Thank you in advance for your help.

Try building the combined key with .agg('-'.join) and passing it to .groupby together with 'period':

df2 = df1.groupby([df1[["color", "size", "rate"]].astype(str)\
            .agg("-".join, axis=1).rename('index'), "period"])\
                .agg(count=("id", "nunique"))\
                .reset_index()

print(df2)

        index  period  count
0  blue-12-35       1      2
1  blue-12-35       2      1
2  blue-12-35       3      1
3  pink-10-40       1      2
4   red-12-30       1      1
5   red-12-30       2      1
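
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, and it makes two assumptions that are not in the answer above: period is stored as a string so the leading zero survives, and sort=False is passed to groupby so that groups should come out in order of first appearance (as in the desired output) rather than sorted alphabetically.

```python
import pandas as pd

# Sample data from the question; period is a string here (an assumption)
# so that "01" is not turned into the integer 1.
df1 = pd.DataFrame({
    "id":     [1, 1, 2, 3, 4, 4, 5, 6],
    "period": ["01", "02", "01", "03", "01", "02", "01", "01"],
    "color":  ["red", "red", "blue", "blue", "blue", "blue", "pink", "pink"],
    "size":   [12, 12, 12, 12, 12, 12, 10, 10],
    "rate":   [30, 30, 35, 35, 35, 35, 40, 40],
})

# Build the combined key as a named Series aligned with df1's rows.
key = df1[["color", "size", "rate"]].astype(str).agg("-".join, axis=1).rename("index")

# Group by the key and period, then count distinct ids per group.
# sort=False asks groupby not to sort the group keys.
df2 = (
    df1.groupby([key, "period"], sort=False)
       .agg(count=("id", "nunique"))
       .reset_index()
)
print(df2)
```

If you would rather have the combined key as the actual index of df2, calling df2.set_index("index") afterwards is one option.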

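You can also achieve this with a groupby on all four columns, counting the unique ids, and then build the index column:

 # count distinct ids per (color, size, rate, period) group
 df2 = df1.groupby(['color', 'size', 'rate', 'period'])['id'].nunique().reset_index(name='count')
 # cast to str before joining, since size and rate are numeric
 df2['index'] = df2[['color', 'size', 'rate']].astype(str).agg('-'.join, axis=1)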
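
One thing to watch for when counting: count() counts rows, while nunique() counts distinct values, and the two only agree when no id repeats within a group. The question's sample data has no repeats, so here is a small sketch with a hypothetical frame (not the question's data) in which id 1 appears twice in the same group:

```python
import pandas as pd

# Hypothetical data for illustration: id 1 appears twice in the same group.
df = pd.DataFrame({
    "id":     [1, 1, 2],
    "period": ["01", "01", "01"],
    "color":  ["red", "red", "red"],
})

grouped = df.groupby(["color", "period"])["id"]

# count() counts non-null rows in each group: 3 here.
rows = grouped.count().reset_index(name="count")

# nunique() counts distinct ids in each group: 2 here.
unique_ids = grouped.nunique().reset_index(name="count")

print(rows)
print(unique_ids)
```

Since the question asks for the count of unique ids, nunique() looks like the safer choice.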
