简体   繁体   中英

How to filter a groupby dataframe based on their values count

I have a groupby dataframe and I would like to return the top 3 groups with the highest value count.

for eg the below dataframe expected output table should be group 20,30 and 33

I wanted to display a raw dataset table, but the group function was not properly displayed on SO, That was I uploaded an image.

                     amount         cosine_group
cosine_group            

0                   952.5              0
4                   3000.0             4    
20                  2000.0            20
                    2000.0            20
                    2000.0            20
27                  2000.0            27    

30                  2100.0            30
                    2100.0            30
                    2100.0            30
33                  1065.0            33
                    1065.0            33
                    1065.0            33
                    1065.0            33

Expected Output:

                     amount         cosine_group
cosine_group            

20                  2000.0            20
                    2000.0            20
                    2000.0            20

30                  2100.0            30
                    2100.0            30
                    2100.0            30
33                  1065.0            33
                    1065.0            33
                    1065.0            33
                    1065.0            33

在此处输入图像描述

You can use .nlargest(3) to select the 3 largest size. Use .isin() to match for those rows with these values. Finally, use .loc to return the rows in original dataframe of the largest elements, as follows:

df = df.rename_axis(index='cosine_group0')   # to rename index axis name
df.loc[df['cosine_group'].isin(df.groupby('cosine_group', as_index=False)['cosine_group'].size().nlargest(3, 'size')['cosine_group'].tolist())]

Or use:

df = df.rename_axis(index='cosine_group0')   # to rename index axis
df.loc[df["cosine_group"].isin(df["cosine_group"].value_counts().nlargest(3).index)]

This may not be very pythonic but definitely gets the work done.

# retieve the index of the value counts
cosine_group_value = df["cosine_group"].value_counts().index

# get the fist 3 values  from the value counts (highest 3 values)
top3 = list(cosine_group_value)[:3]

# filter your dataframe using the top 3 values on the cosine_group column
df = df[df["cosine_group"].isin(top3)]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM