I have a groupby dataframe and I would like to return the top 3 groups with the highest value count.
for eg the below dataframe expected output table should be group
20,30 and 33
I wanted to display a raw dataset table, but the group function was not properly displayed on SO, That was I uploaded an image.
amount cosine_group
cosine_group
0 952.5 0
4 3000.0 4
20 2000.0 20
2000.0 20
2000.0 20
27 2000.0 27
30 2100.0 30
2100.0 30
2100.0 30
33 1065.0 33
1065.0 33
1065.0 33
1065.0 33
Expected Output:
amount cosine_group
cosine_group
20 2000.0 20
2000.0 20
2000.0 20
30 2100.0 30
2100.0 30
2100.0 30
33 1065.0 33
1065.0 33
1065.0 33
1065.0 33
You can use .nlargest(3)
to select the 3 largest size. Use .isin()
to match for those rows with these values. Finally, use .loc
to return the rows in original dataframe of the largest elements, as follows:
df = df.rename_axis(index='cosine_group0') # to rename index axis name
df.loc[df['cosine_group'].isin(df.groupby('cosine_group', as_index=False)['cosine_group'].size().nlargest(3, 'size')['cosine_group'].tolist())]
Or use:
df = df.rename_axis(index='cosine_group0') # to rename index axis
df.loc[df["cosine_group"].isin(df["cosine_group"].value_counts().nlargest(3).index)]
This may not be very pythonic but definitely gets the work done.
# retieve the index of the value counts
cosine_group_value = df["cosine_group"].value_counts().index
# get the fist 3 values from the value counts (highest 3 values)
top3 = list(cosine_group_value)[:3]
# filter your dataframe using the top 3 values on the cosine_group column
df = df[df["cosine_group"].isin(top3)]
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.