
pandas groupby and then select a row by value of column (min,max, for example)

Let's say I have a dataframe

Category Data1 column1
A 'SOMEDATA' 10
A 'SOMEDATA' 2
A 'SOMEDATA' -10
B 'SOMEDATA' 10
B 'SOMEDATA' 1
B 'SOMEDATA' -10

and so on

I'd like to select one row in each group by a column value, for example the row with the smallest ABS(column1).

So resulting data is

Category Data1 column1
A 'SOMEDATA' 2
B 'SOMEDATA'  1

How can I do this in python?

I couldn't figure out how to return the entire row. For example,

df.groupby('Category')['column1'].min()

This returns only 'Category' and min(column1), not the entire row.

Here is a solution that is more computationally efficient.

TL;DR version

df.loc[df.groupby('Category')['column1'].idxmin()]
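The one-liner above picks the row with the smallest raw column1 in each group. For the absolute-value case from the question, a minimal sketch could call idxmin on the absolute values instead. The DataFrame below is rebuilt from the question's sample, the names idx and result are only illustrative, and the approach assumes the index labels are unique.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

# Index label of the smallest |column1| within each Category.
idx = df["column1"].abs().groupby(df["Category"]).idxmin()

# .loc pulls the full rows for those labels, one row per group.
result = df.loc[idx]
print(result)
```

If several rows tie for the minimum, idxmin returns only the first one it encounters in each group.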

Sort, then use .drop_duplicates, if you want a single minimum row per group based on absolute value.

(df.assign(to_sort = df.column1.abs()).sort_values('to_sort')
     .drop_duplicates('Category').drop(columns='to_sort'))

  Category       Data1  column1
4        B  'SOMEDATA'        1
1        A  'SOMEDATA'        2

sort_values sorts on existing columns, so we need to create the column of absolute values (with .assign). Sorting then ensures the minimum absolute value appears first, and dropping duplicates keeps the first row for each category, which is now the row with the minimum absolute value.
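If the helper column feels heavy, one possible variation (a sketch, assuming pandas 1.1 or later, where sort_values accepts a key argument) sorts directly on the absolute values. The sample frame is rebuilt from the question, and result is just an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

result = (
    # key receives the column as a Series and returns the values to sort by.
    df.sort_values("column1", key=lambda s: s.abs())
      # Keep the first (smallest |column1|) row seen for each Category.
      .drop_duplicates("Category")
)
print(result)
```

This avoids the assign/drop pair, at the cost of requiring a newer pandas.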

This is also possible with groupby, which is better if you need to return more than one row per group:

df.assign(to_sort = df.column1.abs()).sort_values('to_sort').groupby(df.Category).head(1)
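To illustrate the "more than one row per group" case, here is a small sketch that keeps the n rows with the smallest absolute values in each group. The value n = 2 is an arbitrary choice for the example, and a stable sort (mergesort) is used so that ties keep their original order; neither is part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

n = 2  # hypothetical: number of rows to keep per group

result = (
    df.assign(to_sort=df["column1"].abs())
      # mergesort is stable, so tied rows stay in their original order.
      .sort_values("to_sort", kind="mergesort")
      .groupby("Category")
      .head(n)                   # first n rows of each group, in sorted order
      .drop(columns="to_sort")   # remove the helper column again
)
print(result)
```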

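Alternatively, you can slice with the result of a groupby. This is useful in cases where you want to return all rows that match the minimum. Note that both sides of the comparison must use the absolute value, otherwise a row whose minimum absolute value comes from a negative number would be missed:

df[df.groupby(df.Category, group_keys=False).apply(lambda x: x.column1.abs() == x.column1.abs().min())]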

  Category       Data1  column1
1        A  'SOMEDATA'        2
4        B  'SOMEDATA'        1
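The same "all rows that match the minimum" idea can also be sketched with transform, which broadcasts each group's minimum back to the original rows and avoids apply. The data below is a modified version of the question's sample (the -10 in group A is changed to -2, purely as an assumption to create a tie), and the variable names are illustrative.

```python
import pandas as pd

# Hypothetical data: the question's sample, altered so group A has a tie
# on the absolute value (2 and -2).
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -2, 10, 1, -10],
})

abs_vals = df["column1"].abs()

# For every row, the smallest |column1| found in that row's Category.
group_min = abs_vals.groupby(df["Category"]).transform("min")

# Keep every row whose |column1| equals its group's minimum, ties included.
result = df[abs_vals == group_min]
print(result)
```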


 