pandas groupby and then select a row by value of column (min,max, for example)

Question

Let's say I have a dataframe

Category Data1 column1
A 'SOMEDATA' 10
A 'SOMEDATA' 2
A 'SOMEDATA' -10
B 'SOMEDATA' 10
B 'SOMEDATA' 1
B 'SOMEDATA' -10

and so on

I'd like to select a one row in each group by column value. For example, ABS(column1)

So resulting data is

Category Data1 column1
A 'SOMEDATA' 2
B 'SOMEDATA'  1

How can I do this in python?

I couldn't figure out how to return entire row. For example,

df.groupby('Category')['column1'].min();

this would only return 'Category' min(column1) only.

Answer 1

Here is a solution that is more computationally efficient.

TL;DR version

df.loc[df.groupby('Category')['column1'].idxmin()]

Answer 2

sort then .drop_duplicates , if you want single minimum row based on absolute value.

(df.assign(to_sort = df.column1.abs()).sort_values('to_sort')
     .drop_duplicates('Category').drop(columns='to_sort'))

  Category       Data1  column1
4        B  'SOMEDATA'        1
1        A  'SOMEDATA'        2

Sort can only sort on existing columns, so we need to create the column of absolute values (with .assign ). Sorting then ensures the minumum absolute value appears first, and dropping duplicates keeps the first row for each category, which is now the minumum absolute value row.

Also possible with groupby , which is better if you need to return more than one row per group:

df.assign(to_sort = df.column1.abs()).sort_values('to_sort').groupby(df.Category).head(1)

Alternatively, you can slice with the result of a groupby . This is useful in cases where you want to return all rows that match the minimum:

df[df.groupby(df.Category, group_keys=False).apply(lambda x: x.column1 == x.column1.abs().min())]

  Category       Data1  column1
1        A  'SOMEDATA'        2
4        B  'SOMEDATA'        1

pandas groupby and then select a row by value of column (min,max, for example)

Question

2 answers

solution1
2 2019-11-20 14:57:16

solution2
1 2019-03-15 16:05:00

pandas groupby and then select a row by value of column (min,max, for example)

Question

2 answers

solution1 2 2019-11-20 14:57:16

solution2 1 2019-03-15 16:05:00

solution1
2 2019-11-20 14:57:16

solution2
1 2019-03-15 16:05:00