Let's say I have a dataframe:
Category Data1 column1
A 'SOMEDATA' 10
A 'SOMEDATA' 2
A 'SOMEDATA' -10
B 'SOMEDATA' 10
B 'SOMEDATA' 1
B 'SOMEDATA' -10
and so on
I'd like to select one row in each group based on a column value, for example the row with the smallest ABS(column1).
So the resulting data is:
Category Data1 column1
A 'SOMEDATA' 2
B 'SOMEDATA' 1
How can I do this in Python?
I couldn't figure out how to return the entire row. For example,
df.groupby('Category')['column1'].min()
only returns Category and min(column1).
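To make the problem concrete, here is a minimal sketch that rebuilds the sample frame from the table above (the quotes around SOMEDATA are treated as string delimiters) and runs the groupby/min call. The result is a Series with one scalar per Category, so Data1 is gone, and because min() works on signed values it picks -10 rather than the smallest absolute value.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data1': ['SOMEDATA'] * 6,
    'column1': [10, 2, -10, 10, 1, -10],
})

# Collapses each group to a single scalar: other columns are dropped,
# and the minimum is taken on the signed values (A -> -10, B -> -10)
per_group_min = df.groupby('Category')['column1'].min()
print(per_group_min)
```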
Here is a solution that is more computationally efficient, since it avoids calling a Python function per group.
TL;DR version
df.loc[df['column1'].abs().groupby(df['Category']).idxmin()]
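A self-contained sketch of the idxmin approach on the sample data. idxmin() returns the index label of each group's minimum, and .loc turns those labels back into full rows. The absolute value has to be taken before grouping; calling idxmin() on the raw column would select the -10 rows. On ties, idxmin() returns the first occurrence, so exactly one row per group comes back.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data1': ['SOMEDATA'] * 6,
    'column1': [10, 2, -10, 10, 1, -10],
})

# Group the absolute values by Category and get the index label of each group's minimum
min_abs_idx = df['column1'].abs().groupby(df['Category']).idxmin()

# .loc with those labels returns the complete rows, all columns included
result = df.loc[min_abs_idx]
print(result)
```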
Alternatively, sort, then call .drop_duplicates, if you want a single minimum row based on absolute value:
(df.assign(to_sort = df.column1.abs()).sort_values('to_sort')
.drop_duplicates('Category').drop(columns='to_sort'))
Category Data1 column1
4 B 'SOMEDATA' 1
1 A 'SOMEDATA' 2
Without a key function, sort_values can only sort by existing columns, so we first create a column of absolute values (with .assign). Sorting then puts the minimum absolute value first, and dropping duplicates keeps the first row for each category, which is now the minimum absolute value row.
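If you are on pandas 1.1 or later (an assumption about your environment), sort_values accepts a key callable that transforms the sort column before sorting, so the helper column is not needed. The sketch below also passes kind='stable' so rows with equal absolute values keep their original order; that choice is for reproducibility, not something the answer above requires.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data1': ['SOMEDATA'] * 6,
    'column1': [10, 2, -10, 10, 1, -10],
})

# key= receives the column as a Series and must return a Series of the same shape
result = (df.sort_values('column1', key=lambda s: s.abs(), kind='stable')
            .drop_duplicates('Category'))  # keep the first (smallest |column1|) row per Category
print(result)
```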
This is also possible with groupby, which is better if you need to return more than one row per group:
df.assign(to_sort = df.column1.abs()).sort_values('to_sort').groupby(df.Category).head(1)
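As a sketch of the "more than one row per group" case, the same sort-then-groupby chain can keep the two rows with the smallest absolute column1 per Category. The value 2 is only an illustration, kind='stable' is added so ties resolve in original order, and the helper column is dropped at the end. head() keeps rows in the order of the sorted frame.

```python
import pandas as pd

# Sample data reconstructed from the question's table
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data1': ['SOMEDATA'] * 6,
    'column1': [10, 2, -10, 10, 1, -10],
})

n = 2  # illustrative: number of rows to keep per Category
result = (df.assign(to_sort=df['column1'].abs())
            .sort_values('to_sort', kind='stable')  # smallest |column1| first
            .groupby('Category')
            .head(n)                                # first n rows of each group
            .drop(columns='to_sort'))               # remove the helper column
print(result)
```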
Alternatively, you can slice with the result of a groupby. This is useful in cases where you want to return all rows that match the minimum:
df[df.groupby('Category', group_keys=False)['column1'].apply(lambda s: s.abs() == s.abs().min())]
Category Data1 column1
1 A 'SOMEDATA' 2
4 B 'SOMEDATA' 1
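The same "all rows matching the minimum" selection can also be written without apply, using transform('min'), which broadcasts each group's minimum back onto every row of that group. This sketch uses hypothetical data with a tie in Category B (1 and -1 share the smallest absolute value) to show that both tied rows are kept; comparing absolute values on both sides is what lets the negative -1 match.

```python
import pandas as pd

# Hypothetical data: in Category B, 1 and -1 tie for the smallest absolute value
df = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Data1': ['SOMEDATA'] * 6,
    'column1': [10, 2, -10, 1, -1, -10],
})

abs_vals = df['column1'].abs()
# Same shape as df: each row gets the minimum |column1| of its Category
group_min = abs_vals.groupby(df['Category']).transform('min')

# Boolean mask keeps every row whose |column1| equals its group's minimum (ties included)
result = df[abs_vals == group_min]
print(result)
```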