pandas groupby and then select a row by value of column (min, max, for example)

Let's say I have a dataframe:

Category Data1 column1
A 'SOMEDATA' 10
A 'SOMEDATA' 2
A 'SOMEDATA' -10
B 'SOMEDATA' 10
B 'SOMEDATA' 1
B 'SOMEDATA' -10

and so on.

I'd like to select one row in each group by a column value, for example the row with the smallest ABS(column1).

So the resulting data is:

Category Data1 column1
A 'SOMEDATA' 2
B 'SOMEDATA' 1

How can I do this in Python?

I couldn't figure out how to return the entire row. For example,

df.groupby('Category')['column1'].min()

This returns only 'Category' and min(column1), not the rest of the row.

Here is a solution that is more computationally efficient.

TL;DR version

df.loc[df.groupby('Category')['column1'].idxmin()]
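Note that the one-liner above picks the row with the smallest raw column1 in each group (the -10 rows in the sample data). For the absolute-value case from the question, the same idxmin idea could be sketched as follows. The frame is reconstructed from the question, and the names idx and result are only illustrative.

```python
import pandas as pd

# Sample frame reconstructed from the question (an assumption about its exact contents)
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

# Index label of the smallest |column1| within each Category
idx = df["column1"].abs().groupby(df["Category"]).idxmin()

# .loc pulls the complete rows for those labels
result = df.loc[idx]
print(result)
```

idxmin returns one label per group, so ties are resolved by keeping the first occurrence.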

Sort, then .drop_duplicates, if you want a single minimum row per group based on absolute value:

(df.assign(to_sort = df.column1.abs()).sort_values('to_sort')
     .drop_duplicates('Category').drop(columns='to_sort'))

  Category       Data1  column1
4        B  'SOMEDATA'        1
1        A  'SOMEDATA'        2

sort_values sorts on existing columns, so we first create a column of absolute values (with .assign). Sorting then ensures the minimum absolute value appears first within each category, and dropping duplicates keeps the first row for each category, which is now the row with the minimum absolute value.
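If the installed pandas is 1.1 or newer (an assumption about your environment), sort_values also accepts a key argument, which may remove the need for the helper column. A minimal sketch on a frame reconstructed from the question:

```python
import pandas as pd

# Sample frame reconstructed from the question (an assumption about its exact contents)
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

# key= transforms the column only for ordering; the stored values are untouched
result = (
    df.sort_values("column1", key=lambda s: s.abs())
      .drop_duplicates("Category")  # keeps the first (smallest |column1|) row per Category
)
print(result)
```

The result has the same rows as the .assign version, without a temporary to_sort column to drop afterwards.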

This is also possible with groupby, which is better if you need to return more than one row per group:

df.assign(to_sort = df.column1.abs()).sort_values('to_sort').groupby(df.Category).head(1)
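As an illustration of the "more than one row per group" case, here is a hedged sketch that keeps the two rows with the smallest absolute value in each category. The frame is reconstructed from the question, and kind="stable" is added only so that ties keep their original order.

```python
import pandas as pd

# Sample frame reconstructed from the question (an assumption about its exact contents)
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [10, 2, -10, 10, 1, -10],
})

# Order every row by |column1|; a stable sort keeps tied rows in their original order
ordered = df.assign(to_sort=df["column1"].abs()).sort_values("to_sort", kind="stable")

# head(2) keeps the first two rows of each group in the sorted order
top2 = ordered.groupby("Category").head(2).drop(columns="to_sort")
print(top2)
```

Changing the argument of head controls how many rows are kept per group.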

Alternatively, you can slice with the result of a groupby. This is useful in cases where you want to return all rows that match the minimum:

df[df.groupby(df.Category, group_keys=False).apply(lambda x: x.column1.abs() == x.column1.abs().min())]

  Category       Data1  column1
1        A  'SOMEDATA'        2
4        B  'SOMEDATA'        1
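The "all rows that match the minimum" selection can also be sketched with transform, which broadcasts each group's minimum back onto its rows and avoids apply. The frame below is hypothetical: it is changed from the question's data so that category A contains a tie (2 and -2).

```python
import pandas as pd

# Hypothetical data with a tie in category A, to show that every matching row is kept
df = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B", "B"],
    "Data1": ["SOMEDATA"] * 6,
    "column1": [2, -2, 10, 1, -10, 10],
})

abs_col = df["column1"].abs()

# Per-row copy of the smallest |column1| in that row's Category
group_min = abs_col.groupby(df["Category"]).transform("min")

# Boolean mask: True wherever the row's |column1| equals its group's minimum
result = df[abs_col == group_min]
print(result)
```

Both tied rows of category A are returned, along with the single minimum row of category B.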


 