Pandas dataframe select entire rows with highest values from a specified column

I have a dataframe and I want to return the full rows that contain the largest values in a specified column. So let's say I create a dataframe like this:

df = pd.DataFrame(np.random.randint(0,100,size=(25, 4)), columns=list('ABCD'))

Then I'd have a table like this (sorry I can't get a proper table to form, so I just made a short one up):

A    B    C    D
14   67   35   22
75   21   34   64

And let's say it goes on for 25 rows like that. I want to take the top 5 largest values of column C and return those full rows.

If I do:

df['C'].nlargest()

it returns those 5 largest values, but I want it to return the full rows.

I thought the below would work, but it gives me an error of "IndexError: indices are out-of-bounds":

df[df['C'].nlargest()]

I know this will be an easy solution for many people here, but it's stumped me. Thanks for your help.

You want to use the columns parameter:

In [53]: df.nlargest(5, columns=['C'])
Out[53]:
     A   B   C   D
17  43  91  95  32
18  13  36  81  56
7   61  90  76  85
16  68  21  73  68
14   3  64  71  59
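For a self-contained check, here is a minimal sketch on a seeded frame (the seed and the variable names are my own choices, not from the question). It also shows why the original attempt fails: `df['C'].nlargest()` returns the values themselves, indexed by row label, so indexing `df` with it treats those values as labels. Passing its index to `loc` is one way around that.

```python
import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is arbitrary.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 100, size=(25, 4)), columns=list('ABCD'))

# Full rows for the 5 largest values in column C.
top5 = df.nlargest(5, columns=['C'])

# Alternative via the Series: nlargest() keeps the original row labels
# in its index, so those labels can be used to look the rows up.
top5_via_index = df.loc[df['C'].nlargest(5).index]

print(top5)
```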

Approach #1

One approach -

df.iloc[df.C.argsort()[::-1][:5]]

With simplified slicing, this reduces to -

df.iloc[df.C.argsort()[:-6:-1]]

Approach #2

For performance, if the order of those largest n rows is not important, we can also use np.argpartition -

df.iloc[df.C.values.argpartition(-5)[:-6:-1]]
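To make the "order is not important" condition concrete, here is a small sketch, assuming a seeded random frame like the one in the question (the seed, `n`, and variable names are illustrative). `argpartition` only guarantees which rows land in the top group, not their order inside it, so a final sort of those few rows is needed if order matters.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.randint(0, 100, size=(25, 4)), columns=list('ABCD'))

n = 5
c = df['C'].to_numpy()

# argpartition(-n) guarantees the last n positions hold the n largest
# values, but in no particular order among themselves.
idx = np.argpartition(c, -n)[-n:]
top_unordered = df.iloc[idx]

# If order matters, sorting only these n rows is cheap.
top_ordered = top_unordered.sort_values('C', ascending=False)

print(top_ordered)
```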

Without using nlargest, by using sort_values:

df.sort_values('C',ascending=False).iloc[:5,]

or using head:

df.sort_values('C',ascending=False).head(5)

or using quantile:

df[df.C>df.C.quantile(1-(5/len(df)))]
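One caveat with the quantile filter: it does a strict comparison against an interpolated cutoff, so rows tied at the cutoff are dropped and you can get fewer rows than asked for. A minimal sketch with made-up data (the values and `n` are hypothetical, chosen to force a tie):

```python
import pandas as pd

# Hypothetical data with a tie at the cutoff: two rows share the value 70.
df = pd.DataFrame({'C': [10, 20, 30, 40, 50, 60, 70, 70, 80, 90]})

n = 3
# The cutoff interpolates between the two tied 70s, so it equals 70.0
# and the strict '>' keeps only 80 and 90.
by_quantile = df[df.C > df.C.quantile(1 - n / len(df))]

# sort_values + head always returns exactly n rows (when len(df) >= n).
by_sort = df.sort_values('C', ascending=False).head(n)

print(len(by_quantile), len(by_sort))
```

So the sort_values or nlargest forms are the safer choice when you need exactly n rows.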

Quick and dirty:

df.where(df.C.nlargest()).dropna()

       A     B     C     D
7   98.0  52.0  93.0  65.0
13  76.0  20.0  86.0  68.0
16  83.0   6.0  92.0  51.0
22  97.0  15.0  84.0   8.0
24  32.0  80.0  87.0  34.0
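The float values in the output above come from `where` filling the non-selected rows with NaN before `dropna` removes them. The snippet also passes a non-boolean condition, which may not be accepted by recent pandas versions. A sketch of a boolean-mask variant that keeps the integer dtype and the original row order (the seed and names are my own assumptions):

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # arbitrary seed, for reproducibility only
df = pd.DataFrame(rng.randint(0, 100, size=(25, 4)), columns=list('ABCD'))

# Boolean mask: True for rows whose label is among the 5 largest in C.
mask = df.index.isin(df['C'].nlargest(5).index)
top5 = df[mask]

print(top5)
```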
