How to get the top N values in Python

Question

I have a list of values, say:

import numpy as np
from pandas import DataFrame

df = DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                'key2' : ['one', 'two', 'one', 'two', 'one'],
                'data1' : abs(np.random.randn(5)*100),
                'data2' : np.random.randn(5)})

Given this data, I want to return only the rows with the top 3 values of data1, keeping all 4 columns.

What would be the best way to do this, other than the many if statements I have in mind?

I was looking into nlargest, but I'm not sure how to apply it here.

========================update =========================

Running the code above gives this result:

[image not included]

I would like to get back a df that contains only the rows with index 1, 2, 3, because they hold the top 3 values of data1 (98, 94, 95).

Answer 1

In [271]: df
Out[271]: 
      data1     data2 key1 key2
0 -1.318436  0.829593    a  one
1  0.172596 -0.541057    a  two
2 -2.071856 -0.181943    b  one
3  0.183276 -1.889666    b  two
4  0.558144 -1.016027    a  one

In [272]: df.ix[df['data1'].argsort()[-3:]]
Out[272]: 
      data1     data2 key1 key2
1  0.172596 -0.541057    a  two
3  0.183276 -1.889666    b  two
4  0.558144 -1.016027    a  one
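
For readers on current pandas: `.ix` was deprecated in 0.20 and removed in 1.0, so the call above no longer runs there. Below is a minimal sketch of the same argsort idea using `.iloc`. The data values are made up for illustration (loosely based on the 98, 94, 95 mentioned in the question), and the names `top3_positions` and `top3` are just placeholders.

```python
import numpy as np
import pandas as pd

# Illustrative data; the numbers are hand-picked, not taken from the session above.
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [12.0, 98.0, 94.0, 95.0, 7.0],
                   'data2': [0.1, -0.2, 0.3, -0.4, 0.5]})

# argsort returns row positions ordered by ascending data1;
# the last three positions belong to the three largest values.
top3_positions = np.argsort(df['data1'].to_numpy())[-3:]

# .iloc selects by position and keeps every column.
top3 = df.iloc[top3_positions]
print(top3)
```

As in the original answer, the rows come back in ascending order of data1; reverse the positions with `top3_positions[::-1]` if the largest value should come first.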

Although heapq.nlargest may be theoretically more efficient, in practice argsort tends to be quicker, even for fairly large DataFrames:

import heapq
import numpy as np
import pandas as pd
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a']*10000,
                 'key2' : ['one', 'two', 'one', 'two', 'one']*10000,
                 'data1' : np.random.randn(50000),
                 'data2' : np.random.randn(50000)})

In [274]: %timeit df.ix[df['data1'].argsort()[-3:]]
100 loops, best of 3: 5.62 ms per loop

In [275]: %timeit df.iloc[heapq.nlargest(3, df.index, key=lambda x: df['data1'].iloc[x])]
1 loops, best of 3: 1.03 s per loop

Answer 2

Sort by the values of the data1 column in descending order and take the first three rows:

df.sort_values('data1', ascending=False)[:3]
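
A small sketch of the sort-then-slice approach on current pandas, where sorting goes through `sort_values`, next to `DataFrame.nlargest`, which the question was asking about. The data is made up for illustration and the variable names are placeholders, so treat this as one possible way to write it rather than the only one.

```python
import pandas as pd

# Illustrative data; values are hand-picked for the example.
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': [12.0, 98.0, 94.0, 95.0, 7.0],
                   'data2': [0.1, -0.2, 0.3, -0.4, 0.5]})

# Sort the whole frame by data1, largest first, then keep the first three rows.
top3_sorted = df.sort_values('data1', ascending=False).head(3)

# nlargest picks the three rows with the largest data1 directly,
# also ordered from largest to smallest, with all columns kept.
top3_nlargest = df.nlargest(3, 'data1')

print(top3_sorted)
print(top3_nlargest)
```

Both calls return all four columns and keep the original row labels, so the result can still be matched back to the source frame.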

2 answers

Answer 1
3, accepted, 2013-10-13 20:55:03

Answer 2
1, 2013-10-13 21:02:18
