python pandas 3 smallest & 3 largest values
How can I find the index of the 3 smallest and 3 largest values in a column in my pandas dataframe? I saw ways to find the max and min, but none to get the 3 smallest or 3 largest.
What have you tried? You could sort with s = s.sort_values() (the in-place s.sort() used when this was written has since been removed from pandas) and then call s.head(3).index and s.tail(3).index.
With smaller Series, you're better off just sorting and then taking the head/tail!
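The sort-then-slice approach could look roughly like the sketch below. The sample values and the string labels are made up for illustration, and it assumes a pandas version that provides sort_values().

```python
import pandas as pd

# Hypothetical sample data; any numeric Series works the same way.
s = pd.Series([0.5, -1.2, 3.3, 0.1, -0.7, 2.4], index=list("abcdef"))

# sort_values() returns a new Series sorted in ascending order.
ordered = s.sort_values()

# Index labels of the 3 smallest values (smallest first).
smallest_idx = ordered.head(3).index.tolist()

# Index labels of the 3 largest values (still in ascending order).
largest_idx = ordered.tail(3).index.tolist()

print(smallest_idx)
print(largest_idx)
```

Note that tail(3) keeps the ascending order, so the largest value comes last; reverse it with [::-1] if you want the largest first.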
This is a pandas feature request that should appear in 0.14 (some fiddly bits with different dtypes need to be overcome). An efficient solution for larger Series (> 1000 elements) is to use kth_smallest from pandas algos (warning: this function mutates the array it's applied to, so use a copy!):
In [11]: s = pd.Series(np.random.randn(10))
In [12]: s
Out[12]:
0 0.785650
1 0.969103
2 -0.618300
3 -0.770337
4 1.532137
5 1.367863
6 -0.852839
7 0.967317
8 -0.603416
9 -0.889278
dtype: float64
In [13]: n = 3
In [14]: pd.algos.kth_smallest(s.values.astype(float), n - 1)
Out[14]: -0.7703374582084163
In [15]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)]
Out[15]:
3 -0.770337
6 -0.852839
9 -0.889278
dtype: float64
If you want this in order:
In [16]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order()
Out[16]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
If you're worried about duplicates (joint nth place) you can take the head:
In [17]: s[s <= pd.algos.kth_smallest(s.values.astype(float), n - 1)].order().head(n)
Out[17]:
9 -0.889278
6 -0.852839
3 -0.770337
dtype: float64
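As far as I know, pd.algos is not part of the public API in recent pandas releases, so the session above may not run as written. The same threshold idea can be sketched with NumPy's np.partition instead. This is a swapped-in technique, not the kth_smallest call used above, and the sample values are just the ones from this answer rounded to two decimals.

```python
import numpy as np
import pandas as pd

# Values rounded from the example Series above (assumption: illustration only).
s = pd.Series([0.79, 0.97, -0.62, -0.77, 1.53, 1.37, -0.85, 0.97, -0.60, -0.89])
n = 3

# np.partition returns a partitioned copy, so s itself is not modified.
# After partitioning, position n - 1 holds the n-th smallest value.
kth = np.partition(s.to_numpy(dtype=float), n - 1)[n - 1]

# Keep everything at or below the threshold, sort it,
# and cap the result at n rows in case of ties at the threshold.
smallest = s[s <= kth].sort_values().head(n)

print(kth)
print(smallest.index.tolist())
```

Because np.partition works on a copy, there is no need for the explicit copy that the kth_smallest warning calls for.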
In [55]: import numpy as np
In [56]: import pandas as pd
In [57]: s = pd.Series(np.random.randn(5))
In [58]: s
Out[58]:
0 0.152037
1 0.194204
2 0.296090
3 1.071013
4 -0.324589
dtype: float64
In [59]: s.nsmallest(3)  # use s.drop_duplicates().nsmallest(3) if duplicates exist
Out[59]:
4 -0.324589
0 0.152037
1 0.194204
dtype: float64
In [60]: s.nlargest(3)  # use s.drop_duplicates().nlargest(3) if duplicates exist
Out[60]:
3 1.071013
2 0.296090
1 0.194204
dtype: float64
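Since the question asks for the index rather than the values, a small sketch of pulling the labels out of nsmallest/nlargest may help. It reuses the five values printed above; the column name "value" in the DataFrame part is hypothetical.

```python
import pandas as pd

# Same values as the example Series above.
s = pd.Series([0.152037, 0.194204, 0.296090, 1.071013, -0.324589])

# nsmallest/nlargest return a Series; .index gives the matching labels.
smallest_idx = s.nsmallest(3).index.tolist()
largest_idx = s.nlargest(3).index.tolist()

print(smallest_idx)
print(largest_idx)

# The same calls work on a single DataFrame column.
df = pd.DataFrame({"value": s})  # "value" is a made-up column name
col_smallest_idx = df["value"].nsmallest(3).index.tolist()
print(col_smallest_idx)
```

The labels come back ordered by value (smallest first for nsmallest, largest first for nlargest), which matches the outputs shown above.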
import pandas as pd
import numpy as np
np.random.seed(1)
x=np.random.randint(1,100,10)
y=np.random.randint(1000,10000,10)
x
array([38, 13, 73, 10, 76, 6, 80, 65, 17, 2])
y
array([8751, 4462, 6396, 6374, 3962, 3516, 9444, 4562, 5764, 9093])
data = pd.DataFrame({"age": x, "salary": y})
# Take the 5 rows with the largest age, then order those rows by smallest salary
data.nlargest(5, "age").nsmallest(5, "salary")
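The snippet above does not show its result, so here is a sketch that hard-codes the printed x and y arrays (to avoid depending on the random seed) and collects the resulting index labels. The variable names oldest, result, top3_age_idx and bottom3_age_idx are introduced here for illustration.

```python
import pandas as pd

# The arrays printed above, written out literally.
data = pd.DataFrame({
    "age": [38, 13, 73, 10, 76, 6, 80, 65, 17, 2],
    "salary": [8751, 4462, 6396, 6374, 3962, 3516, 9444, 4562, 5764, 9093],
})

# Step 1: the 5 rows with the largest age.
oldest = data.nlargest(5, "age")

# Step 2: among those rows, order by the smallest salary.
result = oldest.nsmallest(5, "salary")
print(result)

# Index labels of the 3 largest and 3 smallest ages, as asked in the question.
top3_age_idx = data.nlargest(3, "age").index.tolist()
bottom3_age_idx = data.nsmallest(3, "age").index.tolist()
print(top3_age_idx, bottom3_age_idx)
```

Chaining nlargest and nsmallest this way filters on one column and then ranks on another; for the original question, a single call on the column of interest followed by .index is enough.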