Get the largest values from each column of a pandas.DataFrame

Question

Here is my pandas.DataFrame:

import pandas as pd
data = pd.DataFrame({
  'first': [40, 32, 56, 12, 89],
  'second': [13, 45, 76, 19, 45],
  'third': [98, 56, 87, 12, 67]
}, index = ['first', 'second', 'third', 'fourth', 'fifth'])

I want to create a new DataFrame that contains the top 3 values from each column of my data DataFrame.

Here is the expected output:

   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67

How can I do that?

Answer 1

Create a function that returns the top values of a Series:

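def top_n(s, num):
    # Sort descending and keep the first num values
    # (sort_values replaced s.order(...) from older pandas versions).
    tmp = s.sort_values(ascending=False)[:num]
    # Drop the original row labels so every column lines up on 0..num-1.
    tmp.index = range(len(tmp))
    return tmp

Apply it to your data set:

In [1]: data.apply(lambda x: top_n(x, 3))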
Out[1]:
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67

Answer 2

With numpy you can get an array of the top 3 values along each column as follows:

>>> import numpy as np
>>> col_ind = np.argsort(data.values, axis=0)[::-1,:]
>>> ind_to_take = col_ind[:3,:] + np.arange(data.shape[1])*data.shape[0]
>>> np.take(data.values.T, ind_to_take)
array([[89, 76, 98],
       [56, 45, 87],
       [40, 45, 67]], dtype=int64)

You can convert back to a DataFrame (this reuses the first three row labels of data as the index):

>>> pd.DataFrame(_, columns = data.columns, index=data.index[:3])
        first  second  third
first      89      76     98
second     56      45     87
third      40      45     67
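If the index arithmetic above is more than you need, a shorter numpy variant is to sort every column and slice. This is only a sketch of that idea, not part of the original answer; the names top3 and result are made up for illustration, and it assumes a pandas version that provides DataFrame.to_numpy.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67],
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

# Sort each column in ascending order, reverse the row order to get
# descending columns, then keep the first three rows.
top3 = np.sort(data.to_numpy(), axis=0)[::-1][:3]

# Wrap the array in a DataFrame with the original column names and a
# default 0..2 index (illustrative names, not from the answer).
result = pd.DataFrame(top3, columns=data.columns)
print(result)
```

Like the argsort version, this still sorts each column in full, so it is shorter rather than faster.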

Answer 3

The other solutions (at the time of writing) sort each column, which takes super-linear time per column, but the top values can actually be found in linear time per column.

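First, numpy.partition moves the k smallest elements into the first k positions (in no particular order, and k must be smaller than the length of the array). To get the k largest elements, we can negate the values, partition, and negate the result back. Note that the k values returned this way are not sorted among themselves: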

import numpy as np

-np.partition(-v, k)[: k]

Combining this with a dictionary comprehension, and sorting only the k selected values so that they come out in descending order, we can use:

>>> pd.DataFrame({c: -np.sort(np.partition(-data[c].values, 3)[:3]) for c in data.columns})
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67
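The partition idea can also be wrapped in a small helper. The sketch below is one possible way to do it, not the answer author's code: top_k_desc is a hypothetical name, and it partitions from the top end instead of negating the values, which is an assumption made here so that it also behaves for unsigned integers and for k equal to the column length.

```python
import numpy as np
import pandas as pd

def top_k_desc(values, k):
    """Return the k largest entries of a 1-D array-like, largest first."""
    arr = np.asarray(values)
    k = min(k, arr.size)  # never ask for more values than exist
    if k == 0:
        return arr[:0]
    # After partitioning at position size - k, the last k slots hold
    # the k largest values, in arbitrary order.
    largest = np.partition(arr, arr.size - k)[arr.size - k:]
    # Sorting just those k values costs O(k log k), not O(n log n).
    return np.sort(largest)[::-1]

data = pd.DataFrame({
    'first': [40, 32, 56, 12, 89],
    'second': [13, 45, 76, 19, 45],
    'third': [98, 56, 87, 12, 67],
}, index=['first', 'second', 'third', 'fourth', 'fifth'])

result = pd.DataFrame({c: top_k_desc(data[c], 3) for c in data.columns})
print(result)
```

For a table this small the difference from a full sort is negligible; the linear-time selection only starts to matter when columns are long and k is small.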

Answer 4

Alternative pandas solution:

In [6]: N = 3

In [7]: pd.DataFrame([data[c].nlargest(N).values.tolist() for c in data.columns],
   ...:              index=data.columns,
   ...:              columns=['{}_largest'.format(i) for i in range(1, N+1)]).T
   ...:
Out[7]:
           first  second  third
1_largest     89      76     98
2_largest     56      45     87
3_largest     40      45     67

Answer 5

Use nlargest, like this:

In [1594]: pd.DataFrame({c: data[c].nlargest(3).values for c in data})
Out[1594]:
   first  second  third
0     89      76     98
1     56      45     87
2     40      45     67

where data is:

In [1603]: data
Out[1603]:
        first  second  third
first      40      13     98
second     32      45     56
third      56      76     87
fourth     12      19     12
fifth      89      45     67
