What is the Python / pandas equivalent of this SQL for finding the max value and column names?

Question

What is the Python / pandas equivalent of a MAX(variable) statement, for example:

SELECT ID, Name FROM Table5 WHERE 
Friend_count = (SELECT MAX(friend_count) FROM Table5);

(I'm trying to learn how to do in Python some of the things I would normally do in SQL. I figured I could do this in pandas, but I haven't found a way.)

Answer 1

How about using the idxmax() method on your DataFrame?

import numpy as np
import pandas as pd
from ggplot import meat

I'm using the meat dataset from ggplot here.

In [18]: meat
Out[18]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 827 entries, 0 to 826
Data columns (total 8 columns):
date               827  non-null values
beef               827  non-null values
veal               827  non-null values
pork               827  non-null values
lamb_and_mutton    827  non-null values
broilers           635  non-null values
other_chicken      143  non-null values
turkey             635  non-null values
dtypes: datetime64[ns](1), float64(7)

Suppose you want to find the row with the highest beef production.

In [36]: meat.beef.max()
Out[36]: 2512.0

In SQL, you might do

SELECT 
    * 
FROM 
    meat 
WHERE
    beef = (SELECT max(beef) FROM meat) ;

With pandas, you can do this using idxmax, like so:

In [35]: meat.loc[meat.beef.idxmax()]
Out[35]:
date               2002-10-01 00:00:00
beef                              2512
veal                              18.7
pork                              1831
lamb_and_mutton                   19.7
broilers                        2953.3
other_chicken                     50.7
turkey                           525.9
Name: 705, dtype: object

idxmax is pretty handy, and it should also work if your data is indexed by date or time.

In [42]: ts = meat.set_index(['date'])

In [43]: ts.beef.max()
Out[43]: 2512.0

In [44]: ts.beef.idxmax()
Out[44]: Timestamp('2002-10-01 00:00:00', tz=None)

In [45]: ts.loc[ts.beef.idxmax()]
Out[45]:
beef               2512.0
veal                 18.7
pork               1831.0
lamb_and_mutton      19.7
broilers           2953.3
other_chicken        50.7
turkey              525.9
Name: 2002-10-01 00:00:00, dtype: float64
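The meat dataset requires the ggplot package, so here is a minimal self-contained sketch of the same idea on a small made-up DataFrame (the values are invented for illustration and are not the real meat data). It also shows one difference from the SQL query: idxmax returns only the first label when the maximum is tied, while the SQL would return every tied row.

```python
import pandas as pd

# Made-up data for illustration only; not the real meat dataset.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2002-08-01", "2002-09-01", "2002-10-01", "2002-11-01"]),
    "beef": [2300.0, 2512.0, 2512.0, 2100.0],
    "veal": [17.0, 18.1, 18.7, 16.5],
})

# idxmax gives the index label of the FIRST row holding the maximum.
first_label = toy["beef"].idxmax()
top_row = toy.loc[first_label]

# A boolean mask keeps every row tied for the maximum, like the SQL subquery.
all_top = toy[toy["beef"] == toy["beef"].max()]

print(top_row)
print(all_top)
```

If ties matter for your data, the mask form is probably the closer match to the SQL.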

Answer 2

Suppose you have a Person class with a friend_count attribute. Here's an example of finding the person with the most friends...

import operator

class Person(object):
    def __init__(self, friend_count):
        self.friend_count = friend_count

people = [Person(x) for x in [0, 1, 5, 10, 3]]
popular_person = max(people, key=operator.attrgetter('friend_count'))
print(popular_person.friend_count)  # prints 10
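As a small variation on the above, here is a sketch using a lambda as the key instead of operator.attrgetter. The name attribute is a hypothetical addition so the result is easier to read. Note that max returns the first item it encounters when several share the maximum, and that the default argument (Python 3.4+) avoids a ValueError on an empty list.

```python
class Person:
    def __init__(self, name, friend_count):
        self.name = name                  # hypothetical attribute, for readability
        self.friend_count = friend_count

people = [Person("Ann", 5), Person("Bob", 10), Person("Cid", 10)]

# With a tie, max() keeps the first maximal item it sees ("Bob" here).
most_popular = max(people, key=lambda p: p.friend_count)

# On an empty list, max() raises ValueError unless a default is given.
nobody = max([], key=lambda p: p.friend_count, default=None)

print(most_popular.name)
print(nobody)
```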

Answer 3

A pandas Series (i.e. a column) has a max method:

In [1]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [2]: df
Out[2]: 
   A  B
0  1  2
1  3  4

Select a column:

In [3]: s = df.A  # same as df['A']

And take the max:

In [4]: s.max()
Out[4]: 3

You can also take the max over a DataFrame:

In [5]: df.max() # over the columns
Out[5]: 
A    3
B    4
dtype: int64

In [6]: df.max(axis=1) # over the rows
Out[6]: 
0    2
1    4
dtype: int64

To return all the rows which have the max value, you should use a mask:

In [7]: df.A == df.A.max()
Out[7]: 
0    False
1     True
Name: A, dtype: bool

In [8]: df[df.A == df.A.max()]
Out[8]: 
   A  B
1  3  4
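Applied to the query in the question, the mask approach might look like the sketch below. The table5 DataFrame is a hypothetical stand-in for Table5 with invented rows; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for Table5; the rows are invented for illustration.
table5 = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "friend_count": [10, 42, 7, 42],
})

# WHERE friend_count = (SELECT MAX(friend_count) FROM Table5)
is_max = table5["friend_count"] == table5["friend_count"].max()

# SELECT ID, Name
result = table5.loc[is_max, ["ID", "Name"]]
print(result)
```

Using .loc with a row mask and a column list does the filtering and the column selection in one step, and every row tied for the maximum is kept.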

Answer 4

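To get the maximum value from a list in Python, simply use the max function. There is a similar function for min. See the documentation here. If you want to do this based on an attribute of an object, you can use a generator expression such as max(person.age for person in people).

If you want to get the people with the highest age, you can use a list comprehension:

oldest_age = max(person.age for person in people)
people_with_max_age = [person for person in people if person.age == oldest_age]

Unlike in SQL, you rarely want to collect just n attributes of an object. It is more useful to keep them attached to the object and collect the objects you need. If that is what you want, see @FogleBird's answer.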
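For completeness, here is a runnable sketch of the two-step approach above. The Person class and its name and age attributes are assumptions made for illustration, since the answer does not define them.

```python
class Person:
    # Hypothetical class; the answer above only assumes an `age` attribute.
    def __init__(self, name, age):
        self.name = name
        self.age = age

people = [Person("Ann", 31), Person("Bob", 45), Person("Cid", 45)]

# Step 1: find the maximum age.
oldest_age = max(person.age for person in people)

# Step 2: keep every person whose age equals that maximum (ties included).
people_with_max_age = [person for person in people if person.age == oldest_age]

names = [person.name for person in people_with_max_age]
print(names)
```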

4 answers

Answer 1: 2, accepted, 2013-11-14 03:51:54
Answer 2: 1, 2013-11-14 03:03:41
Answer 3: 1, 2013-11-14 03:42:55
Answer 4: 0, 2013-11-14 03:02:32
