Filter DataFrame based on Max value in Column - Pandas
Using pandas, I have a DataFrame that looks like this:
Hour Browser Metric1 Metric2 Metric3
2013-08-18 00 IE 1000 500 3000
2013-08-19 00 FF 2000 250 6000
2013-08-20 00 Opera 3000 450 9000
2001-03-21 00 Chrome/29 3000 450 9000
2013-08-21 00 Chrome/29 3000 450 9000
2014-01-22 00 Chrome/29 3000 750 9000
I want to create an array of the browsers whose maximum value of Metric1 is greater than 2000. What is the best way to do this? The code below shows roughly what I am trying to do.
browsers = df[df.Metric1.max() > 2000]['Browser'].unique()
You could group by Browser and take the max:
In [11]: g = df.groupby('Browser')
In [12]: g['Metric1'].max()
Out[12]:
Browser
Chrome/29 3000
FF 2000
IE 1000
Opera 3000
Name: Metric1, dtype: int64
In [13]: over2000 = g['Metric1'].max() > 2000
In [14]: over2000
Out[14]:
Browser
Chrome/29 True
FF False
IE False
Opera True
Name: Metric1, dtype: bool
To get the array out, use this as a boolean mask on itself and take the index:
In [15]: over2000[over2000].index.values
Out[15]: array(['Chrome/29', 'Opera'], dtype=object)