Python Pandas groupby, for loop & idxmax
I have a DataFrame that needs to be grouped on three levels, and I would then like the highest value in each group returned. Each day there is a return for each unique combination, and I would like to find the highest return along with its details.
data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
The result would show that:
Target - Dish Soap - House had a 5% ROI on 9/17
Best Buy - CDs - Electronics had a 3% ROI on 9/3
were the highest.
Here's some example data:
+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...
Not sure if this would need a for loop, or something like .ix.
I think, if I understand you correctly, you could collect the index values in a Series using groupby and idxmax(), and then select those rows from data using loc:
idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
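As a minimal sketch of the approach above, here is a self-contained version built from the sample rows in the question. It assumes ROI is stored as a string such as "5%" (as the table suggests), so it is converted to a number first; the column name ROI_pct is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

# idxmax needs numeric values, so strip the percent sign
# (assumption: every ROI value looks like "N%")
data["ROI_pct"] = data["ROI"].str.rstrip("%").astype(float)

# One index label per group: the row holding that group's highest ROI
idx = data.groupby(["Company", "Product", "Industry"])["ROI_pct"].idxmax()

# Pull the full rows (including Date) for those labels
best = data.loc[idx]
print(best[["Company", "Product", "Industry", "Date", "ROI"]])
```

If ROI is already numeric in your data, the conversion step can be dropped and idxmax can be called on ROI directly.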
Another option is to use reindex:
data.reindex(idx)
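A small sketch to check that the two selections agree, using made-up numeric data and a single grouping column for brevity. One caveat worth keeping in mind: reindex raises an error when the DataFrame's index contains duplicate labels, so this comparison assumes a unique index.

```python
import pandas as pd

# Hypothetical numeric data, just to compare the two selection styles
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "ROI": [5.0, 2.0, 1.0, 3.0],
})
idx = data.groupby("Company")["ROI"].idxmax()

via_loc = data.loc[idx]
via_reindex = data.reindex(idx)

# Both should hold the same rows in the same order
same = (via_loc.to_numpy() == via_reindex.to_numpy()).all()
print(same)
```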
On a (different) DataFrame I happened to have handy, it appears reindex might be the faster option:
In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop
In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop