Python Pandas Groupby for loop and Idxmax

Question

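I have a DataFrame that I need to group on three levels before returning the highest value. Each day has a unique return value, and I want to find the highest return along with its details.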

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result should show that

Target   - Dish Soap - House       had a 5% ROI on 9/17
Best Buy - CDs       - Electronics had a 3% ROI on 9/3

were the highest.

Here is some sample data:

+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...
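The ROI values in the table are strings such as "5%", and idxmax needs numbers to compare (as strings, "10%" would sort before "5%"). A minimal sketch of cleaning the sample rows first, assuming the first column is Company (as in the groupby call) and that dates are written month/day/two-digit year:

```python
import pandas as pd

# Sample rows from the table, with the first column read as "Company".
raw = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI": ["5%", "2%", "1%", "3%"],
})

data = raw.copy()
# Strip the trailing percent sign and store ROI as a float fraction (5% -> 0.05).
data["ROI"] = data["ROI"].str.rstrip("%").astype(float) / 100
# Parse month/day/two-digit-year strings into real datetimes.
data["Date"] = pd.to_datetime(data["Date"], format="%m/%d/%y")

print(data.dtypes)
```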

I'm not sure whether this calls for a for loop or for .ix.
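A loop over the groups does work, but it is not required. The sketch below, on a hypothetical numeric copy of the sample rows, compares an explicit per-group loop with a single groupby/idxmax call. .ix was deprecated in pandas 0.20 and removed in 1.0, so .loc is used instead:

```python
import pandas as pd

# Hypothetical numeric version of the sample data (ROI as a fraction).
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": pd.to_datetime(["2013-09-17", "2013-09-16", "2013-09-01", "2013-09-03"]),
    "ROI": [0.05, 0.02, 0.01, 0.03],
})
keys = ["Company", "Product", "Industry"]

# Explicit loop: for each group, record the row label of its largest ROI.
best_labels = []
for _, group in data.groupby(keys):
    best_labels.append(group["ROI"].idxmax())
loop_result = data.loc[best_labels]

# Vectorized equivalent: groupby computes idxmax for every group in one call.
vectorized = data.loc[data.groupby(keys)["ROI"].idxmax()]

print(loop_result.equals(vectorized))
```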

Answer 1

If I understand you correctly, you can use groupby and idxmax() to collect the index labels in a Series, then use loc to select those rows from the DataFrame:

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
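A self-contained sketch of that approach on the sample rows (ROI assumed already converted to fractions), which also formats each winning row like the summary in the question:

```python
import pandas as pd

# Hypothetical numeric version of the sample rows.
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": pd.to_datetime(["2013-09-17", "2013-09-16", "2013-09-01", "2013-09-03"]),
    "ROI": [0.05, 0.02, 0.01, 0.03],
})

# idx maps each (Company, Product, Industry) key to the row label of its max ROI.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()
best = data.loc[idx]

# One summary line per group, e.g. "... had a 5% ROI on 9/17".
lines = [
    f"{r.Company} - {r.Product} - {r.Industry} had a {r.ROI:.0%} ROI on {r.Date.month}/{r.Date.day}"
    for r in best.itertuples(index=False)
]
print("\n".join(lines))
```

Because groupby sorts the group keys by default, BestBuy is listed before Target.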

Another option is to use reindex:

data.reindex(idx)
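A sketch on the same hypothetical sample rows that checks both selections agree here, and shows where they differ: reindex returns an all-NaN row for a label that is not in the index, while .loc raises KeyError:

```python
import pandas as pd

# Hypothetical numeric version of the sample rows.
data = pd.DataFrame({
    "Company": ["Target", "Target", "BestBuy", "BestBuy"],
    "Product": ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date": pd.to_datetime(["2013-09-17", "2013-09-16", "2013-09-01", "2013-09-03"]),
    "ROI": [0.05, 0.02, 0.01, 0.03],
})
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

via_loc = data.loc[idx]
via_reindex = data.reindex(idx)
# Same rows and values; check_names=False because the index name may differ.
pd.testing.assert_frame_equal(via_loc, via_reindex, check_names=False)

# With a missing label the two behave differently.
missing = data.reindex([0, 99])  # label 99 becomes an all-NaN row
print(missing.loc[99].isna().all())

loc_failed = False
try:
    data.loc[[0, 99]]
except KeyError as exc:
    loc_failed = True
    print("loc raised KeyError:", exc)
```

Since idxmax only returns labels that exist in the DataFrame, the difference does not matter for this question.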

On a (different) DataFrame I happened to have handy, reindex appears to be the faster option:

In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop
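%timeit is an IPython magic. Outside IPython, a rough equivalent uses the standard timeit module. The frame size, group labels and call counts below are arbitrary assumptions, and the relative speed of reindex and .loc may vary with the pandas version and the data:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random frame with 3 x 2 x 2 groups.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "Company": rng.choice(["A", "B", "C"], n),
    "Product": rng.choice(["P1", "P2"], n),
    "Industry": rng.choice(["X", "Y"], n),
    "ROI": rng.random(n),
})
idx = df.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

timings = {}
for label, stmt in [("reindex", lambda: df.reindex(idx)), ("loc", lambda: df.loc[idx])]:
    # Best of 3 repeats, each averaged over 1000 calls, similar to %timeit's summary.
    best = min(timeit.repeat(stmt, number=1000, repeat=3)) / 1000
    timings[label] = best
    print(f"{label}: {best * 1e6:.1f} us per call")
```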

Answer 1 (score 5), 2013-09-18 18:41:55
