Finding how often each DataFrame column level holds the maximum value in a MultiIndex case

Question

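I have a set of data and I am trying to assess the influence of each parameter. My first idea is to compute the probability that a given parameter value yields the best result (or, more generally, lands in the top x%) when all other parameters are held fixed. An example should make this clearer:

My data looks like this (but with more levels):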

import pandas as pd
import numpy as np

iterables = [['a','b','c'], [1,2,3]]
np.random.seed(123)

columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(data= np.random.rand(2,9), columns = columns_index, index=['feature1', 'feature2'])

This should give you the following:

first            a                             b                      \
second           1         2         3         1         2         3   
feature1  0.696469  0.286139  0.226851  0.551315  0.719469  0.423106   
feature2  0.392118  0.343178  0.729050  0.438572  0.059678  0.398044   
first            c                      
second           1         2         3  
feature1  0.980764  0.684830  0.480932  
feature2  0.737995  0.182492  0.175452

Now, if I am interested in 'feature2' and want to check the influence of 'first', I can do this:

df.loc['feature2'].groupby('second').max()
Out[272]: 
second
1    0.737995
2    0.343178
3    0.729050

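Now the question is how to obtain the following information:

The maximum is obtained with:

'first' = c for 'second' = 1
'first' = a for 'second' = 2
'first' = a for 'second' = 3

So I would like to compute: a: 66.66%, b: 0%, c: 33.33%

I hope this is clear enough. I would also be glad to hear any better ideas for checking the influence of the different parameters.

Thanks!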

Answer 1

Use .idxmax to get the index labels of the maxima, i.e.

df.loc['feature2'].groupby(level=1).idxmax()

second
1    (c, 1)
2    (a, 2)
3    (a, 3)
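
The idxmax result above names the winning 'first' value for each 'second', but the question also asks for the share of wins per 'first' value. One possible way to get there, as a sketch rather than part of the original answer: unstack 'first' into columns, take idxmax along each row, and normalise the counts. The names table, winners and share are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])

# Rows are 'second', columns are 'first'.
table = df.loc['feature2'].unstack('first')

# For each 'second', the 'first' label that holds the maximum.
winners = table.idxmax(axis=1)

# Share of wins per 'first' value, in percent. The reindex makes values
# that never win (here 'b') show up as 0 instead of being dropped.
share = (winners.value_counts(normalize=True)
                .reindex(table.columns, fill_value=0.0) * 100)
print(share.round(2))
```

On the example data this should reproduce the a: 66.66%, b: 0%, c: 33.33% split from the question. Note that idxmax reports only the first maximum in a row, so exact ties would be credited to a single 'first' value.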

Answer 2

Or you could try this:

df.stack().loc['feature2'].stack().groupby(level='second').apply(lambda x : x[x==x.max()])
Out[805]: 
second  second  first
1       1       c        0.737995
2       2       a        0.343178
3       3       a        0.729050
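
The question also mentions the more general case of landing among the best results rather than being the single best. A rough sketch of that idea, assuming "best" means the k largest values within each 'second' group: rank the values per row and count how often each 'first' value makes the cut. The helper top_k_share and the cutoff k are hypothetical names introduced here, not part of either answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
iterables = [['a', 'b', 'c'], [1, 2, 3]]
np.random.seed(123)
columns_index = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.rand(2, 9), columns=columns_index,
                  index=['feature1', 'feature2'])


def top_k_share(table, k):
    """Percentage of rows in which each column ranks among the k largest."""
    # Rank 1 is the largest value in a row; method='min' lets tied
    # values share the better rank, so ties all count as being in the top k.
    ranks = table.rank(axis=1, ascending=False, method='min')
    return (ranks <= k).mean() * 100


# Rows are 'second', columns are 'first'.
table = df.loc['feature2'].unstack('first')

print(top_k_share(table, 1).round(2))  # k=1 is the plain "best result" case
print(top_k_share(table, 2).round(2))
```

With k=1 this reduces to the win percentages asked for in the question. A top x% cutoff could be expressed by deriving k from the number of columns in table, which is left out here because the question does not say how x should be rounded.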
