Pandas dataframe: select multiple rows based on the top values in one column

Question

I have the following pandas dataframe:

    Number of Features         RSS  R Squared    R Adj        BIC        AIC     Features List
0                    1  1265.68675    0.53395  0.49158  101.54177  100.41187         (x1,)
1                    1   906.33634    0.66627  0.63593   97.20030   96.07040         (x2,)
2                    1  1939.40047    0.28587  0.22095  107.08970  105.95980         (x3,)
3                    1   883.86692    0.67454  0.64495   96.87394   95.74404         (x4,)
4                    2    57.90448    0.97868  0.97441   64.00724   62.31239      (x1 ,x2)
5                    2  1227.07206    0.54817  0.45780  103.70393  102.00908      (x1 ,x3)
6                    2    74.76211    0.97247  0.96697   67.32895   65.63411      (x1 ,x4)
7                    2   415.44273    0.84703  0.81643   89.62439   87.92954      (x2 ,x3)
8                    2   868.88013    0.68006  0.61607   99.21658   97.52173      (x2 ,x4)
9                    2   175.73800    0.93529  0.92235   78.43983   76.74499      (x3 ,x4)
10                   3    48.11061    0.98228  0.97638   64.16339   61.90360  (x1, x2 ,x3)
11                   3    47.97273    0.98234  0.97645   64.12608   61.86629  (x1, x2 ,x4)
12                   3    50.83612    0.98128  0.97504   64.87975   62.61995  (x1, x3 ,x4)
13                   3    73.81455    0.97282  0.96376   69.72808   67.46829  (x2, x3 ,x4)
14                   4    47.86364    0.98238  0.97356   66.66144   63.83669  (x1, x2, x3, x4)

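For each value of 'Number of Features', I want to select the rows with the two highest values in one column ('R Squared'), as shown below (expected result):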

    Number of Features         RSS  R Squared    R Adj        BIC        AIC     Features List
0                    1   883.86692    0.67454  0.64495   96.87394   95.74404             (x4,)
1                    1   906.33634    0.66627  0.63593   97.20030   96.07040             (x2,)
2                    2    57.90448    0.97868  0.97441   64.00724   62.31239          (x1, x2)
3                    2    74.76211    0.97247  0.96697   67.32895   65.63411          (x1, x4)
4                    3    47.97273    0.98234  0.97645   64.12608   61.86629      (x1, x2, x4)
5                    3    48.11061    0.98228  0.97638   64.16339   61.90360      (x1, x2, x3)
6                    4    47.86364    0.98238  0.97356   66.66144   63.83669  (x1, x2, x3, x4)

I used the following code:

df_max = values_table[values_table.groupby('Number of Features')['R Squared'].transform(max) == values_table['R Squared']]

and also this code:

df_table = values_table.loc[values_table.groupby('Number of Features')['R Squared'].idxmax()]

It gave me the following result, which keeps only one row per group (not the expected result):

    Number of Features        RSS  R Squared    R Adj       BIC       AIC     Features List
3                    1  883.86692    0.67454  0.64495  96.87394  95.74404             (x4,)
4                    2   57.90448    0.97868  0.97441  64.00724  62.31239          (x1, x2)
11                   3   47.97273    0.98234  0.97645  64.12608  61.86629      (x1, x2, x4)
14                   4   47.86364    0.98238  0.97356  66.66144  63.83669  (x1, x2, x3, x4)

Answer 1

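If you sort first and then group, you can take the first 2 rows of each group (`df` here stands for the question's `values_table`):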

(df
 .sort_values(['Number of Features', 'R Squared'], ascending=[True, False])
 .groupby('Number of Features', sort=False)
 .head(2)
)
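
As a runnable sketch of the sort-then-group approach above: the snippet below rebuilds a small subset of the question's table (three columns and eight rows, values copied from the question) and applies the same chain. The `reset_index(drop=True)` step and the name `top2` are my additions, assuming you want the 0-based index shown in the expected output.

```python
import pandas as pd

# Subset of the question's table: three columns, eight of the rows.
values_table = pd.DataFrame({
    "Number of Features": [1, 1, 1, 1, 2, 2, 2, 4],
    "R Squared": [0.53395, 0.66627, 0.28587, 0.67454,
                  0.97868, 0.54817, 0.97247, 0.98238],
    "Features List": ["(x1,)", "(x2,)", "(x3,)", "(x4,)",
                      "(x1, x2)", "(x1, x3)", "(x1, x4)",
                      "(x1, x2, x3, x4)"],
})

top2 = (
    values_table
    # Within each feature count, put the highest R Squared first.
    .sort_values(["Number of Features", "R Squared"], ascending=[True, False])
    # sort=False keeps the order produced by sort_values.
    .groupby("Number of Features", sort=False)
    # Keep at most the first two rows of every group.
    .head(2)
    # Assumption: renumber rows from 0, as in the expected output.
    .reset_index(drop=True)
)

print(top2)
```

Without `reset_index(drop=True)`, `head(2)` keeps the original row labels (3, 1, 4, 6, ...). A group with fewer than two rows, such as the single 4-feature model, is returned as-is.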

Answer 1
1 accepted 2020-04-26 23:13:35
