Pandas dataframe select multiple rows based on multiple max values in one of the columns

I have the following pandas dataframe:

    Number of Features         RSS  R Squared    R Adj        BIC        AIC     Features List
0                    1  1265.68675    0.53395  0.49158  101.54177  100.41187             (x1,)
1                    1   906.33634    0.66627  0.63593   97.20030   96.07040             (x2,)
2                    1  1939.40047    0.28587  0.22095  107.08970  105.95980             (x3,)
3                    1   883.86692    0.67454  0.64495   96.87394   95.74404             (x4,)
4                    2    57.90448    0.97868  0.97441   64.00724   62.31239          (x1, x2)
5                    2  1227.07206    0.54817  0.45780  103.70393  102.00908          (x1, x3)
6                    2    74.76211    0.97247  0.96697   67.32895   65.63411          (x1, x4)
7                    2   415.44273    0.84703  0.81643   89.62439   87.92954          (x2, x3)
8                    2   868.88013    0.68006  0.61607   99.21658   97.52173          (x2, x4)
9                    2   175.73800    0.93529  0.92235   78.43983   76.74499          (x3, x4)
10                   3    48.11061    0.98228  0.97638   64.16339   61.90360      (x1, x2, x3)
11                   3    47.97273    0.98234  0.97645   64.12608   61.86629      (x1, x2, x4)
12                   3    50.83612    0.98128  0.97504   64.87975   62.61995      (x1, x3, x4)
13                   3    73.81455    0.97282  0.96376   69.72808   67.46829      (x2, x3, x4)
14                   4    47.86364    0.98238  0.97356   66.66144   63.83669  (x1, x2, x3, x4)

I want to select the rows with the two highest 'R Squared' values for each 'Number of Features', as shown below (expected result):

    Number of Features         RSS  R Squared    R Adj        BIC        AIC     Features List
0                    1   883.86692    0.67454  0.64495   96.87394   95.74404             (x4,)
1                    1   906.33634    0.66627  0.63593   97.20030   96.07040             (x2,)
2                    2    57.90448    0.97868  0.97441   64.00724   62.31239          (x1, x2)
3                    2    74.76211    0.97247  0.96697   67.32895   65.63411          (x1, x4)
4                    3    47.97273    0.98234  0.97645   64.12608   61.86629      (x1, x2, x4)
5                    3    48.11061    0.98228  0.97638   64.16339   61.90360      (x1, x2, x3)
6                    4    47.86364    0.98238  0.97356   66.66144   63.83669  (x1, x2, x3, x4)

I used the following code:

df_max = values_table[values_table.groupby('Number of Features')['R Squared'].transform(max) == values_table['R Squared']]

and the following code:

df_table = values_table.loc[values_table.groupby('Number of Features')['R Squared'].idxmax()]

It gave me the following result, with only one row per group (not the expected result):

    Number of Features        RSS  R Squared    R Adj       BIC       AIC     Features List
3                    1  883.86692    0.67454  0.64495  96.87394  95.74404             (x4,)
4                    2   57.90448    0.97868  0.97441  64.00724  62.31239          (x1, x2)
11                   3   47.97273    0.98234  0.97645  64.12608  61.86629      (x1, x2, x4)
14                   4   47.86364    0.98238  0.97356  66.66144  63.83669  (x1, x2, x3, x4)

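If you sort first and then group, you can take the first 2 rows of each group (here df stands for the question's values_table):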

(df
 .sort_values(['Number of Features', 'R Squared'], ascending=[True, False])
 .groupby('Number of Features', sort=False)
 .head(2)
)
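
As a self-contained sketch of the sort-then-group approach above, the snippet below rebuilds only three of the question's columns and applies the same chain. The name top2 is hypothetical, and the final reset_index(drop=True) is an added assumption, included only because the expected result in the question is indexed 0 to 6; without it the original row labels are kept.

```python
import pandas as pd

# Rebuild a reduced copy of the question's table (only the columns used here).
values_table = pd.DataFrame({
    "Number of Features": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4],
    "R Squared": [
        0.53395, 0.66627, 0.28587, 0.67454,
        0.97868, 0.54817, 0.97247, 0.84703, 0.68006, 0.93529,
        0.98228, 0.98234, 0.98128, 0.97282,
        0.98238,
    ],
    "Features List": [
        ("x1",), ("x2",), ("x3",), ("x4",),
        ("x1", "x2"), ("x1", "x3"), ("x1", "x4"),
        ("x2", "x3"), ("x2", "x4"), ("x3", "x4"),
        ("x1", "x2", "x3"), ("x1", "x2", "x4"),
        ("x1", "x3", "x4"), ("x2", "x3", "x4"),
        ("x1", "x2", "x3", "x4"),
    ],
})

top2 = (
    values_table
    # Within each feature count, put the highest R Squared first.
    .sort_values(["Number of Features", "R Squared"], ascending=[True, False])
    # sort=False keeps the group order produced by the sort above.
    .groupby("Number of Features", sort=False)
    # Keep at most the first two rows of every group.
    .head(2)
    # Assumption: renumber rows 0..n-1 to match the expected output.
    .reset_index(drop=True)
)

print(top2)
```

A group with fewer than two rows (here the single 4-feature model) simply contributes whatever rows it has.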
