Using Pandas, how do I find the min/max value and index from one set of columns, subject to a condition on a corresponding set of columns?

Question

I have a DataFrame with two sets of columns that have matching names (x1, x2, ... and y1, y2, ...).

For each row in my DataFrame, I need to create new columns containing the minimum x value among the columns where y is at its minimum, and the maximum x value among the columns where y is at its maximum.

Using Excel, I can get close to the desired result with this sort of formula:

=MINIFS(<x-columns>,<y-columns>,MIN(<y-columns>))

=MAXIFS(<x-columns>,<y-columns>,MAX(<y-columns>))

However, I would also need to use Pandas' idxmin and idxmax to get the column names.

As an example, the following row of data would need to return 55/x2 (min xi such that yi = ymin) and 56/x3 (max xi such that yi = ymax):

import pandas as pd

df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]], columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'])

df['ymin'] = df.filter(regex='^y').min(axis=1)
df['ymax'] = df.filter(regex='^y').max(axis=1)

Answer 1

This is my approach, after some trial and error:

new_df = (pd.wide_to_long(df.reset_index(), 
                stubnames=['x','y'], 
                i='index',
                j='xy')
            .reset_index()
            .drop('xy', axis=1)
            .groupby(['index', 'y'])['x'].agg(['max', 'min'])
            .groupby('index')
            .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                       index=['ymin', 'ymax']) )
         )

Output:

       ymin  ymax
index            
0        55    56
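
The chain above is easier to follow when split into steps. The following is only a sketch of the intermediate results, using the sample row from the question; the names long_df, per_y and picked are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
    columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'],
)

# Step 1: reshape to long format, one row per (original row, column suffix).
long_df = pd.wide_to_long(
    df.reset_index(), stubnames=['x', 'y'], i='index', j='xy'
).reset_index()

# Step 2: for each (row, y value) pair, collect the largest and smallest x.
# groupby sorts its keys, so within each 'index' the first row holds the
# smallest y and the last row holds the largest y.
per_y = long_df.groupby(['index', 'y'])['x'].agg(['max', 'min'])
print(per_y)

# Step 3: paired fancy indexing picks two cells:
# (first row, column 1 = 'min') and (last row, column 0 = 'max').
# With a single original row, per_y is itself the only group.
picked = per_y.values[[0, -1], [1, 0]]
print(picked)
```

For the sample row, per_y has one line for y = 0 (x values 55 and 73) and one for y = 3 (x values 30, 56 and 50), so the two selected cells are 55 and 56.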

Update: if you also want the column name, this is one option:

new_df = (pd.wide_to_long(df.reset_index(), 
                stubnames=['x','y'], 
                i='index',
                j='xy')
            .reset_index()
         )

u = (new_df.groupby(['index', 'y'])['x'].agg(['idxmax','idxmin'])
         .groupby('index')
         .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                       index=['ymin', 'ymax']) )    
    )

Then:

new_df.loc[u['ymin']]

gives:

   index  xy   x  y
1      0   2  55  0

and

new_df.loc[u['ymax']]

gives:

   index  xy   x  y
2      0   3  56  3
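
If the column label itself is wanted rather than the long-format row, one option is to rebuild it from the xy suffix. This is a minimal sketch, assuming the x columns are named 'x' followed by that suffix; xmin_name and xmax_name are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
    columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'],
)

new_df = pd.wide_to_long(
    df.reset_index(), stubnames=['x', 'y'], i='index', j='xy'
).reset_index()

# Row labels (in new_df) of the largest and smallest x for each y value.
u = (new_df.groupby(['index', 'y'])['x'].agg(['idxmax', 'idxmin'])
           .groupby('index')
           .apply(lambda g: pd.Series(g.values[[0, -1], [1, 0]],
                                      index=['ymin', 'ymax'])))

# The 'xy' column stores the numeric suffix, so prefixing it with 'x'
# gives back the original column label.
xmin_name = 'x' + new_df.loc[u['ymin'], 'xy'].astype(str)
xmax_name = 'x' + new_df.loc[u['ymax'], 'xy'].astype(str)

print(xmin_name.tolist())
print(xmax_name.tolist())
```

Note that the resulting Series are indexed by the long-format row labels, not by the original DataFrame index.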

Answer 2

Thanks to Quang Hoang, I've managed to put together this function, which gives the result I wanted:

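import pandas as pd


def conditional_minmax(df, xprefix, yprefix):
    # Reshape to long format, then for each (row, y value) pair
    # collect the largest and smallest x.
    new_df = (pd.wide_to_long(df.reset_index(),
                              stubnames=[xprefix, yprefix],
                              i='index',
                              j='xy')
              .reset_index()
              .drop('xy', axis=1)
              .groupby(['index', yprefix])[xprefix].agg(['max', 'min'])
              .groupby('index')
              # First group row = smallest y (take 'min' of x);
              # last group row = largest y (take 'max' of x).
              .apply(lambda x: pd.Series(x.values[[0, -1], [1, 0]],
                                         index=['_xmin', '_xmax']))
              )

    # Recover the column names: the x column closest to each selected value.
    # Note: this matches on the x value alone, so if the same value appears in
    # several x columns, the first one is returned whatever its y value.
    x_cols = df.filter(regex='^' + xprefix)
    new_df['_xidxmin'] = x_cols.sub(new_df['_xmin'], axis=0).abs().idxmin(axis=1)
    new_df['_xidxmax'] = x_cols.sub(new_df['_xmax'], axis=0).abs().idxmin(axis=1)

    return new_df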
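
For comparison, the same result can probably be reached without reshaping, by masking the x columns wherever y is not at its row extreme. This is a different technique from the one above, shown only as a sketch: conditional_minmax_masked is a hypothetical helper, and it assumes the x and y columns are named with a prefix plus digits and appear in the same suffix order.

```python
import pandas as pd


def conditional_minmax_masked(df, xprefix='x', yprefix='y'):
    """Hypothetical helper: mask x where y is not at its row extreme."""
    # Select columns named <prefix><digits> only, so helper columns such as
    # 'ymin' or 'ymax' are ignored.
    x = df.filter(regex=f'^{xprefix}\\d+$')
    y = df.filter(regex=f'^{yprefix}\\d+$')

    # Boolean masks as plain arrays, so they line up with x by position.
    at_ymin = y.eq(y.min(axis=1), axis=0).to_numpy()
    at_ymax = y.eq(y.max(axis=1), axis=0).to_numpy()

    # Keep x only where the paired y is the row minimum / maximum.
    x_at_ymin = x.where(at_ymin)
    x_at_ymax = x.where(at_ymax)

    return pd.DataFrame({
        '_xmin': x_at_ymin.min(axis=1),
        '_xmax': x_at_ymax.max(axis=1),
        '_xidxmin': x_at_ymin.idxmin(axis=1),
        '_xidxmax': x_at_ymax.idxmax(axis=1),
    })


df = pd.DataFrame(
    [[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
    columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'],
)
out = conditional_minmax_masked(df)
print(out)
```

Because the column name is looked up on the masked frame, a column whose x value happens to equal the selected one but whose y is not the extreme is never returned. The masked values come back as floats, since the mask introduces NaN.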

Answer 1
1 accepted 2019-12-02 16:30:57

Answer 2
0 2019-12-03 10:27:31
