
Using Pandas, how can I find the min/max value and index from one set of columns, satisfying a condition on a corresponding set of columns?

I have a DataFrame with two sets of columns that have matching names (x1, x2, ... and y1, y2, ...).

For each row in my DataFrame, I need new columns containing the minimum x among the columns where y is at its row minimum, and the maximum x among the columns where y is at its row maximum.

Using Excel, I can get close to the desired result with this sort of formula:

=MINIFS(<x-columns>,<y-columns>,MIN(<y-columns>))

=MAXIFS(<x-columns>,<y-columns>,MAX(<y-columns>))

I would also need something like Pandas' idxmin and idxmax to get the matching column names.

As an example, the following row of data would need to return 55/x2 (min xi such that yi = ymin) and 56/x3 (max xi such that yi = ymax):

import pandas as pd

df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
                  columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'])

df['ymin'] = df.filter(regex='^y').min(axis=1)
df['ymax'] = df.filter(regex='^y').max(axis=1)
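To make the requirement concrete, here is a plain-Python check of the expected values for the example row. It is only a sketch of the definition, not a Pandas solution; the helper name `pick` is made up for illustration, and it assumes each `xN` pairs with the `yN` that has the same suffix.

```python
# The example row, split into its x and y values.
xs = {'x1': 30, 'x2': 55, 'x3': 56, 'x4': 73, 'x5': 50}
ys = {'y1': 3, 'y2': 0, 'y3': 3, 'y4': 0, 'y5': 3}


def pick(xs, ys, extreme):
    """Return (x value, x name) for the extreme x among columns whose y is extreme.

    `extreme` is the builtin min or max and is applied to both y and x.
    """
    target = extreme(ys.values())
    # Keep only the x entries whose matching y equals the row's extreme y.
    candidates = {name: value for name, value in xs.items()
                  if ys['y' + name[1:]] == target}
    best = extreme(candidates, key=candidates.get)
    return candidates[best], best


low = pick(xs, ys, min)    # (55, 'x2'): y2 and y4 are the minimum, x2 < x4
high = pick(xs, ys, max)   # (56, 'x3'): y1, y3, y5 are the maximum, x3 is largest
```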

This is my approach, after some trial and error:

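# Reshape to long form: one row per (index, suffix) pair, with columns x and y.
new_df = (pd.wide_to_long(df.reset_index(),
                stubnames=['x','y'],
                i='index',
                j='xy')
            .reset_index()
            .drop('xy', axis=1)
            # Max and min of x for every distinct y value within each original row.
            .groupby(['index', 'y'])['x'].agg(['max', 'min'])
            .groupby('index')
            # Groups are sorted by y: take min x at the lowest y and max x at the highest y.
            .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                       index=['ymin', 'ymax']) )
         )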

Output:

       ymin  ymax
index            
0        55    56
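The chain above is easier to follow when the intermediate frames are inspected. The sketch below rebuilds the example row and stops after each stage; the names `long_df`, `per_y` and `picked` are introduced here for illustration only, and the final fancy-indexing step is applied to the whole frame, which is valid only because the example has a single row.

```python
import pandas as pd

df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
                  columns=['x1', 'x2', 'x3', 'x4', 'x5',
                           'y1', 'y2', 'y3', 'y4', 'y5'])

# Stage 1: long form, one row per numeric suffix, with columns index, xy, x, y.
long_df = (pd.wide_to_long(df.reset_index(),
                           stubnames=['x', 'y'],
                           i='index',
                           j='xy')
           .reset_index())

# Stage 2: for each (row, y value) pair, the max and min of the matching x values.
# groupby sorts its keys, so within a row the lowest y comes first.
per_y = long_df.groupby(['index', 'y'])['x'].agg(['max', 'min'])

# Stage 3: pair row positions [0, -1] with column positions [1, 0], i.e.
# (lowest y, 'min' column) and (highest y, 'max' column).
picked = per_y.to_numpy()[[0, -1], [1, 0]]
print(per_y)
print(picked)   # [55 56]
```

With several rows in `df`, stage 3 has to run per `index` group, which is what the `groupby('index').apply(...)` call in the answer does.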

Update: if you also want the column name, this is one option:

new_df = (pd.wide_to_long(df.reset_index(), 
                stubnames=['x','y'], 
                i='index',
                j='xy')
            .reset_index()
         )

u = (new_df.groupby(['index', 'y'])['x'].agg(['idxmax','idxmin'])
         .groupby('index')
         .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                       index=['ymin', 'ymax']) )    
    )

Then:

new_df.loc[u['ymin']]

gives:

   index  xy   x  y
1      0   2  55  0

and

new_df.loc[u['ymax']]

gives:

   index  xy   x  y
2      0   3  56  3
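The `xy` column in these results holds the numeric suffix, so the original column name can be rebuilt as `'x' + suffix`. The sketch below shows one way to do that. It swaps the `apply` step for `first()`/`last()` on the per-row groups, which should select the same rows because the groups are sorted by `y`; the names `long_df`, `per_y`, `xidxmin` and `xidxmax` are illustrative, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
                  columns=['x1', 'x2', 'x3', 'x4', 'x5',
                           'y1', 'y2', 'y3', 'y4', 'y5'])

long_df = (pd.wide_to_long(df.reset_index(),
                           stubnames=['x', 'y'],
                           i='index',
                           j='xy')
           .reset_index())

# Row labels (in long_df) of the largest and smallest x for each (row, y value).
per_y = long_df.groupby(['index', 'y'])['x'].agg(['idxmax', 'idxmin'])

# Lowest y is the first group per row, highest y is the last one.
ymin_rows = per_y.groupby('index')['idxmin'].first()
ymax_rows = per_y.groupby('index')['idxmax'].last()

# Turn the numeric suffix back into a column name, indexed like the original df.
xidxmin = pd.Series(('x' + long_df.loc[ymin_rows, 'xy'].astype(str)).to_numpy(),
                    index=ymin_rows.index)
xidxmax = pd.Series(('x' + long_df.loc[ymax_rows, 'xy'].astype(str)).to_numpy(),
                    index=ymax_rows.index)
```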

Thanks to Quang Hoang, I've managed to put together this function, which gives the result I wanted:

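def conditional_minmax(df, xprefix, yprefix):
    """For each row, return the min x where y is at its minimum and the max x
    where y is at its maximum, plus the names of the matching x columns."""
    new_df = (pd.wide_to_long(df.reset_index(),
                              stubnames=[xprefix, yprefix],
                              i='index',
                              j='xy')
              .reset_index()
              .drop('xy', axis=1)
              .groupby(['index', yprefix])[xprefix].agg(['max', 'min'])
              .groupby('index')
              .apply(lambda x: pd.Series(x.values[[0, -1], [1, 0]],
                                         index=['_xmin', '_xmax']))
              )

    # Recover the column names by locating the x column closest to each value.
    # Note: if the same x value also occurs in a column whose y is not at the
    # extreme, idxmin returns the first match, which may be that column.
    x_cols = df.filter(regex='^' + xprefix)
    new_df['_xidxmin'] = x_cols.sub(new_df['_xmin'], axis=0).abs().idxmin(axis=1)
    new_df['_xidxmax'] = x_cols.sub(new_df['_xmax'], axis=0).abs().idxmin(axis=1)

    return new_df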
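For comparison, here is a sketch of an alternative that avoids the reshape by masking the x columns directly. It is not the approach from the answer above. The name `conditional_minmax_masked` is hypothetical, and the sketch assumes the columns are named `<prefix><integer>` and that the x and y columns appear in the same order, because the mask is applied by position. Since the mask hides every x whose y is not at the extreme, `idxmin`/`idxmax` cannot land on a column with the wrong y, even when x values repeat.

```python
import re

import pandas as pd


def conditional_minmax_masked(df, xprefix='x', yprefix='y'):
    """Masking-based sketch: min x where y is minimal, max x where y is maximal."""
    # Assumption: columns look like x1, x2, ... / y1, y2, ... in matching order.
    x = df.filter(regex=rf'^{re.escape(xprefix)}\d+$')
    y = df.filter(regex=rf'^{re.escape(yprefix)}\d+$')

    # Boolean arrays marking, per row, where y equals its row minimum / maximum.
    at_ymin = y.eq(y.min(axis=1), axis=0).to_numpy()
    at_ymax = y.eq(y.max(axis=1), axis=0).to_numpy()

    # Replace x with NaN wherever the paired y is not at the extreme.
    x_lo = x.where(at_ymin)
    x_hi = x.where(at_ymax)

    return pd.DataFrame({
        '_xmin': x_lo.min(axis=1),
        '_xidxmin': x_lo.idxmin(axis=1),
        '_xmax': x_hi.max(axis=1),
        '_xidxmax': x_hi.idxmax(axis=1),
    })


df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
                  columns=['x1', 'x2', 'x3', 'x4', 'x5',
                           'y1', 'y2', 'y3', 'y4', 'y5'])
result = conditional_minmax_masked(df)
print(result)
```

Because masking introduces NaN, the `_xmin` and `_xmax` columns come back as floats (55.0 and 56.0 for the example row); cast them with `astype(int)` if integer output matters.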

