Using Pandas, how can I find the min/max value and index from one set of columns, satisfying a condition on a corresponding set of columns?
I have a DataFrame with two sets of columns that have matching names (x1, x2, ... and y1, y2, ...).
For each row in my DataFrame, I need to make new columns containing the minimum x among the columns where y is at its row minimum, and the maximum x among the columns where y is at its row maximum.
Using Excel, I can get close to the desired result with this sort of formula:
=MINIFS(<x-columns>,<y-columns>,MIN(<y-columns>))
=MAXIFS(<x-columns>,<y-columns>,MAX(<y-columns>))
Although I would also need to make use of Pandas' idxmin and idxmax to get the column names.
As an example, the following row of data would need to return 55/x2 (min xi such that yi = ymin) and 56/x3 (max xi such that yi = ymax):
import pandas as pd

df = pd.DataFrame([[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]], columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'])
df['ymin'] = df.filter(regex='^y').min(axis=1)
df['ymax'] = df.filter(regex='^y').max(axis=1)
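For reference, the MINIFS/MAXIFS logic above can be restated with boolean masks. This is only a minimal sketch, assuming the x and y columns appear in the same order and are named with a prefix followed by digits; the names `at_ymin`, `x_at_ymin`, `result` and the output column labels are illustrative, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [[30, 55, 56, 73, 50, 3, 0, 3, 0, 3]],
    columns=['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5'],
)

# Select only columns named x<digits> / y<digits> (assumed naming scheme),
# so helper columns such as 'ymin' or 'ymax' are left out.
x = df.filter(regex=r'^x\d+$')
y = df.filter(regex=r'^y\d+$')

# True where a y value equals its row minimum / maximum. Converted to NumPy
# so the mask is applied by position rather than by column label.
at_ymin = y.eq(y.min(axis=1), axis=0).to_numpy()
at_ymax = y.eq(y.max(axis=1), axis=0).to_numpy()

# Keep x only where the mask holds; everything else becomes NaN.
x_at_ymin = x.where(at_ymin)
x_at_ymax = x.where(at_ymax)

# Hypothetical output labels, chosen for this sketch only.
result = pd.DataFrame({
    'xmin': x_at_ymin.min(axis=1),
    'xidxmin': x_at_ymin.idxmin(axis=1),
    'xmax': x_at_ymax.max(axis=1),
    'xidxmax': x_at_ymax.idxmax(axis=1),
})
print(result)
```

Because `where` introduces NaN, the value columns come back as floats; cast them afterwards if integers are needed.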
This is my approach, after some trial and error:
new_df = (pd.wide_to_long(df.reset_index(),
                          stubnames=['x','y'],
                          i='index',
                          j='xy')
            .reset_index()
            .drop('xy', axis=1)
            .groupby(['index', 'y'])['x'].agg(['max', 'min'])
            .groupby('index')
            .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                       index=['ymin', 'ymax']) )
         )
Output:
       ymin  ymax
index
0        55    56
Update: if you also want the column name, this can be an option:
new_df = (pd.wide_to_long(df.reset_index(),
                          stubnames=['x','y'],
                          i='index',
                          j='xy')
            .reset_index()
         )
u = (new_df.groupby(['index', 'y'])['x'].agg(['idxmax','idxmin'])
       .groupby('index')
       .apply(lambda x: pd.Series(x.values[[0,-1], [1,0]],
                                  index=['ymin', 'ymax']) )
    )
Then:
new_df.loc[u['ymin']]
gives:
   index  xy   x  y
1      0   2  55  0
and
new_df.loc[u['ymax']]
gives:
   index  xy   x  y
2      0   3  56  3
Thanks to Quang Hoang, I've managed to put together this function, which gives the result I wanted:
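def conditional_minmax(df, xprefix, yprefix):
    # Reshape to long format: one row per (original row, column suffix).
    new_df = (pd.wide_to_long(df.reset_index(),
                              stubnames=[xprefix, yprefix],
                              i='index',
                              j='xy')
                .reset_index()
                .drop('xy', axis=1)
                # Max and min x for each distinct y value within an original row.
                .groupby(['index', yprefix])[xprefix].agg(['max', 'min'])
                .groupby('index')
                # Groups are sorted by y: min x at the smallest y, max x at the largest y.
                .apply(lambda x: pd.Series(x.values[[0, -1], [1, 0]],
                                           index=['_xmin', '_xmax']))
             )
    # Column names: the first x column whose value equals _xmin / _xmax.
    new_df['_xidxmin'] = abs(df.filter(regex='^' + xprefix).sub(new_df['_xmin'], axis=0)).idxmin(axis=1)
    new_df['_xidxmax'] = abs(df.filter(regex='^' + xprefix).sub(new_df['_xmax'], axis=0)).idxmin(axis=1)
    return new_df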
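One thing to watch with the value-based lookup: `idxmin` returns the first column whose x equals the target value, which is not necessarily a column where y is at its minimum if the same x value occurs more than once in a row. The sketch below illustrates this on a made-up row (the data and the names `by_value` and `by_mask` are assumptions for this example only) and shows a mask-based lookup that only considers columns where y is at its row minimum.

```python
import pandas as pd

# Made-up row: x1 and x2 hold the same value, but only y2 and y3 are at ymin.
df = pd.DataFrame([[55, 55, 60, 3, 0, 0]],
                  columns=['x1', 'x2', 'x3', 'y1', 'y2', 'y3'])

x = df.filter(regex=r'^x\d+$')
y = df.filter(regex=r'^y\d+$')

# Positional mask of the columns where y equals its row minimum.
at_ymin = y.eq(y.min(axis=1), axis=0).to_numpy()

# Smallest x among the columns where y is at its minimum (x2 = 55 here).
xmin = x.where(at_ymin).min(axis=1)

# Value-based lookup: first column whose x matches xmin, regardless of y.
by_value = x.sub(xmin, axis=0).abs().idxmin(axis=1)

# Mask-based lookup: only columns where y is at its minimum are candidates.
by_mask = x.where(at_ymin).idxmin(axis=1)

print(by_value[0], by_mask[0])
```

If repeated x values cannot occur within a row, both lookups return the same column.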