Selecting the next number from columns in Pandas

Question

ID     Op     Cl     V        C   R0   R1   R2   R3   R4   R5
UN   22.85  22.86  8830500  0.21  25   34   12   87   105  102
SS   55.01  52.67  6500     5.45  84   122  147  124  644  788   
PN   90.00  90.99  1000     102   89   55   100  156  44   87     
PI   184.99 182.38 15000    84    56   77   97   45   44   33

I want to create a new column that shows the next greatest value after 'Cl' among the columns R0, R1, R2, R3, R4, R5. Here is my expected result:

ID     Op     Cl     V        C   R0   R1   R2   R3   R4   R5  X
UN  22.85  22.86  8830500  0.21   25   34   12   87   105  102 25
SS   55.01  52.67  6500     5.45  84   122  147  124  644  788 84  
PN   90.00  90.99  1000     102   89   55   100  156  44   87 100   
PI   184.99 182.38 15000    84    56   77   97   45   44   33 NaN

I have been working on this for a while with no luck. Any help would be appreciated, thanks!
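To run the answers below, the question's table first has to exist as a DataFrame. A minimal sketch of building it by hand follows; the values are copied from the table above, and the variable name `df` is the one the answers assume.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "ID": ["UN", "SS", "PN", "PI"],
    "Op": [22.85, 55.01, 90.00, 184.99],
    "Cl": [22.86, 52.67, 90.99, 182.38],
    "V": [8830500, 6500, 1000, 15000],
    "C": [0.21, 5.45, 102, 84],
    "R0": [25, 84, 89, 56],
    "R1": [34, 122, 55, 77],
    "R2": [12, 147, 100, 97],
    "R3": [87, 124, 156, 45],
    "R4": [105, 644, 44, 44],
    "R5": [102, 788, 87, 33],
})
print(df)
```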

Answer 1

It depends on what you mean by "next greatest". If you mean the first larger value in the order R0 -> R5, we can try idxmax:

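import numpy as np

# extract the `R` columns
s = df.filter(like='R')

# find out where these columns are larger than `Cl`:
mask = s.gt(df['Cl'], axis='rows')

# `idxmax` gives the first column where the mask is True; convert it to a position.
# (`DataFrame.lookup` was removed in pandas 2.0, so index the array directly.)
cols = s.columns.get_indexer(mask.idxmax(axis=1))
df['X'] = np.where(mask.any(axis=1), s.to_numpy()[np.arange(len(s)), cols], np.nan)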

Output:

   ID      Op      Cl        V       C  R0   R1   R2   R3   R4   R5      X
0  UN   22.85   22.86  8830500    0.21  25   34   12   87  105  102   25.0
1  SS   55.01   52.67     6500    5.45  84  122  147  124  644  788   84.0
2  PN   90.00   90.99     1000  102.00  89   55  100  156   44   87  100.0
3  PI  184.99  182.38    15000   84.00  56   77   97   45   44   33    NaN

If by "next greatest" you mean in terms of value, we can modify the above with a sort:

# extract and sort by rows
s = np.sort(df.filter(like='R').values, axis=1)

# now we work with numpy data:
mask = s > df['Cl'].values[:,None]

# check and assign
df['X'] = np.where(mask.any(1), s[np.arange(s.shape[0]),mask.argmax(1)], np.nan)

You then get the same output for this sample data, but of course with the value-based meaning described above.
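The sample data happens to give the same result under both readings, so the difference is easy to miss. Here is a small sketch on a made-up frame where the two readings disagree. The frame `demo` and the column names `first_in_order` and `smallest_above` are hypothetical, and the `where`/`bfill`/`min` formulation is a compact alternative, not the code from the answer.

```python
import pandas as pd

# Hypothetical two-row frame, chosen so the two readings disagree
demo = pd.DataFrame({
    "Cl": [20.0, 50.0],
    "R0": [100, 10],
    "R1": [30, 40],
    "R2": [25, 45],
})

r = demo.filter(regex=r"^R\d+$")
# keep only the values above `Cl`; everything else becomes NaN
above = r.where(r.gt(demo["Cl"], axis=0))

# reading 1: first value in column order R0 -> R2 that exceeds `Cl`
demo["first_in_order"] = above.bfill(axis=1).iloc[:, 0]
# reading 2: smallest value that exceeds `Cl`
demo["smallest_above"] = above.min(axis=1)

print(demo)
```

In the first row every R value exceeds `Cl`, so the order-based reading picks R0 while the value-based reading picks the smallest of the three. In the second row nothing exceeds `Cl`, so both columns are NaN.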

Answer 2

Another option:

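def func(x):
    # x is one row: `Cl` first, followed by the R columns
    R_values = x.iloc[1:]
    idx_greater = R_values > x.iloc[0]
    # smallest R value above `Cl`, or NaN when none exceeds it
    return R_values[idx_greater].min() if idx_greater.any() else np.nan

df['X'] = df.filter(regex='Cl|^R').apply(func, axis=1)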

Output:

   ID      Op      Cl        V       C  R0   R1   R2   R3   R4   R5      X
0  UN   22.85   22.86  8830500    0.21  25   34   12   87  105  102   25.0
1  SS   55.01   52.67     6500    5.45  84  122  147  124  644  788   84.0
2  PN   90.00   90.99     1000  102.00  89   55  100  156   44   87  100.0
3  PI  184.99  182.38    15000   84.00  56   77   97   45   44   33    nan

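This may be a bit slower than @Quang Hoang's approach, and it is less elegant.

The logic is to apply a function to each row that checks whether any R value is greater than the value in the Cl column and, if so, takes the minimum of those values; otherwise it returns NaN.

Note: the Cl column must come before the R columns, as in the data you provided.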
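If the column order cannot be relied on, the same idea can be written with label-based selection instead of positions. The following is a sketch under that assumption; the helper name `next_above_cl`, its `r_cols` parameter, and the frame `demo` are made up for illustration.

```python
import numpy as np
import pandas as pd

def next_above_cl(row, r_cols):
    """Smallest value among r_cols that exceeds row['Cl'], else NaN."""
    r_values = row[r_cols]
    greater = r_values[r_values > row["Cl"]]
    return greater.min() if not greater.empty else np.nan

# Hypothetical frame with `Cl` placed after the R columns
demo = pd.DataFrame({"R0": [25, 56], "R1": [34, 77], "Cl": [22.86, 182.38]})

# pick the R columns by name rather than by position
r_cols = [c for c in demo.columns if c.startswith("R")]
demo["X"] = demo.apply(next_above_cl, axis=1, r_cols=r_cols)

print(demo)
```

Because `Cl` and the R columns are looked up by label, the result does not depend on where `Cl` sits in the frame.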
