Get the difference between the two smallest values in each row across pandas columns

Question

I have a dataframe that is the result of various pivot operations and contains float numbers (this example uses integers for simplicity):

    import numpy as np
    import pandas as pd
    
    np.random.seed(365)
    rows = 10
    cols= {'col_a': [np.random.randint(100) for _ in range(rows)],
           'col_b': [np.random.randint(100) for _ in range(rows)],
           'col_c': [np.random.randint(100) for _ in range(rows)]}
    data = pd.DataFrame(cols)


data
   col_a  col_b  col_c
0     82     36     43
1     52     48     12
2     33     28     77
3     91     99     11
4     44     95     27
5      5     94     64
6     98      3     88
7     73     39     92
8     26     39     62
9     56     74     50

I want to find the two smallest values in each row and store their difference in a new column. For example, in the first row the two smallest values are 36 and 43, so the difference is 7.

I tried this:

data['difference']=data[data.apply(lambda x: x.nsmallest(2).astype(float), axis=1).isna()].subtract(axis=1)

but I get:

TypeError: f() missing 1 required positional argument: 'other'

Answer 1

Better to use numpy:

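# np.sort works along the last axis by default, so each row is sorted independently
a = np.sort(data)
# second-smallest minus smallest value of each row
data['difference'] = a[:, 1] - a[:, 0]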

Output:

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6
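
For reference, here is a self-contained sketch of the same idea with the table from the question hard-coded, so it does not depend on the random seed. The `np.partition` variant is not part of the answer above; it is an alternative I am assuming could help on wide frames, since it only places the two smallest values per row instead of sorting everything. The names `value_cols`, `diff_sort` and `diff_part` are just illustrative.

```python
import numpy as np
import pandas as pd

# The ten rows shown in the question, hard-coded.
data = pd.DataFrame({"col_a": [82, 52, 33, 91, 44, 5, 98, 73, 26, 56],
                     "col_b": [36, 48, 28, 99, 95, 94, 3, 39, 39, 74],
                     "col_c": [43, 12, 77, 11, 27, 64, 88, 92, 62, 50]})

# Select the value columns explicitly so that re-running the code
# never picks up the 'difference' column itself.
value_cols = ["col_a", "col_b", "col_c"]
values = data[value_cols].to_numpy()

# Variant 1: full sort of every row, as in the answer.
sorted_rows = np.sort(values, axis=1)
diff_sort = sorted_rows[:, 1] - sorted_rows[:, 0]

# Variant 2 (assumption, not from the answer): partial sort.
# With kth=1, position 1 holds the second-smallest value and
# position 0 holds a value that is not larger, i.e. the minimum.
part = np.partition(values, 1, axis=1)
diff_part = part[:, 1] - part[:, 0]

data["difference"] = diff_sort
print(data)
```

Both variants should give the same column; for three columns the full sort is perfectly fine, and the partial sort only starts to matter when each row has many values.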

Answer 2

Here is a way using rank():

# df is the DataFrame from the question (named data there)
(df.where(df.rank(axis=1, method='first').le(2))
   .stack()
   .sort_values()
   .groupby(level=0)
   .agg(lambda x: x.diff().sum()))

If your df were larger and you wanted to use more than the two smallest values, this should work:

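# change .le(2) to .le(n) to keep the n smallest values per row;
# the lambda returns the largest kept value minus the sum of the others
(df.where(df.rank(axis=1, method='first').le(2))
   .stack()
   .sort_values(ascending=False)
   .groupby(level=0)
   .agg(lambda x: x.mul(-1).cumsum().add(x.max()*2).iloc[-1]))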
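
To make the rank-based chain easier to follow, here is a step-by-step sketch on the first three rows of the question's table. The intermediate names (`ranks`, `two_smallest`, `rank_diff`) are mine, and the explicit `.dropna()` is an addition I am assuming is harmless: it keeps the result independent of whether `stack()` drops the masked cells on its own.

```python
import pandas as pd

# First three rows of the question's table, hard-coded.
df = pd.DataFrame({"col_a": [82, 52, 33],
                   "col_b": [36, 48, 28],
                   "col_c": [43, 12, 77]})

# Rank the values within each row. method="first" breaks ties by
# column order, so exactly two cells per row get a rank <= 2.
ranks = df.rank(axis=1, method="first")

# Keep the two smallest cells of each row; everything else becomes NaN.
two_smallest = df.where(ranks.le(2))

# Go to long format, drop the masked cells, sort ascending, and take
# second-smallest minus smallest within each original row.
rank_diff = (two_smallest.stack()
             .dropna()
             .sort_values()
             .groupby(level=0)
             .agg(lambda s: s.diff().sum()))

print(rank_diff)
```

The result comes back as floats because masking with `where` introduces NaN; cast with `.astype(int)` if you need integers.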

Answer 3

Following your idea of using nsmallest on each row:

data['difference'] = data.apply(lambda x: x.nsmallest(2).tolist(), axis=1, result_type='expand').diff(axis=1)[1]
# or
data['difference'] = data.apply(lambda x: x.nsmallest(2).diff().iloc[-1], axis=1)

print(data)

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6
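
One thing to watch with the `apply` route: once `data` has a `difference` column, applying over the whole frame again would include that column in `nsmallest`. A minimal sketch that avoids this by restricting to the original columns (the helper `two_smallest_gap` and the list `value_cols` are hypothetical names, not from the answer):

```python
import pandas as pd

# The ten rows shown in the question, hard-coded.
data = pd.DataFrame({"col_a": [82, 52, 33, 91, 44, 5, 98, 73, 26, 56],
                     "col_b": [36, 48, 28, 99, 95, 94, 3, 39, 39, 74],
                     "col_c": [43, 12, 77, 11, 27, 64, 88, 92, 62, 50]})

value_cols = ["col_a", "col_b", "col_c"]  # columns to compare


def two_smallest_gap(row):
    """Return second-smallest minus smallest value of one row."""
    smallest = row.nsmallest(2)  # sorted ascending, length 2
    return smallest.iloc[1] - smallest.iloc[0]


data["difference"] = data[value_cols].apply(two_smallest_gap, axis=1)

# Running the same line again gives the same result, because
# 'difference' is not among value_cols.
data["difference"] = data[value_cols].apply(two_smallest_gap, axis=1)
print(data)
```

This is a row-wise Python loop under the hood, so on large frames it will likely be slower than the numpy approach from Answer 1.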
