
Get the difference between the two minimum values row-wise across pandas columns

I have a DataFrame produced by various pivot operations. It holds floats; this example uses integers for simplicity.

    import numpy as np
    import pandas as pd
    
    np.random.seed(365)
    rows = 10
    cols= {'col_a': [np.random.randint(100) for _ in range(rows)],
           'col_b': [np.random.randint(100) for _ in range(rows)],
           'col_c': [np.random.randint(100) for _ in range(rows)]}
    data = pd.DataFrame(cols)


data
   col_a  col_b  col_c
0     82     36     43
1     52     48     12
2     33     28     77
3     91     99     11
4     44     95     27
5      5     94     64
6     98      3     88
7     73     39     92
8     26     39     62
9     56     74     50

I want to find the two smallest values in each row and store their difference in a new column. For example, in the first row the two smallest values are 36 and 43, so the difference is 7.

I've tried this:

data['difference']=data[data.apply(lambda x: x.nsmallest(2).astype(float), axis=1).isna()].subtract(axis=1)

but I get:

TypeError: f() missing 1 required positional argument: 'other'

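It is better to use NumPy:

# Sort the values of each row in ascending order
a = np.sort(data, axis=1)
# The two smallest values of each row are now in columns 0 and 1
data['difference'] = a[:, 1] - a[:, 0]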

Output:

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6
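
As a variation on the NumPy approach above, here is a minimal sketch that uses np.partition instead of a full sort. This is a swapped-in technique, not the one from the answer. It may help when there are many columns, since only the two smallest values per row need to be in place. The three-row frame is just the first three rows of the question's data, and the names part and difference are illustrative.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data, used here only for illustration
data = pd.DataFrame({'col_a': [82, 52, 33],
                     'col_b': [36, 48, 28],
                     'col_c': [43, 12, 77]})

# kth=1 puts the second-smallest value of each row at index 1
# and the smallest value before it, at index 0
part = np.partition(data.to_numpy(), 1, axis=1)
difference = part[:, 1] - part[:, 0]
print(difference.tolist())  # [7, 36, 5]
```

Assigning the result with data['difference'] = difference would add the column as in the answer above.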

Here is a way using rank() (df is the DataFrame, called data in the question):

(df.where(
    df.rank(axis=1,method = 'first')
    .le(2))
    .stack()
    .sort_values()
    .groupby(level=0)
    .agg(lambda x: x.diff().sum()))

If your df were larger and you wanted to use more than the 2 smallest values, this should work:

(df.where(
    df.rank(axis=1,method = 'first')
    .le(2))
    .stack()
    .sort_values(ascending=False)
    .groupby(level=0)
    .agg(lambda x: x.mul(-1).cumsum().add(x.max()*2).iloc[-1]))
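
To see what the rank-based mask is doing, here is a minimal sketch on the first three rows of the question's data. For exactly two smallest values it takes max minus min of the masked row rather than the stack/groupby chain above. That is a simplification, not the method from this answer. The names ranks, masked and rank_diff are only illustrative.

```python
import pandas as pd

# First three rows of the question's data, used here only for illustration
df = pd.DataFrame({'col_a': [82, 52, 33],
                   'col_b': [36, 48, 28],
                   'col_c': [43, 12, 77]})

# Rank values within each row; method='first' breaks ties by column order
ranks = df.rank(axis=1, method='first')

# Keep only the two lowest-ranked values per row, everything else becomes NaN
masked = df.where(ranks.le(2))

# With exactly two values left per row, max - min is their difference
rank_diff = masked.max(axis=1) - masked.min(axis=1)
print(rank_diff.tolist())  # [7.0, 36.0, 5.0]
```

The result is float because the mask introduces NaN values.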

Following your idea of using nsmallest on the rows:

data['difference'] = data.apply(lambda x: x.nsmallest(2).tolist(), axis=1, result_type='expand').diff(axis=1)[1]
# or
data['difference'] = data.apply(lambda x: x.nsmallest(2).diff().iloc[-1], axis=1)
print(data)

   col_a  col_b  col_c  difference
0     82     36     43           7
1     52     48     12          36
2     33     28     77           5
3     91     99     11          80
4     44     95     27          17
5      5     94     64          59
6     98      3     88          85
7     73     39     92          34
8     26     39     62          13
9     56     74     50           6
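
As far as I can tell, the original attempt fails because subtract() was called without its required `other` operand, which is what the TypeError reports. The sketch below walks through what the nsmallest version does for a single row (the first row of the question's data). The names row, two_smallest and gap are only illustrative.

```python
import pandas as pd

# First row of the question's data as a Series, as apply(axis=1) would pass it
row = pd.Series({'col_a': 82, 'col_b': 36, 'col_c': 43})

# nsmallest(2) returns the two smallest values in ascending order: 36, then 43
two_smallest = row.nsmallest(2)

# diff() gives [NaN, 43 - 36]; the last element is the wanted difference
gap = two_smallest.diff().iloc[-1]
print(gap)  # 7.0
```

The value comes back as a float because diff() introduces a NaN in the first position.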
