Get the difference between the two minimum values row-wise across pandas columns
I have a dataframe that is the result of various pivot operations on float numbers (this example uses integers for simplicity):
import numpy as np
import pandas as pd
np.random.seed(365)
rows = 10
cols = {'col_a': [np.random.randint(100) for _ in range(rows)],
        'col_b': [np.random.randint(100) for _ in range(rows)],
        'col_c': [np.random.randint(100) for _ in range(rows)]}
data = pd.DataFrame(cols)
data
col_a col_b col_c
0 82 36 43
1 52 48 12
2 33 28 77
3 91 99 11
4 44 95 27
5 5 94 64
6 98 3 88
7 73 39 92
8 26 39 62
9 56 74 50
I want to find the two minimum values in each row and put their difference in a new column. For example, in the first row the two minimum values are 36 and 43, so the difference is 7.
I've tried it this way:
data['difference']=data[data.apply(lambda x: x.nsmallest(2).astype(float), axis=1).isna()].subtract(axis=1)
but I get:
TypeError: f() missing 1 required positional argument: 'other'
It is better to use NumPy: sort the values within each row, then subtract the smallest from the second smallest:
a = np.sort(data)
data['difference'] = a[:,1]-a[:,0]
Output:
col_a col_b col_c difference
0 82 36 43 7
1 52 48 12 36
2 33 28 77 5
3 91 99 11 80
4 44 95 27 17
5 5 94 64 59
6 98 3 88 85
7 73 39 92 34
8 26 39 62 13
9 56 74 50 6
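As a quick check of the NumPy approach, here is a minimal sketch on the first three rows of the example, typed in by hand. The `value_cols` list is an assumption added here, not part of the answer; it restricts the sort to the original columns so that re-running the assignment does not sort the `difference` column along with the rest.

```python
import numpy as np
import pandas as pd

# First three rows of the question's example data, typed in by hand.
data = pd.DataFrame({'col_a': [82, 52, 33],
                     'col_b': [36, 48, 28],
                     'col_c': [43, 12, 77]})

# Assumed helper list: the columns that should be compared.
value_cols = ['col_a', 'col_b', 'col_c']

# Sort the values within each row (axis=1), smallest first.
a = np.sort(data[value_cols].to_numpy(), axis=1)

# Second-smallest minus smallest, row by row.
data['difference'] = a[:, 1] - a[:, 0]

print(data['difference'].tolist())  # [7, 36, 5]
```

Because the sort runs on the underlying array rather than through a row-wise `apply`, this should scale well to frames with many rows.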
Here is a way using rank():
# df is the question's DataFrame (named data there), without the 'difference' column
(df.where(
    df.rank(axis=1, method='first')
    .le(2))
 .stack()
 .sort_values()
 .groupby(level=0)
 .agg(lambda x: x.diff().sum()))
If your df were larger and you wanted to use more than the 2 smallest values, this should work. Change the 2 in .le(2) to the number of values you want; the result is then the largest of the selected values minus all the others:
(df.where(
    df.rank(axis=1, method='first')
    .le(2))
 .stack()
 .sort_values(ascending=False)
 .groupby(level=0)
 .agg(lambda x: x.mul(-1).cumsum().add(x.max()*2).iloc[-1]))
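To see what the rank() mask does, here is a minimal sketch on three hand-typed rows of the example. It is not the exact pipeline above: the stack/sort/groupby step is swapped for a plain max minus min over the kept cells, which gives the same number only when exactly two values are kept per row.

```python
import pandas as pd

# First three rows of the question's example data, typed in by hand.
df = pd.DataFrame({'col_a': [82, 52, 33],
                   'col_b': [36, 48, 28],
                   'col_c': [43, 12, 77]})

# Rank the values within each row; method='first' breaks ties by column
# order, so exactly two cells per row receive rank 1 or 2.
mask = df.rank(axis=1, method='first').le(2)

# Cells outside the two smallest become NaN (the frame turns into floats).
kept = df.where(mask)

# With two values left per row, max - min is their difference.
difference = kept.max(axis=1) - kept.min(axis=1)

print(difference.tolist())  # [7.0, 36.0, 5.0]
```

The result is float rather than integer because the masked frame contains NaN.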
Following your idea of using nsmallest on rows:
data['difference'] = data.apply(lambda x: x.nsmallest(2).tolist(), axis=1, result_type='expand').diff(axis=1)[1]
# or
data['difference'] = data.apply(lambda x: x.nsmallest(2).diff().iloc[-1], axis=1)
print(data)
col_a col_b col_c difference
0 82 36 43 7
1 52 48 12 36
2 33 28 77 5
3 91 99 11 80
4 44 95 27 17
5 5 94 64 59
6 98 3 88 85
7 73 39 92 34
8 26 39 62 13
9 56 74 50 6
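The same idea can be wrapped in a small named function, which may be easier to read than the lambda. This is only a sketch: `two_smallest_gap` and `value_cols` are hypothetical names introduced here, and the second row is made up to show that a tie between the two smallest values gives a difference of 0.

```python
import pandas as pd

def two_smallest_gap(row: pd.Series):
    """Hypothetical helper: second-smallest minus smallest value in a row."""
    smallest = row.nsmallest(2)  # two values in ascending order, ties included
    return smallest.iloc[1] - smallest.iloc[0]

# Row 0 is taken from the question; row 1 is made up to show a tie.
data = pd.DataFrame({'col_a': [82, 5],
                     'col_b': [36, 5],
                     'col_c': [43, 9]})

# Assumed helper list: the columns that should be compared.
value_cols = ['col_a', 'col_b', 'col_c']

data['difference'] = data[value_cols].apply(two_smallest_gap, axis=1)

print(data['difference'].tolist())  # [7, 0]
```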