Find the index of the first row closest to a value in a pandas DataFrame

Question

I have a DataFrame containing multiple columns. For each column, I would like to get the index of the first row that is nearly equal to a user-specified number (e.g. within 0.05 of the desired number). The DataFrame looks something like this:

ix   col1   col2   col3
0    nan    0.2    1.04
1    0.98   nan    1.5
2    1.7    1.03   1.91
3    1.02   1.42   0.97

Say I want the first row that is nearly equal to 1.0. I would expect the result to be:

index 1 for col1 (not index 3, even though they are mathematically equally close to 1.0)
index 2 for col2
index 0 for col3 (not index 3, even though 0.97 is closer to 1 than 1.04)

I've tried an approach that makes use of argsort():

df.iloc[(df.col1-1.0).abs().argsort()[:1]]

According to other topics, this should give me the index of the row in col1 whose value is closest to 1.0. However, it only returns a DataFrame full of NaNs. I would also expect this method to return the value closest to 1 rather than the first value near 1 that it encounters in each column.

Can anyone help me with this?

Answer 1

Use DataFrame.sub to take the difference, abs to convert it to absolute values, lt (<) to compare against the tolerance, and finally DataFrame.idxmax to get the index of the first match in each column:

a = df.sub(1).abs().lt(0.05).idxmax()
print (a)
col1    1
col2    2
col3    0
dtype: int64
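
For readers who want to reproduce this, here is a minimal self-contained sketch of the same chain with each step split out. The frame is rebuilt from the question's sample data; the names `target`, `tol`, `within` and `first_hit` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.7, 1.02],
        "col2": [0.2, np.nan, 1.03, 1.42],
        "col3": [1.04, 1.5, 1.91, 0.97],
    }
)

target, tol = 1.0, 0.05

# Boolean frame: True where |value - target| < tol; NaN compares as False
within = df.sub(target).abs().lt(tol)

# idxmax returns the label of the first maximum, i.e. the first True per column
first_hit = within.idxmax()
print(first_hit)
```

Note that idxmax also returns the first label for a column that is entirely False, so a column with no match cannot be told apart from a match in row 0 by this result alone.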

For a more general solution that also works when the boolean mask has no match in a column (no value is within tolerance), append a new row filled with True values whose index label is NaN, so that idxmax falls back to NaN for that column:

print (df)
    col1  col2  col3
ix                  
0    NaN  0.20  1.07
1   0.98   NaN  1.50
2   1.70  1.03  1.91
3   1.02  1.42  0.87

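import numpy as np
import pandas as pd

# Sentinel row of True values labelled NaN
s = pd.Series([True] * len(df.columns), index=df.columns, name=np.nan)
# DataFrame.append was removed in pandas 2.0, so add the row with pd.concat
a = pd.concat([df.sub(1).abs().lt(0.05), s.to_frame().T]).idxmax()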
print (a)
col1    1.0
col2    2.0
col3    NaN
dtype: float64
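
An alternative sketch that gives the same result without a sentinel row: keep the idxmax result only for columns where the mask contains at least one True. This is not the answerer's method, just another way to handle the no-match case; the names `within` and `first_hit` are illustrative.

```python
import numpy as np
import pandas as pd

# Second sample frame from the answer: col3 has no value within 0.05 of 1.0
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.7, 1.02],
        "col2": [0.2, np.nan, 1.03, 1.42],
        "col3": [1.07, 1.5, 1.91, 0.87],
    }
)

within = df.sub(1.0).abs().lt(0.05)

# Keep the idxmax label only where the column has at least one True;
# columns without any match become NaN
first_hit = within.idxmax().where(within.any())
print(first_hit)
```

As with the sentinel-row version, the result is a float Series because NaN forces the integer labels to float.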

Answer 2

Suppose you have a tolerance value tol as the near-match threshold. You can build a masked DataFrame that keeps only the values within the threshold and call first_valid_index() on each column to get the index of the first matching occurrence.

tol = 0.05
mask = df[(df - 1).abs() < tol]
for col in df:
    print(col, mask[col].first_valid_index())
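
The loop above can also be sketched as a self-contained script that collects the results into a dictionary instead of printing them. The frame is rebuilt from the question's sample data, and the name `first_hits` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.7, 1.02],
        "col2": [0.2, np.nan, 1.03, 1.42],
        "col3": [1.04, 1.5, 1.91, 0.97],
    }
)

tol = 0.05

# Indexing with a boolean frame keeps matching values and sets the rest to NaN
mask = df[(df - 1).abs() < tol]

# first_valid_index returns the first non-NaN label, or None if there is none
first_hits = {col: mask[col].first_valid_index() for col in mask}
print(first_hits)
```

A column with no value inside the tolerance yields None here, so the no-match case needs no extra handling.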

Answer 1: score 2, accepted, 2018-06-13 13:02:32

Answer 2: score 1, 2018-06-13 13:17:48
