Find index of first row closest to value in pandas DataFrame

I have a DataFrame with multiple columns. For each column, I would like to get the index of the first row whose value is nearly equal to a user-specified number (e.g. within 0.05 of the desired number). The DataFrame looks something like this:

ix   col1   col2   col3
0    nan    0.2    1.04
1    0.98   nan    1.5
2    1.7    1.03   1.91
3    1.02   1.42   0.97

Say I want the first row that is nearly equal to 1.0. I would expect the result to be:

  • index 1 for col1 (not index 3, even though they are mathematically equally close to 1.0)
  • index 2 for col2
  • index 0 for col3 (not index 3, even though 0.97 is closer to 1 than 1.04)

I've tried an approach that makes use of argsort():

df.iloc[(df.col1-1.0).abs().argsort()[:1]]

According to other topics, this should give me the row of col1 whose value is closest to 1.0. However, it only returns a DataFrame full of NaNs. I would also imagine that this method does not give the first value close to 1 that it encounters in each column, but rather the value that is closest to 1.

Can anyone help me with this?

Use DataFrame.sub to take the difference, convert it to absolute values with abs, compare with lt (<), and finally get the index of the first True value with DataFrame.idxmax:

a = df.sub(1).abs().lt(0.05).idxmax()
print (a)
col1    1
col2    2
col3    0
dtype: int64
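
As a self-contained sketch of the chain above, the following rebuilds the sample frame from the question and contrasts "first value within tolerance" with "closest value". The names target, tol, within, first_within and closest are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.7, 1.02],
        "col2": [0.2, np.nan, 1.03, 1.42],
        "col3": [1.04, 1.5, 1.91, 0.97],
    }
)
df.index.name = "ix"

# Hypothetical names for the user-specified number and the tolerance
target, tol = 1.0, 0.05

# True where a value lies within tol of target; NaN compares as False
within = df.sub(target).abs().lt(tol)

# idxmax on a boolean frame returns the label of the first True in each column
first_within = within.idxmax()

# For contrast: the label of the single closest value per column (NaN is skipped)
closest = df.sub(target).abs().idxmin()

print(first_within)
print(closest)
```

For col3 the two results differ: first_within reports row 0 (1.04 is already within the tolerance), while closest reports row 3 (0.97 is nearer to 1.0).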

For a more general solution that also works when the boolean mask has no match in a column (no value is within the tolerance), append a new row filled with True values and labelled NaN, so that idxmax returns NaN for such a column:

print (df)
    col1  col2  col3
ix                  
0    NaN  0.20  1.07
1   0.98   NaN  1.50
2   1.70  1.03  1.91
3   1.02  1.42  0.87

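# assumes: import numpy as np; import pandas as pd
# Sentinel row of True values labelled NaN: idxmax falls back to it when no value matches
s = pd.Series([True] * len(df.columns), index=df.columns, name=np.nan)
# DataFrame.append was removed in pandas 2.0, so add the row with pd.concat instead
a = pd.concat([df.sub(1).abs().lt(0.05), s.to_frame().T]).idxmax()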
print (a)
col1    1.0
col2    2.0
col3    NaN
dtype: float64
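
A possible alternative to the sentinel row, not taken from the answer above, is to keep the idxmax result only for columns that actually contain a match. This minimal sketch assumes the second sample frame; the names within, naive and first_within are illustrative.

```python
import numpy as np
import pandas as pd

# Second sample frame from the answer: col3 has no value within 0.05 of 1.0
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.70, 1.02],
        "col2": [0.20, np.nan, 1.03, 1.42],
        "col3": [1.07, 1.50, 1.91, 0.87],
    }
)

within = df.sub(1).abs().lt(0.05)

# idxmax alone reports label 0 for col3, because an all-False column
# has its maximum at the first row
naive = within.idxmax()

# Keep the idxmax result only for columns with at least one True;
# the remaining columns become NaN
first_within = within.idxmax().where(within.any())
print(first_within)
```

This avoids changing the frame's index, at the cost of scanning the mask twice (once for idxmax, once for any).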

Suppose you have a tolerance value tol for the near-match threshold. You can create a masked DataFrame that keeps only the values within the tolerance and use first_valid_index() on each column to get the index of the first matching occurrence.

tol = 0.05
mask = df[(df - 1).abs() < tol]
for col in df:
    print(col, mask[col].first_valid_index())
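
If the results are needed as data rather than printed, the same idea can be collected into a dictionary. This is a minimal sketch assuming the second sample frame from the previous answer; the names masked and first_match are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed sample data: col3 has no value within the tolerance
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.70, 1.02],
        "col2": [0.20, np.nan, 1.03, 1.42],
        "col3": [1.07, 1.50, 1.91, 0.87],
    }
)

tol = 0.05

# Values outside the tolerance (and the original NaNs) become NaN
masked = df[(df - 1).abs() < tol]

# first_valid_index returns the label of the first non-NaN entry,
# or None when the whole column is NaN
first_match = {col: masked[col].first_valid_index() for col in masked}
print(first_match)
```

Here col3 maps to None, so a column without any match can be told apart from a match at row 0.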
