Find index of first row closest to value in pandas DataFrame
I have a dataframe containing multiple columns. For each column, I would like to get the index of the first row that is nearly equal to a user-specified number (e.g. within 0.05 of the desired number). The dataframe looks something like this:
ix col1 col2 col3
0 nan 0.2 1.04
1 0.98 nan 1.5
2 1.7 1.03 1.91
3 1.02 1.42 0.97
Say I want the first row that is nearly equal to 1.0, I would expect the result to be:
I've tried an approach that makes use of argsort():
df.iloc[(df.col1-1.0).abs().argsort()[:1]]
According to other topics, this should give me the row in col1 whose value is closest to 1.0. However, it returns only a dataframe full of NaNs. I would also expect this method to return the value that is closest to 1, rather than the first value close to 1 that it encounters in each column.
Can anyone help me with this?
Use DataFrame.sub to compute the difference, convert it to absolute values with abs, compare against the tolerance with lt (<), and finally get the index of the first True value in each column with DataFrame.idxmax:
a = df.sub(1).abs().lt(0.05).idxmax()
print (a)
col1 1
col2 2
col3 0
dtype: int64
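To make each step of the chain visible, here is a minimal self-contained sketch that rebuilds the sample frame from the question and splits the one-liner into intermediate results. The names `target`, `tol`, `diff`, `mask` and `first` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (index named "ix")
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.70, 1.02],
        "col2": [0.20, np.nan, 1.03, 1.42],
        "col3": [1.04, 1.50, 1.91, 0.97],
    },
    index=pd.Index([0, 1, 2, 3], name="ix"),
)

target, tol = 1.0, 0.05  # illustrative names for the desired value and tolerance

diff = df.sub(target).abs()  # absolute distance to the target; NaN stays NaN
mask = diff.lt(tol)          # True where within tolerance; NaN compares as False
first = mask.idxmax()        # index label of the first True in each column

print(first)
```

Because idxmax returns the first occurrence of the maximum, and True is the maximum of a boolean column, this picks the first match rather than the closest one. If a column has no match at all, idxmax still returns the first index label, which is the case the next snippet addresses.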
For a more general solution that also works when the boolean mask has no True in a column (no value is within tolerance), append a new row filled with True values and named NaN, so that idxmax falls back to NaN for that column:
print (df)
col1 col2 col3
ix
0 NaN 0.20 1.07
1 0.98 NaN 1.50
2 1.70 1.03 1.91
3 1.02 1.42 0.87
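# sentinel row of True values, labelled NaN
s = pd.Series([True] * len(df.columns), index=df.columns, name=np.nan)
# note: DataFrame.append was removed in pandas 2.0, so this line needs pandas < 2.0
a = df.sub(1).abs().lt(0.05).append(s).idxmax()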
print (a)
col1 1.0
col2 2.0
col3 NaN
dtype: float64
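On pandas versions where DataFrame.append is no longer available, one possible alternative (a sketch, not the original answer's method) is to skip the sentinel row and instead blank out the idxmax result for columns that contain no match, using any and Series.where. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Second sample frame from the answer: col3 has no value within 0.05 of 1
df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.70, 1.02],
        "col2": [0.20, np.nan, 1.03, 1.42],
        "col3": [1.07, 1.50, 1.91, 0.87],
    },
    index=pd.Index([0, 1, 2, 3], name="ix"),
)

mask = df.sub(1).abs().lt(0.05)

# idxmax gives the first True per column; where() keeps that label only
# for columns that actually contain a True, and puts NaN elsewhere
a = mask.idxmax().where(mask.any())

print(a)
```

As with the sentinel-row version, the result becomes a float Series because NaN has to be representable alongside the integer labels.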
Suppose you have some tolerance value tol as the near-match threshold. You can create a masked dataframe that keeps only the values within the threshold (everything else becomes NaN) and use first_valid_index() on each column to get the index of the first match.
tol = 0.05
mask = df[(df - 1).abs() < tol]
for col in df:
    print(col, mask[col].first_valid_index())
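The same idea can be collected into a dictionary instead of printed. The following is a minimal sketch on the sample frame in which col3 has no match; `target`, `kept` and `first` are illustrative names. It also shows that first_valid_index() returns None for a column without any match, so no sentinel row is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [np.nan, 0.98, 1.70, 1.02],
        "col2": [0.20, np.nan, 1.03, 1.42],
        "col3": [1.07, 1.50, 1.91, 0.87],
    },
    index=pd.Index([0, 1, 2, 3], name="ix"),
)

target, tol = 1.0, 0.05  # illustrative names

# Boolean indexing with a frame keeps matching values and sets the rest to NaN
kept = df[(df - target).abs() < tol]

# First non-NaN index label per column, or None if the column has no match
first = {col: kept[col].first_valid_index() for col in kept}

print(first)
```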