
Python Dataframe find closest matching value with a tolerance

I have a data frame whose elements are lists. In each row, I want to find the value closest to a given value, but only if it lies within a percentage tolerance of that value. My code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
           A
0     [1, 2]
1  [4, 5, 6]

# in each row, find the value (and its index) that matches 5 within a 20% tolerance
val = 5
tol = 0.2 # accept values within 20% of 5, i.e. between 4 and 6
df['Matching_index'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)

Present solution:

df
           A     Matching_index
0     [1, 2]     1                # 2 is the closest to 5 but is outside the tolerance, so this is wrong
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Expected solution:

df
           A     Matching_index
0     [1, 2]     NaN              # No matching value, hence NaN
1  [4, 5, 6]     1                # 5 matches with 5, correct.

The idea is to take the absolute difference from val, replace the differences that fall outside the tolerance with missing values, and then call np.nanargmin on the result. Because np.nanargmin raises a ValueError when every value is missing, an extra m.any() check returns NaN in that case:

def f(x):
    a = np.abs(np.array(x)-val)
    m = a <= val * tol
    return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan
    
df['Matching_index']  = df['A'].map(f)

print (df)
           A  Matching_index
0     [1, 2]             NaN
1  [4, 5, 6]             1.0
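
To make the steps above easier to follow, here is a minimal standalone sketch of the same idea applied to plain lists. The function name closest_index and the third sample list are made up for illustration; val and tol are passed as arguments instead of being read from globals.

```python
import numpy as np

def closest_index(values, val, tol):
    """Position of the value closest to val, or NaN if none is within val * tol."""
    # absolute distance of every element from the target
    diffs = np.abs(np.asarray(values, dtype=float) - val)
    # True where the element lies inside the tolerance band
    within = diffs <= val * tol
    if not within.any():
        # nothing qualifies; np.nanargmin would raise on an all-NaN array
        return np.nan
    # hide out-of-tolerance distances, then pick the smallest remaining one
    masked = np.where(within, diffs, np.nan)
    return int(np.nanargmin(masked))

print(closest_index([1, 2], 5, 0.2))           # no element within 20% of 5
print(closest_index([4, 5, 6], 5, 0.2))        # 5 is an exact match
print(closest_index([4.2, 5.9, 4.9], 5, 0.2))  # hypothetical extra sample
```

When several elements are equally close, np.nanargmin returns the first of them.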

Pandas solution:

df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
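
For reference, the two lines above can be run end to end as follows. This is only a sketch with the question's sample data; the intermediate names diffs, in_tol and matches are introduced here for readability. Note that idxmin returns column labels, which coincide with list positions only because the expanded frame has default integer columns.

```python
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
val = 5
tol = 0.2

# expand each list into its own row of columns; shorter lists are padded with NaN
diffs = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

# keep only the distances inside the tolerance, everything else becomes NaN
in_tol = diffs.where(diffs <= val * tol)

# drop rows with no match before idxmin, so idxmin never sees an all-NaN row
matches = in_tol.dropna(how='all').idxmin(axis=1)

# assignment aligns on the index, so the dropped rows end up as NaN
df['Matching_index'] = matches
print(df)
```

Dropping the all-NaN rows first means the result does not depend on how idxmin treats such rows, which, as far as I know, has varied between pandas versions.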

I'm not sure whether you want all the matching indexes or just a count of them.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[[1,2],[4,5,6,7,8]]})

val = 5
tol = 0.3

def closest(arr, val, tol):
    # positions of all elements strictly within val * tol of val
    idxs = [idx for idx, el in enumerate(arr) if np.abs(el - val) < val * tol]
    # number of matching elements, or NaN if nothing matched
    return len(idxs) if idxs else np.nan

df['Matching_index'] = df['A'].apply(closest, args=(val, tol))
df

If you want all the indexes, just return idxs instead of len(idxs).
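
A possible version of that variant is sketched below. The names matching_indexes and Matching_indexes are made up for this example, and it keeps the strict < comparison used in this answer (the question's wording suggests <= may be intended).

```python
import numpy as np
import pandas as pd

def matching_indexes(arr, val, tol):
    # positions of all elements strictly within val * tol of val
    idxs = [i for i, el in enumerate(arr) if abs(el - val) < val * tol]
    # return the list itself, or NaN when nothing matched
    return idxs if idxs else np.nan

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6, 7, 8]]})
df['Matching_indexes'] = df['A'].apply(matching_indexes, args=(5, 0.3))
print(df)
```

The resulting column has object dtype, since it mixes lists and NaN.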
