Python DataFrame: find the closest matching value within a tolerance

Question

I have a data frame whose elements are lists. In each row, I want to find the value closest to a given value, but only if it lies within a percentage tolerance of that value. My code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
df
           A
0     [1, 2]
1  [4, 5, 6]

# in each row, find the value (and its index) that matches 5 within a 20% tolerance
val = 5
tol = 0.2  # i.e. values within 20% of 5, from 4 to 6 inclusive
df['Matching_index'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)
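For reference, a self-contained reproduction of this attempt. It shows the core problem: `np.argmin` always returns some position, because it only looks for the smallest distance and never compares that distance with the tolerance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
val = 5
tol = 0.2

# Distance of every element from val, one array per row
dist = df['A'].map(lambda x: np.abs(np.array(x) - val))

# argmin picks the smallest distance even when it is far outside val * tol
df['Matching_index'] = dist.map(np.argmin)
print(df)
```

In row 0 the smallest distance is 3 (for the element 2), well above the allowed 1.0, yet index 1 is still reported.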

Current result:

df
           A     Matching_index
0     [1, 2]     1                # 2 is the closest to 5, but it is not within 20%, so this is wrong
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Expected result:

df
           A     Matching_index
0     [1, 2]     NaN              # No matching value, hence NaN
1  [4, 5, 6]     1                # 5 matches with 5, correct.

Answer 1

The idea is to take the absolute difference from val, replace the differences outside the tolerance with missing values, and then take the position of the minimum with np.nanargmin. Because np.nanargmin raises an error if all values are missing, an extra condition with np.any (m.any()) handles that case:

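def f(x):
    a = np.abs(np.array(x) - val)   # distance of each element from val
    m = a <= val * tol              # True where the element is within tolerance
    # nanargmin skips NaN but raises if every value is NaN, hence the m.any() guard
    return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan

df['Matching_index'] = df['A'].map(f)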

print (df)
           A  Matching_index
0     [1, 2]             NaN
1  [4, 5, 6]             1.0
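The function above reads val and tol from the enclosing scope. A minimal variant that takes them as arguments instead (the name closest_within is hypothetical) can be passed to Series.apply through args, which makes it easier to reuse with other targets. It also uses abs(val) for the band width, an assumption added here so that a negative target still gives a non-negative tolerance.

```python
import numpy as np
import pandas as pd

def closest_within(x, val, tol):
    """Position of the element closest to val, or NaN if none is within tol of it."""
    a = np.abs(np.asarray(x, dtype=float) - val)  # distance from the target
    m = a <= abs(val) * tol                       # inside the tolerance band?
    if not m.any():
        return np.nan                             # nanargmin would raise on all-NaN
    return int(np.nanargmin(np.where(m, a, np.nan)))

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
df['Matching_index'] = df['A'].apply(closest_within, args=(5, 0.2))
print(df)
```

The result column is float because it mixes NaN with integer positions, exactly as in the output above.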

Pandas solution:

df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
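A self-contained run of this pandas solution. The ragged lists are expanded into columns (shorter rows are padded with NaN), distances outside the tolerance are masked with where, and rows with no match are dropped before idxmin so that idxmin never sees an all-NaN row. Assigning the result back aligns on the index, so the dropped rows become NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6]]})
val = 5
tol = 0.2

# One column per list position; shorter lists are padded with NaN
df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()

# Keep only distances inside the tolerance, drop rows with no match,
# then take the column label (= list position) of the smallest distance
matches = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)

# Index alignment leaves NaN for the rows that were dropped
df['Matching_index'] = matches
print(df)
```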

Answer 2

I'm not sure if you want all the indexes or just a count.

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A':[[1,2],[4,5,6,7,8]]})

val = 5
tol = 0.3

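def closest(arr, val, tol):
    # positions of the elements within val * tol of val (edges included)
    idxs = [idx for idx, el in enumerate(arr) if np.abs(el - val) <= val * tol]
    # number of matches, or NaN if there are none
    result = len(idxs) if len(idxs) != 0 else np.nan
    return result

df['Matching_index'] = df['A'].apply(closest, args=(val, tol))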
df

If you want all the indexes, just return idxs instead of len(idxs).
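Following that suggestion, a sketch that returns the list of matching positions instead of their count (the name matching_indexes is hypothetical). It uses `<=` so that values exactly on the edge of the band, such as 4 and 6 for val = 5 and tol = 0.2, count as matches. Rows with no match get an empty list rather than NaN, a design choice made here so that every cell in the column is a list.

```python
import pandas as pd

def matching_indexes(arr, val, tol):
    """All positions in arr whose value lies within tol (as a fraction) of val."""
    band = abs(val) * tol  # abs() so a negative target still gives a valid band
    return [idx for idx, el in enumerate(arr) if abs(el - val) <= band]

df = pd.DataFrame({'A': [[1, 2], [4, 5, 6, 7, 8]]})
df['Matching_indexes'] = df['A'].apply(matching_indexes, args=(5, 0.3))
print(df)
```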


2 answers

Answer 1
1 vote, accepted, 2022-05-03 14:53:26

Answer 2
1 vote, 2022-05-03 15:17:55