Selecting the best of three columns

Question

I have a dataset with three columns A, B and C. I want to create a column that, for each row, takes the two values closest to each other and averages them. Take the table below as an example:

A   B   C   Best of Three
3   2   5   2.5
4   3   1   3.5
1   5   2   1.5

For the first row, A and B are the closest pair, so the Best of Three value is (3+2)/2 = 2.5; for the third row, A and C are the closest pair, so the value is (1+2)/2 = 1.5. Below is my code. It is quite unwieldy and quickly becomes too long if there are more columns. Looking forward to suggestions!

import numpy as np
import pandas as pd

data = {'A':[3,4,1],
        'B':[2,3,5],
        'C':[5,1,2]}
df = pd.DataFrame(data)

# Pairwise absolute differences
df['D'] = abs(df['A'] - df['B'])
df['E'] = abs(df['A'] - df['C'])
df['F'] = abs(df['C'] - df['B'])
# Row-wise minimum (the built-in min() does not compare Series element-wise)
df['G'] = df[['D', 'E', 'F']].min(axis=1)
# Element-wise selection (a plain if/elif cannot branch on a whole Series)
df['Best of Three'] = np.select(
    [df['G'] == df['D'], df['G'] == df['E']],
    [(df['A'] + df['B'])/2, (df['A'] + df['C'])/2],
    default=(df['B'] + df['C'])/2)

Answer 1

First, you need a function that finds the minimum difference between any two elements of a list. It also returns the average of those two values (for two values, the same as their median), so the result is a tuple (diff, average).

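def min_list(values):
    values = list(values)  # accept a list or a DataFrame row
    # Tuples compare by diff first, so min() picks the closest pair
    return min((abs(x - y), (x + y) / 2)
               for i, x in enumerate(values)
               for y in values[i + 1:])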

Then apply it to each row:

import pandas as pd

df = pd.DataFrame([[3, 2, 5, 6], [4, 3, 1, 10], [1, 5, 10, 20]],
                  columns=['A', 'B', 'C', 'D'])

# min_list returns (diff, average); keep only the average
df['best'] = df.apply(lambda x: min_list(x)[1], axis=1)
print(df)

Answer 2

Functions are your friends. Write a function that finds the two closest values in a list, then pass it the list of values from each row. Store the result and pass it to a second function that returns the average of the two values.

(Also, your code would be much more readable if you replaced D, E, F, and G with descriptively named variables.)
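The two-function approach described in this answer could be sketched as follows. This is only an illustration, not code from the answer: the names closest_pair and average are hypothetical, and the data is the question's three-column example.

```python
from itertools import combinations

import pandas as pd


def closest_pair(values):
    """Return the two values with the smallest absolute difference."""
    return min(combinations(values, 2),
               key=lambda pair: abs(pair[0] - pair[1]))


def average(pair):
    """Return the mean of a two-element pair."""
    first, second = pair
    return (first + second) / 2


df = pd.DataFrame({'A': [3, 4, 1], 'B': [2, 3, 5], 'C': [5, 1, 2]})

# Apply both functions to every row (axis=1)
df['Best of Three'] = df.apply(lambda row: average(closest_pair(row)), axis=1)
print(df)
```

Because closest_pair takes any iterable, the same two functions should work unchanged when more columns are added.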

Answer 3

Solve it with the itertools combinations generator:

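import itertools
import pandas as pd

def get_closest_avg(s):
    # All pairs of values in the row
    c = list(itertools.combinations(s, 2))
    # Average the pair with the smallest absolute difference
    return sum(c[pd.Series(c).apply(lambda x: abs(x[0]-x[1])).idxmin()])/2

df['B3'] = df.apply(get_closest_avg, axis=1)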

Resulting df:

   A  B  C   B3
0  3  2  5  2.5
1  4  3  1  3.5
2  1  5  2  1.5
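
Enumerating every pair grows quadratically with the number of columns. One possible alternative, which none of the answers here use, relies on the fact that once a row is sorted, its two closest values are always adjacent. The sketch below is a minimal vectorized version under that idea. The helper name closest_pair_mean is hypothetical, and it assumes all selected columns are numeric with no missing values.

```python
import numpy as np
import pandas as pd


def closest_pair_mean(frame):
    """Row-wise mean of the two closest values (hypothetical helper)."""
    values = np.sort(frame.to_numpy(dtype=float), axis=1)  # sort each row
    gaps = np.diff(values, axis=1)       # gaps between sorted neighbours
    left = gaps.argmin(axis=1)           # position of the smallest gap per row
    rows = np.arange(len(values))
    return (values[rows, left] + values[rows, left + 1]) / 2


df = pd.DataFrame({'A': [3, 4, 1], 'B': [2, 3, 5], 'C': [5, 1, 2]})
df['Best of Three'] = closest_pair_mean(df[['A', 'B', 'C']])
print(df)
```

When two pairs are equally close, argmin takes the first smallest gap in sorted order, so this picks the pair with the smaller values.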

3 answers

Answer 1
1 2021-04-14 19:26:15

Answer 2
1 2021-04-14 19:26:40

Answer 3
1 2021-04-15 05:29:24
