Choose the best of three columns
I have a dataset with three columns A, B and C. I want to create a column where I select the two columns closest to each other and take their average. Take the table below as an example:
A  B  C  Best of Three
3  2  5  2.5
4  3  1  3.5
1  5  2  1.5
For the first row, A and B are the closest pair, so the best-of-three value is (3+2)/2 = 2.5; for the third row, A and C are the closest pair, so the best-of-three value is (1+2)/2 = 1.5. Below is my code. It is quite unwieldy and quickly becomes too long if there are more columns. Looking forward to suggestions!
import numpy as np
import pandas as pd

data = {'A': [3, 4, 1],
        'B': [2, 3, 5],
        'C': [5, 1, 2]}
df = pd.DataFrame(data)
# Pairwise absolute differences
df['D'] = abs(df['A'] - df['B'])
df['E'] = abs(df['A'] - df['C'])
df['F'] = abs(df['C'] - df['B'])
# Row-wise minimum of the three differences
df['G'] = df[['D', 'E', 'F']].min(axis=1)
# Average the pair whose difference equals the row minimum
df['Best of Three'] = np.select(
    [df['G'] == df['D'], df['G'] == df['E']],
    [(df['A'] + df['B']) / 2, (df['A'] + df['C']) / 2],
    default=(df['B'] + df['C']) / 2)
First you need a function that finds the minimum difference between any two elements in a list. It also returns the mean of those two values (which, for two values, is the same as their median), so the result is a tuple (diff, mean):
def min_list(values):
    return min((abs(x - y), (x + y) / 2)
               for i, x in enumerate(values)
               for y in values[i + 1:])
Then apply it to each row:
import pandas as pd

df = pd.DataFrame([[3, 2, 5, 6], [4, 3, 1, 10], [1, 5, 10, 20]],
                  columns=['A', 'B', 'C', 'D'])
# Keep only the mean (second tuple element) for each row
df['best'] = df.apply(lambda x: min_list(x)[1], axis=1)
print(df)
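To make the returned tuple concrete, here is a small sketch that calls the same function on the first row of the question's table, passed as a plain list. The function body is repeated only so the snippet runs on its own; the names `diff` and `mean` are illustrative.

```python
def min_list(values):
    # Smallest (difference, mean) tuple over all unordered pairs
    return min((abs(x - y), (x + y) / 2)
               for i, x in enumerate(values)
               for y in values[i + 1:])

# First row of the question's table: A=3, B=2, C=5
diff, mean = min_list([3, 2, 5])
print(diff, mean)  # 1 2.5
```

Because tuples compare element by element, two pairs with the same difference are ordered by their mean, so a tie resolves to the pair with the smaller mean.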
Functions are your friends. You want to write a function that finds the two closest numbers in a list, then pass it the list of values from each row. Store those results and pass them to a second function that returns the average of two values.
(Also, your code would be much more readable if you replaced D, E, F, and G with descriptively named variables.)
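One possible way to sketch the two-function approach described above, using the question's data. The function names `closest_pair` and `average` are hypothetical, chosen here for illustration, and `itertools.combinations` is just one convenient way to enumerate the pairs.

```python
from itertools import combinations

import pandas as pd


def closest_pair(values):
    """Return the two values with the smallest absolute difference."""
    return min(combinations(values, 2), key=lambda pair: abs(pair[0] - pair[1]))


def average(a, b):
    """Return the mean of two numbers."""
    return (a + b) / 2


df = pd.DataFrame({'A': [3, 4, 1], 'B': [2, 3, 5], 'C': [5, 1, 2]})
# For each row: find the closest pair, then average it
df['Best of Three'] = df.apply(
    lambda row: average(*closest_pair(row.tolist())), axis=1)
print(df['Best of Three'].tolist())  # [2.5, 3.5, 1.5]
```

Since `closest_pair` accepts a list of any length, the same code should carry over unchanged when more columns are added.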
You can solve this with the itertools combinations generator:
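import itertools
import pandas as pd

def get_closest_avg(s):
    # All unordered pairs of values in the row
    c = list(itertools.combinations(s, 2))
    # Index of the pair with the smallest absolute difference, then its mean
    return sum(c[pd.Series(c).apply(lambda x: abs(x[0]-x[1])).idxmin()])/2

df['B3'] = df.apply(get_closest_avg, axis=1)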
df:
   A  B  C   B3
0  3  2  5  2.5
1  4  3  1  3.5
2  1  5  2  1.5