What is the best practice for calculating the similarity between two pairs of X and y?

Question

I have some values for each element, for example element1: value1, value2. For each element, I need to calculate a 'score' for a given number of features. Imagine that we have one feature defined as follows:

A high score for feature1 is given by a high score for value1 and a low score for value2.

So, supposing that a high score for value1 (1) and a low score for value2 (0) correspond to a high score for feature1, what is the best practice for calculating the score of feature1 given two arbitrary scores for value1 and value2 (for example value1=0.7, value2=0.2)? I use Python as my programming language and I would prefer to use the sklearn module, but any solution that fits well is welcome.

Answer 1

First, normalize your data. One type of normalization rescales value1 and value2 so that they fall within the range [0,1].
Suppose the average two-value characterization of feature1, based on the normalized data, is (.7, .2). For any new pair of values (x,y), compute the distance between (x,y) and (.7,.2).
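
As a rough illustration of the normalization step, here is a minimal min-max rescaling sketch in plain Python. The helper name min_max_normalize and the raw numbers are made up for the example; sklearn's MinMaxScaler in sklearn.preprocessing performs the same per-column rescaling if you prefer to stay within sklearn.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: there is no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical raw measurements of value1 and value2 for four elements.
raw_value1 = [10.0, 40.0, 25.0, 50.0]
raw_value2 = [3.0, 1.0, 2.0, 5.0]

norm_value1 = min_max_normalize(raw_value1)
norm_value2 = min_max_normalize(raw_value2)

print(norm_value1)
print(norm_value2)
```

After this step, every element is described by a pair (value1, value2) inside the unit square, so distances along the two axes are comparable.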

When computing distances in machine learning, the square root is often skipped, since the squared distance preserves the ordering of points by closeness:

dist^2 = (x-.7)^2 + (y-.2)^2
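
The formula above can be sketched in Python as follows. The function names are hypothetical, and turning the distance into a score between 0 and 1 (by dividing by the largest squared distance possible inside the unit square) is an assumption added here for illustration, not part of the original answer. The default reference (1, 0) is the ideal point from the question; pass (.7, .2) to use the average characterization instead.

```python
def squared_distance(point, reference):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - r) ** 2 for p, r in zip(point, reference))


def feature_score(point, reference=(1.0, 0.0)):
    """Map squared distance to a score in [0, 1]; 1.0 means 'at the reference'.

    Assumes every coordinate is already normalized to [0, 1].
    """
    # Farthest corner of the unit square from the reference point.
    farthest = tuple(0.0 if r >= 0.5 else 1.0 for r in reference)
    max_d2 = squared_distance(farthest, reference)
    return 1.0 - squared_distance(point, reference) / max_d2


d2 = squared_distance((0.7, 0.2), (1.0, 0.0))
score = feature_score((0.7, 0.2))
print(d2, score)
```

A point close to the reference gets a score near 1, and the opposite corner of the unit square gets 0.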

You might also be interested in calculating the error of a pair (x,y) with respect to (.7,.2), in which case you can look into categorical cross entropy.
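
A caveat before sketching this: categorical cross entropy compares two probability distributions whose entries sum to 1, and (.7, .2) does not. The sketch below therefore swaps in binary cross entropy applied to each value separately, treating each normalized value as an independent probability. That substitution, the function name, and the eps clipping constant are assumptions made here for illustration.

```python
import math


def binary_cross_entropy(target, predicted, eps=1e-12):
    """Mean per-component binary cross entropy between two tuples in [0, 1]."""
    total = 0.0
    for t, p in zip(target, predicted):
        # Clip to avoid log(0) when a predicted value is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(target)


reference = (0.7, 0.2)
error_same = binary_cross_entropy(reference, (0.7, 0.2))
error_swapped = binary_cross_entropy(reference, (0.2, 0.7))
print(error_same, error_swapped)
```

Unlike a distance, this error is not zero when (x,y) equals the reference unless the reference values are exactly 0 or 1, but it is smallest there and grows as (x,y) moves away.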

Solution 1
0 Accepted 2020-02-11 05:22:03
