Find matches between two arrays, AND first array == 1?

Question

I have two arrays (y_true and y_pred) of the same length, both consisting of 0's and 1's.

I want a more efficient/faster way of counting how many times y_pred == y_true AND y_pred == 1. I'm not interested in counting the matching 0's.

Right now, my function uses a for loop and looks like this:

from sklearn.metrics import make_scorer
# Make a custom metric function
def my_custom_accuracy(y_true, y_pred):       # Bring in the arrays
    good_matches = 0                          # Set counter to 0
    for num, i in enumerate(y_pred):          # for each y_pred in array...
        if i == y_true[num] and i == 1:       # if y_pred == y_true AND y_pred == 1...
            good_matches += 1                 # count it as a good match
    return float(good_matches / sum(y_true))  # return good matches as a % of all the 1's in y_true

...it works, but the for loop is slow and not very efficient. I was hoping to use something like this:

# Make a custom metric function
def my_custom_accuracy(y_true, y_pred):
    return float(sum(y_pred == y_true)) / sum(y_true)

...simple, but I don't know how to add in the "& y_pred == 1" part. Any ideas? Thanks!

Answer 1

You can use a list comprehension to check the lists against each other while filtering out the entries where y_pred == 0, then get your accuracy by dividing the number of matches by the length of the compare list.

compare = [p == t for p, t in zip(y_pred, y_true) if p == 1]
accuracy = compare.count(True) / len(compare)

Or, using numpy:

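import numpy as np

mask = np.where(y_true == y_pred)          # positions where prediction and truth agree (matching 0's included)
matches = y_pred[mask]                     # predicted values at those positions
accuracy = np.sum(matches) / len(matches)  # share of the agreeing positions that are 1's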
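
To make the choice of denominator concrete, here is a minimal sketch with made-up sample arrays (the values and the names precision_like and recall_like are illustrative only, not from the question). It counts the positions where both arrays are 1, then divides once by the number of predicted 1's, as the list comprehension above does, and once by the number of 1's in y_true, as the asker's loop does.

```python
import numpy as np

# Hypothetical sample data, chosen so the two denominators differ.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 1, 0, 0])

# Positions where the prediction is 1 and agrees with the truth.
good = (y_pred == 1) & (y_true == y_pred)
good_matches = int(np.count_nonzero(good))

# Denominator used by the list comprehension: number of predicted 1's.
precision_like = good_matches / np.count_nonzero(y_pred == 1)

# Denominator used in the question: number of 1's in y_true.
recall_like = good_matches / np.count_nonzero(y_true == 1)

print(good_matches, precision_like, recall_like)
```

The two ratios generally differ, so it is worth deciding which one the custom metric is meant to report.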

Answer 2

If the arrays aren't already boolean, make them boolean. This can be done cheaply with a view, or more simply with astype:

y_pred = y_pred.astype(bool)
y_true = y_true.astype(bool)

This step can be omitted if the arrays are already boolean, or if they really will never contain anything but zeros and ones.
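
As a rough illustration of the two conversions mentioned above, the sketch below assumes the labels are stored as uint8 (one byte per element, values 0 and 1 only). That dtype is an assumption, since the question does not say how the arrays are stored. Under that assumption a view reinterprets the same buffer without copying, while astype allocates a new array and works for any numeric dtype.

```python
import numpy as np

# Assumed storage: one byte per label, containing only 0 and 1.
labels = np.array([0, 1, 1, 0, 1], dtype=np.uint8)

as_view = labels.view(bool)    # reinterprets the same memory, no copy
as_copy = labels.astype(bool)  # allocates a new boolean array

print(np.shares_memory(labels, as_view))  # the view shares the buffer
print(np.shares_memory(labels, as_copy))  # the copy does not
print(np.array_equal(as_view, as_copy))   # both hold the same booleans
```

For wider integer types such as int64 the item sizes differ, so a plain view would not map one label to one boolean; astype is the safer choice there.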

Now good_matches is just

good_matches = np.sum(y_pred & y_true)

To see why, note that y_pred & y_true is true only where both arrays are true. That covers y_pred == y_true at those positions, and because the expression can only be true when y_pred is true, it automatically implies y_pred == 1 and y_true == 1, by the definition of the & operator.
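
The equivalence can be checked exhaustively, since there are only four combinations of a predicted and a true label. The sketch below is just a truth-table check, and the names via_and and via_conditions are illustrative. Note the parentheses around each comparison: in Python, & binds more tightly than ==, so writing the two conditions without parentheses would not mean what it appears to.

```python
import numpy as np

# All four (y_pred, y_true) combinations of boolean labels.
y_pred = np.array([0, 0, 1, 1], dtype=bool)
y_true = np.array([0, 1, 0, 1], dtype=bool)

# The short form from the answer.
via_and = y_pred & y_true

# The two conditions from the question, written out explicitly.
via_conditions = (y_pred == y_true) & (y_pred == 1)

print(np.array_equal(via_and, via_conditions))
```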

Your final result is therefore

np.sum(y_pred & y_true) / np.sum(y_true)

This can alternatively be written as

np.count_nonzero(y_pred & y_true) / np.count_nonzero(y_true)
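
Putting the pieces together, a complete metric function might look like the following sketch. The function name, the conversion with np.asarray, and the choice to return 0.0 when y_true contains no 1's are assumptions made for illustration; the answer does not say how to handle that case.

```python
import numpy as np

def positive_match_rate(y_true, y_pred):
    """Fraction of the 1's in y_true that y_pred also marks as 1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    n_true = np.count_nonzero(y_true)
    if n_true == 0:
        # Assumption: report 0.0 rather than dividing by zero.
        return 0.0
    return np.count_nonzero(y_pred & y_true) / n_true

# Hypothetical sample labels, for illustration only.
score = positive_match_rate([1, 0, 1, 1, 0, 1], [1, 1, 0, 1, 0, 0])
print(score)
```

A function with this (y_true, y_pred) signature can be wrapped with make_scorer from sklearn.metrics, which is what the import in the question suggests the asker intends.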
