How to calculate the F1 score in Python using multiprocessing?

Question

I've got an array of paired binary labels: y_true, y_pred. My array contains ~50 million elements, and I want to evaluate success using the F1 score (preferably) or AUC.
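For reference, these are the usual definitions for binary labels in terms of true positives (TP), false positives (FP) and false negatives (FN). F1 needs only these three counts, and each one is a single sum over the arrays:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}
$$

$$
F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R},
\qquad
F_1 = \frac{2 P R}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
$$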

However, calculating F1 with sklearn takes a relatively long time: about half the time needed for an entire epoch. Calculating AUC was faster, but still too slow. A similar question produced Faster AUC in sklearn or python, but I'm not sure I can try that approach.
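On the AUC side, ROC AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (ties counted as one half), which is the rank-sum (Mann-Whitney U) formulation. A minimal NumPy-only sketch, assuming `y_score` holds one score per element (hard 0/1 predictions also work, because tied scores receive average ranks). `rank_auc` is a hypothetical helper name, and this is not necessarily the method used in the linked question:

```python
import numpy as np


def rank_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation."""
    t = np.asarray(y_true).ravel().astype(bool)
    s = np.asarray(y_score).ravel()
    if t.shape != s.shape:
        raise ValueError("y_true and y_score must have the same number of elements")
    n_pos = int(np.count_nonzero(t))
    n_neg = t.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Average 1-based ranks: tied scores share the mean of the positions they occupy
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)              # last 1-based rank of each distinct score
    avg_rank = ends - (counts - 1) / 2.0  # mean of ranks ends-counts+1 .. ends
    ranks = avg_rank[inverse]

    # U statistic of the positive class, normalised by the number of pos/neg pairs
    u = ranks[t].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))
```

The only expensive step is the sort inside `np.unique`, so the cost is one O(n log n) pass rather than building the full ROC curve.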

Is there a way to speed up these calculations, perhaps with multiprocessing?
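Because TP, FP and FN are plain sums, they can be computed per chunk and added afterwards, which is what makes the metric easy to parallelize. The sketch below swaps multiprocessing for a thread pool (`concurrent.futures.ThreadPoolExecutor`). It assumes that many NumPy operations on non-object arrays release the GIL, so threads can overlap, and it avoids pickling ~50-million-element arrays to worker processes as `multiprocessing.Pool` would. `chunked_f1`, `n_chunks` and `max_workers` are illustrative names, not an existing API:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def _chunk_counts(t, p):
    """Return (tp, fp, fn) for one chunk of boolean label arrays."""
    tp = np.count_nonzero(t & p)
    fp = np.count_nonzero(p) - tp  # predicted positive, actually negative
    fn = np.count_nonzero(t) - tp  # actually positive, predicted negative
    return tp, fp, fn


def chunked_f1(y_true, y_pred, n_chunks=8, max_workers=4):
    """F1 for 0/1 labels, with the counts computed per chunk in a thread pool."""
    t = np.asarray(y_true).ravel().astype(bool)
    p = np.asarray(y_pred).ravel().astype(bool)
    if t.shape != p.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")

    # array_split returns views, so splitting itself copies no data
    t_chunks = np.array_split(t, n_chunks)
    p_chunks = np.array_split(p, n_chunks)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(_chunk_counts, t_chunks, p_chunks))

    # Counts are additive across chunks
    tp, fp, fn = (sum(col) for col in zip(*results))
    denom = 2 * tp + fp + fn
    # Convention choice: 0.0 when there are no positive labels or predictions
    return float(2 * tp / denom) if denom else 0.0
```

A `multiprocessing.Pool.map` over the same chunks would follow the same split, count and sum pattern. Whether either beats a single vectorized pass depends mostly on memory bandwidth, so it is worth timing on the real data.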

Answer 1

Alright, so apparently the scikit-learn implementation ran too slowly on my two vectors, so I modified this implementation to work on NumPy arrays, and it is now much faster (0.25 seconds compared to 50+ seconds with sklearn). The original implementation uses torch.Tensors.

import numpy as np

def f1_loss(y_true, y_pred, beta=1) -> float:
    '''Calculate the F-beta score (the F1 score for beta=1) of binary 0/1 labels.
    
    The original implementation was written by Michal Haltuf on Kaggle.
    
    Reference
    ---------
    - https://www.kaggle.com/rejpalcz/best-loss-function-for-f1-score-metric
    - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score
    - https://discuss.pytorch.org/t/calculating-precision-recall-and-f1-score-in-case-of-multi-label-classification/28265/6
    
    '''
    # Accept 1-D arrays as well as (n, 1) column vectors by flattening both.
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    assert y_true.shape == y_pred.shape

    # Confusion-matrix counts for 0/1 labels (TN is not needed for F1).
    tp = (y_true * y_pred).sum()
    fp = ((1 - y_true) * y_pred).sum()
    fn = (y_true * (1 - y_pred)).sum()
    
    # Small constant that avoids division by zero when there are no positives.
    epsilon = 1e-7
    
    precision = tp / (tp + fp + epsilon)
    recall = tp / (tp + fn + epsilon)
    
    f1 = (1 + beta**2) * (precision * recall) / (beta**2 * precision + recall + epsilon)

    return float(f1)
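An alternative to the separate multiply-and-sum passes above is to encode each (true, predicted) pair as one small integer and count all four confusion-matrix cells with a single `np.bincount` call. A sketch, assuming labels are exactly 0 or 1 (or booleans); `f1_bincount` is a hypothetical name, and whether it is faster than the version above on ~50M elements is worth timing rather than assuming:

```python
import numpy as np


def f1_bincount(y_true, y_pred, beta=1.0):
    """F-beta score for 0/1 labels from a single np.bincount pass."""
    # int8 keeps the temporary arrays small for tens of millions of elements
    t = np.asarray(y_true).ravel().astype(np.int8)
    p = np.asarray(y_pred).ravel().astype(np.int8)
    if t.shape != p.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    for a in (t, p):
        if a.size and (a.min() < 0 or a.max() > 1):
            raise ValueError("labels must be 0 or 1")

    code = 2 * t + p  # 0 = TN, 1 = FP, 2 = FN, 3 = TP
    tn, fp, fn, tp = np.bincount(code, minlength=4)  # tn is unused by F-beta

    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention choice: 0.0 when there are no positive labels or predictions
    return float((1 + b2) * tp / denom) if denom else 0.0
```

Usage: `f1_bincount(y_true, y_pred)` gives F1, and `f1_bincount(y_true, y_pred, beta=2)` gives F2, using the count form of the F-beta formula, (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP).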

Answer 1 · accepted · score 0 · 2020-06-22 22:49:49

如何在 python 中使用多处理计算 f1 分数？

问题描述

1 个解决方案

解决方案1 0 已采纳 2020-06-22 22:49:49

解决方案1
0 已采纳 2020-06-22 22:49:49