
Regression accuracy with neural network in low density regions

I am developing a neural net which needs to predict values between -1 and 1. However, I am only really concerned about the values at the ends of the scale, say between -1 and -0.7 and between 0.7 and 1.

I do not mind if 0.6, for example, gets predicted to be 0.1. However, I do want to know if it's 0.8 or 0.9.

The distribution of my data is roughly normal, so there are many more samples in the range where I'm not concerned about the accuracy. It therefore seems that the training process is likely to lead to greater accuracy in the centre.

How can I configure the training, or engineer my expected result, to overcome this?

Thanks very much.

You could assign the observations to deciles, turn it into a classification problem, and then either assign a greater weight in the loss to the ranges you care about or simply oversample them during training. By default, I'd go with weighting the classes in the loss function, as it is straightforward to match with a weighted metric. Oversampling can be useful if you know that the distribution of your training data is different from the real data distribution.
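The decile binning step above can be sketched as follows. This is a minimal illustration (not from the original answer) using NumPy quantiles on synthetic, roughly normal targets; the variable names are assumptions:

```python
import numpy as np

# Synthetic, roughly normal regression targets clipped to [-1, 1],
# standing in for the real training labels.
rng = np.random.default_rng(0)
y = np.clip(rng.normal(0.0, 0.5, size=1000), -1.0, 1.0)

# Bin edges at the 10%, 20%, ..., 90% empirical quantiles split the
# data into ten equally populated classes (deciles).
edges = np.quantile(y, np.linspace(0.1, 0.9, 9))

# Integer class labels 0..9, one per decile.
labels = np.digitize(y, edges)
```

Because the bins are quantile-based, each class ends up with roughly the same number of samples, which makes the subsequent class weighting or oversampling easier to reason about.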

To assign certain classes a greater weight in the loss function with Keras, you can pass a class_weight parameter to Model.fit. If label 0 is the first decile and label 9 is the last decile, you could double the weight of the first two and last two deciles as follows:

class_weight = {
    0: 2,
    1: 2,
    2: 1,
    3: 1,
    4: 1,
    5: 1,
    6: 1,
    7: 1,
    8: 2,
    9: 2
}
model.fit(..., class_weight=class_weight)

To oversample certain classes, you'd include them more often in the batches than the class distribution would suggest. The simplest way to implement this is to sample observation indices with numpy.random.choice, which has an optional p parameter to specify a probability for each entry. (Note that Keras Model.fit also has a sample_weight parameter where you can assign weights to each observation in the training data that will be applied when computing the loss function, but the intended use case is to weigh samples by the confidence in their labels, so I don't think it's applicable here.)
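A minimal sketch of that oversampling idea, assuming decile labels as above (the doubled weight for the extreme deciles and the batch size are illustrative choices, not from the original answer):

```python
import numpy as np

# Stand-in decile labels for 1000 training observations.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)

# Give observations in the extreme deciles (0, 1, 8, 9) twice the
# sampling weight, then normalise so the probabilities sum to 1.
weights = np.where((labels <= 1) | (labels >= 8), 2.0, 1.0)
p = weights / weights.sum()

# Draw one batch of indices; extreme-decile samples are now
# overrepresented relative to their share of the data.
batch_idx = np.random.choice(len(labels), size=64, replace=True, p=p)
batch_labels = labels[batch_idx]
```

You would draw a fresh set of indices like this for every batch, e.g. inside a generator or a tf.data pipeline, and feed the selected rows of the training data to the model.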
