How can I use test_proportion data in a machine learning model?

Question

I have a dataset with 4000 CNN features, and it is a binary classification problem. All I know about the test data is the proportion of 1s and 0s. How can I tell my model to use this proportion when predicting the test labels? (For example, is there a way to say: in order to reach this proportion, I will assign this instance 0?)

How can I use it to increase accuracy? In my case the training data consists mostly of 1s (85%), with 15% 0s. However, the proportion of 1s in my test data is given as 38%, so it is very different from the training data.

I worked a little on balancing the data, and it helped. However, my model still predicts 1 for nearly all of the data. This may also be caused by an adaptation problem.

As @birdwatch suggested, I lowered the threshold for the 0 class to try to increase the number of 0 labels in the predictions.

# Predicting the Test set results
proba = classifier.predict_proba(X_test)  # column 0 is assumed to be P(class 0)
threshold = 0.3
# Predict 0 when P(class 0) >= threshold, otherwise predict 1
y_pred = (proba[:, 0] < threshold).astype(int)

Before, the predicted class counts were as follows:

 1 :   8906
 0 :   2968

After changing the threshold, they are now:

1 :  3221
0 :  8653
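For reference, the counts above can be compared with the 38% target directly. This is only a quick arithmetic check on the numbers already quoted; the helper name `frac_one` is made up for illustration.

```python
# Predicted class counts quoted above
before = {1: 8906, 0: 2968}   # default threshold
after = {1: 3221, 0: 8653}    # threshold = 0.3 on P(class 0)
target_frac_one = 0.38        # proportion of 1s stated for the test set


def frac_one(counts):
    """Share of predictions labelled 1 (hypothetical helper)."""
    return counts[1] / (counts[0] + counts[1])


print(f"before: {frac_one(before):.3f}")   # before: 0.750
print(f"after:  {frac_one(after):.3f}")    # after:  0.271
print(f"target: {target_frac_one:.3f}")    # target: 0.380
```

So the share of predicted 1s went from about 75% to about 27%, which overshoots the 38% target in the other direction.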

However, is there any other way to use test_proportions that guarantees the result?

Answer 1

There isn't any sensible way to do that. Doing so would introduce a weird bias into the model. One thing you could do is accept the less likely outcome only if it has a high enough score. Normally you'd use a threshold of 0.5, but here you might use, e.g., 0.7.
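If you do want to tie the cut-off to the known proportion rather than pick a number like 0.7 by hand, one possible sketch is to rank the test samples by their class-1 score and label only the top share as 1. This is an illustration under assumptions, not a recommended recipe: the function name `labels_matching_proportion` is hypothetical, the scores are synthetic stand-ins for something like `classifier.predict_proba(X_test)[:, 1]`, and forcing the proportion this way still carries the bias mentioned above and does not guarantee better accuracy.

```python
import numpy as np


def labels_matching_proportion(p_one, target_frac_one):
    """Label the highest-scoring `target_frac_one` share of samples as 1.

    p_one: 1-D array of scores for class 1 (higher means more likely 1).
    """
    p_one = np.asarray(p_one, dtype=float)
    n_ones = int(round(target_frac_one * p_one.size))
    # Rank samples from most to least likely to be class 1
    order = np.argsort(-p_one, kind="stable")
    labels = np.zeros(p_one.size, dtype=int)
    labels[order[:n_ones]] = 1
    return labels


# Synthetic scores skewed toward 1, standing in for real model output
rng = np.random.default_rng(0)
p_one = rng.beta(5, 2, size=1000)

y_pred = labels_matching_proportion(p_one, 0.38)
print(y_pred.sum(), "of", y_pred.size, "samples labelled 1")
```

The implied threshold is simply the score of the last sample that still gets a 1, so it adapts to the model's score distribution instead of being fixed in advance.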

Answer 1
1 accepted 2020-05-02 11:30:48
