What model (loss function, etc.) can be used in Keras for categorical training with probability labels instead of one-hot encoding?
I ran into a problem while designing my Keras model.
The training data (input) for the model consists of two sequential character-encoded lists and one non-sequential feature list. The output is a list of probabilities over 5 different classes. The testing data has the same features, but its output is a single class label instead of a probability distribution. The task is to build a model that learns from the training probabilities to predict the actual class on the testing data.
For example, the data looks like:
X_train, X_test = Sequential feature 1, Sequential feature 2, Non-sequential feature 3
y_train = probability for class 1, probability for class 2 ... , probability for class 5
y_test = 0/1, 0/1, ..., 0/1
X_train, X_test = [0, 0, 0, 11, 21, 1] + [ 0, 0, 0, 0, 0, 121, 1, 16] + [1, 0, 0.543, 0.764, 1, 0, 1]
y_train = [0.132561 , 0.46975598, 0.132561 , 0.132561 , 0.132561]
y_test = [0, 1, 0, 0, 0]
I have built two CNN models for the sequential data and a plain dense layer for the non-sequential data, then concatenated them into one mixed model with some dense layers and dropout. I used categorical_crossentropy as my loss function, while my targets are not strictly one-hot encoded. Will that be a problem? Is there any suggestion to improve the model?
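For what it's worth, categorical_crossentropy does not require one-hot targets: it computes -sum(y_true * log(y_pred)), which is well defined for any probability vector that sums to 1. A minimal NumPy sketch of the same formula Keras uses (the function name and example values here are illustrative):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy H(y_true, y_pred) = -sum(y_true * log(y_pred)).

    Nothing in this formula requires y_true to be one-hot: any
    non-negative vector summing to 1 (a "soft" label) is a valid target.
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# A soft label from the question vs. a model's predicted distribution.
y_soft = np.array([[0.132561, 0.46975598, 0.132561, 0.132561, 0.132561]])
y_pred = np.array([[0.15, 0.45, 0.15, 0.15, 0.10]])

# The loss is minimized when the predicted distribution matches the
# soft target exactly, so the model is pushed toward the label
# distribution rather than toward a hard class.
print(categorical_crossentropy(y_soft, y_pred))
print(categorical_crossentropy(y_soft, y_soft))
```

So training directly on the probability vectors is mathematically sound; the gradient simply pulls the softmax output toward the target distribution.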
PS: taking the argmax of the training probabilities does not always recover the actual label. For example, given the probability list
[0.33719498 , 0.46975598, 0.06434968 , 0.06434968 , 0.06434968]
the actual label could be
[1, 0, 0, 0, 0]
Using probabilistic labels as ground truth does not seem to be a good idea. We assume the data are drawn from a fixed distribution; once drawn, they are fixed events.
From a theoretical point of view, this seems to violate the assumptions of the learning problem.
I would suggest converting the probabilistic labels to one-hot labels and seeing whether you observe an improvement.
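If you want to try that conversion, a short NumPy sketch (using the example rows from the question; note the question's PS already warns that the argmax can disagree with the actual label, so this step discards information):

```python
import numpy as np

y_train = np.array([
    [0.132561,   0.46975598, 0.132561,   0.132561,   0.132561],
    [0.33719498, 0.46975598, 0.06434968, 0.06434968, 0.06434968],
])

# Hard-label each row by its argmax, then one-hot encode via an
# identity matrix lookup.
hard = np.eye(y_train.shape[1])[np.argmax(y_train, axis=1)]
print(hard)
```

Training on `hard` instead of `y_train` lets you compare the two labeling schemes directly on your held-out test accuracy.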