Keras - Implementation of custom loss function with multiple outputs
I am trying to replicate a (much smaller) version of the AlphaGo Zero system. However, I am having a problem with the network model. The loss function I am supposed to implement is the following:
Where:
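Assuming the formula meant here is the loss given in the AlphaGo Zero paper, it can be written as:

```latex
l = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c \,\lVert \theta \rVert^2
```

Here z is the game outcome from the current player's perspective, v is the value predicted by the network, π is the vector of MCTS search probabilities, p is the vector of move probabilities predicted by the network, θ are the network parameters, and c is a constant controlling the L2 regularization.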
I pass to the network a list of channels (representing the game state) and an array (the same size as pi and p) indicating which actions are valid (1 if valid, 0 otherwise).
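As an illustration of the valid-actions array described above, here is a minimal NumPy sketch. The helper name `valid_action_mask` and the example indices are hypothetical; only the size of 32 actions comes from the model in the question.

```python
import numpy as np


def valid_action_mask(valid_indices, n_actions=32):
    """Return a vector of length n_actions with 1 at valid actions, 0 elsewhere."""
    mask = np.zeros(n_actions, dtype=np.float32)
    mask[list(valid_indices)] = 1.0
    return mask


# Hypothetical example: only actions 0, 5 and 31 are legal in this state.
mask = valid_action_mask([0, 5, 31])
```

Stacking one such vector per training position gives an array of shape (batch, 32), which matches the second input of the model.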
As you can see, the loss function uses both the targets and the network predictions. But after extensive searching, it seems that a custom loss function can only take y_true and y_pred as parameters, even though I have two "y_true's" and two "y_pred's". I have tried using indexing to get those values, but I'm pretty sure it is not working.
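A likely reason the indexing fails: for a model with several outputs, Keras calls the loss function once per output, each time with that output's own y_true and y_pred, and then sums the results. Inside the function, the first axis is therefore the batch, not the list of outputs. The NumPy sketch below uses stand-in arrays (the batch size of 4 is an arbitrary assumption) to show what `[0]` actually selects.

```python
import numpy as np

batch_size, n_actions = 4, 32

# Stand-in for what the loss receives when called for the policy head:
# one row of 32 probabilities per sample in the batch.
y_pred_policy = np.full((batch_size, n_actions), 1.0 / n_actions)

# Stand-in for what the loss receives when called for the value head.
y_pred_value = np.zeros((batch_size, 1))

# Indexing with [0] picks the first sample of the batch, not the first output.
first_policy_row = y_pred_policy[0]
first_value_row = y_pred_value[0]
print(first_policy_row.shape, first_value_row.shape)
```

So `y_pred[0]` and `y_pred[1]` are the predictions for the first two samples of whichever head is being evaluated, which is not what the formula needs.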
The model definition and the custom loss function are in the code below:
from keras import backend as K
from keras import regularizers
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, concatenate
from keras.models import Model

def custom_loss(y_true, y_pred):
    # I am pretty sure this does not work
    output_prob_dist = y_pred[0]
    output_value = y_pred[1]
    label_prob_dist = y_true[0]
    label_value = y_true[1]
    mse_loss = K.mean(K.square(label_value - output_value), axis=-1)
    cross_entropy_loss = K.dot(K.transpose(label_prob_dist), output_prob_dist)
    return mse_loss - cross_entropy_loss

def define_model():
    """Neural Network model implementation using Keras + Tensorflow."""
    state_channels = Input(shape=(5, 5, 6), name='States_Channels_Input')
    valid_actions_dist = Input(shape=(32,), name='Valid_Actions_Input')

    conv = Conv2D(filters=10, kernel_size=2, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='Conv_Layer')(state_channels)
    pool = MaxPooling2D(pool_size=(2, 2), name='Pooling_Layer')(conv)
    flat = Flatten(name='Flatten_Layer')(pool)

    # Merge of the flattened channels (after pooling) and the valid action
    # distribution. Used only as input in the probability distribution head.
    merge = concatenate([flat, valid_actions_dist])

    # Probability distribution over actions
    hidden_fc_prob_dist_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_1')(merge)
    hidden_fc_prob_dist_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Prob_2')(hidden_fc_prob_dist_1)
    output_prob_dist = Dense(32, kernel_regularizer=regularizers.l2(0.0001), activation='softmax', name='Output_Dist')(hidden_fc_prob_dist_2)

    # Value of a state
    hidden_fc_value_1 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_1')(flat)
    hidden_fc_value_2 = Dense(100, kernel_regularizer=regularizers.l2(0.0001), activation='relu', name='FC_Value_2')(hidden_fc_value_1)
    output_value = Dense(1, kernel_regularizer=regularizers.l2(0.0001), activation='tanh', name='Output_Value')(hidden_fc_value_2)

    model = Model(inputs=[state_channels, valid_actions_dist], outputs=[output_prob_dist, output_value])
    model.compile(loss=custom_loss, optimizer='adam', metrics=['accuracy'])
    return model

# In the main method
model = define_model()
# ...
# MCTS routine to collect the data for the network input
# ...
x_train = [channels_input, valid_actions_dist_input]
y_train = [dist_probs_label, who_won_label]
model.fit(x_train, y_train, epochs=10)
In short, my question is: how do I correctly implement this custom loss function so that it uses both the network outputs and the label values?
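To make the target quantity concrete, the value and policy terms can be computed by hand for a small batch. The NumPy sketch below assumes the loss is the AlphaGo Zero one without the regularization term; the function name `alphago_zero_loss` and the clipping constant `eps` are illustrative choices, not part of the original code.

```python
import numpy as np


def alphago_zero_loss(pi, p, z, v, eps=1e-7):
    """Batch mean of (z - v)^2 - pi . log(p), without the L2 term.

    pi, p: arrays of shape (batch, n_actions); z, v: arrays of shape (batch,) or (batch, 1).
    """
    pi = np.asarray(pi, dtype=float)
    p = np.asarray(p, dtype=float)
    z = np.asarray(z, dtype=float).reshape(-1)
    v = np.asarray(v, dtype=float).reshape(-1)
    value_term = (z - v) ** 2
    # Clip to avoid log(0) when the network assigns zero probability.
    policy_term = -np.sum(pi * np.log(np.clip(p, eps, 1.0)), axis=1)
    return float(np.mean(value_term + policy_term))


# One sample, two actions: target says "always action 0", network is undecided.
loss = alphago_zero_loss(pi=[[1.0, 0.0]], p=[[0.5, 0.5]], z=[1.0], v=[0.0])
```

For this sample the value term is 1 and the policy term is log 2, so the result is 1 + log 2.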
I checked their Git repository and there is a lot going on. As shown in the equation, the final loss is the combination of three different losses, and the three networks are minimizing this final loss. Their code for the losses is below:
# train ops
policy_cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(labels['pi_tensor'])))
value_cost = params['value_cost_weight'] * tf.reduce_mean(
    tf.square(value_output - labels['value_tensor']))
reg_vars = [v for v in tf.trainable_variables()
            if 'bias' not in v.name and 'beta' not in v.name]
l2_cost = params['l2_strength'] * \
    tf.add_n([tf.nn.l2_loss(v) for v in reg_vars])
combined_cost = policy_cost + value_cost + l2_cost
You can refer to this and make your changes accordingly.
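One possible way to get the same three-term combination in Keras is to give each output its own built-in loss instead of writing a single function that sees both outputs. Keras then sums the per-output losses and adds the penalties from `kernel_regularizer`, which plays the role of `l2_cost`. The sketch below assumes TensorFlow 2.x with `tf.keras`; the function name `build_model`, the layer names and the reduced architecture are illustrative. Unlike the TensorFlow snippet above, the policy head here outputs softmax probabilities rather than logits, so `categorical_crossentropy` is used.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers


def build_model():
    """Two-headed network compiled with one loss per output."""
    state = keras.Input(shape=(5, 5, 6), name="state")
    valid = keras.Input(shape=(32,), name="valid_actions")

    x = layers.Conv2D(10, 2, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4))(state)
    x = layers.Flatten()(x)
    merged = layers.concatenate([x, valid])

    # Policy head: softmax over the 32 actions.
    policy = layers.Dense(32, activation="softmax", name="policy",
                          kernel_regularizer=regularizers.l2(1e-4))(merged)
    # Value head: scalar in [-1, 1].
    value = layers.Dense(1, activation="tanh", name="value",
                         kernel_regularizer=regularizers.l2(1e-4))(x)

    model = keras.Model(inputs=[state, valid], outputs=[policy, value])
    model.compile(
        optimizer="adam",
        # Losses are matched to the outputs by position: policy first, value second.
        loss=["categorical_crossentropy", "mean_squared_error"],
        # Equal weights; adjust to mimic a value_cost_weight-style factor.
        loss_weights=[1.0, 1.0],
    )
    return model
```

Training then works with the same data layout as in the question, for example `model.fit([channels_input, valid_actions_dist_input], [dist_probs_label, who_won_label], epochs=10)`.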