Using a serialized Keras model with dropout in pyspark

Question

I have several neural networks built with Keras that, so far, I have mostly used in Jupyter. I often save scikit-learn models with joblib and Keras models with json + hdf5, and I use them in other notebooks without issue.

I wrote a Python Spark application that uses those serialized models in cluster mode. The joblib models work fine; however, I ran into an issue with Keras.

Here is the model used in both the notebook and pyspark:

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

# max_nb_words and max_sequence_length are defined elsewhere.
def build_gru_model():
    model = Sequential()
    model.add(Embedding(max_nb_words, 128, input_length=max_sequence_length, dropout=0.2))
    model.add(GRU(128, dropout_W=0.2, dropout_U=0.2))
    model.add(Dense(2, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Both are called the same way:

preds = model.predict_proba(data, verbose=0)

However, only in Spark do I get this error:

MissingInputError: ("An input of the graph, used to compute DimShuffle{x,x,x,x}(keras_learning_phase), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", keras_learning_phase)

I've done the mandatory search and found https://github.com/fchollet/keras/issues/2430, which points to https://keras.io/getting-started/faq/

If I remove dropout from my model, it does work. However, I don't understand how to implement something that lets me keep dropout during the training phase, as described in the FAQ.

Given the model code, how would one accomplish this?

Answer 1

Try adding the following before your prediction call:

import keras.backend as K
K.set_learning_phase(0)

This should set the learning phase to 0 (test time), so dropout is disabled during prediction.
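To show where that call could sit in a Spark job, here is a minimal sketch, not a tested recipe. The names `predict_partition`, `architecture_json` and `weights_path` are hypothetical, and it assumes the hdf5 weights file is readable at the same path on every executor. It also assumes that fixing the learning phase before the model is rebuilt is the safer order, since the backend may bake the flag into the graph when the layers are constructed.

```python
def predict_partition(rows, architecture_json, weights_path):
    """Score one partition of feature rows on a Spark executor.

    architecture_json: the model architecture as saved with model.to_json()
    weights_path: path to the hdf5 weights file (assumed present on the executor)
    """
    rows = list(rows)
    if not rows:
        # Empty partition: nothing to score, so skip loading the model.
        return iter([])

    # Import on the executor rather than the driver, so each worker
    # process sets up its own backend state.
    import numpy as np
    import keras.backend as K
    from keras.models import model_from_json

    # 0 = test time. Set this before the model is rebuilt so that
    # dropout is already disabled when the graph is constructed.
    K.set_learning_phase(0)

    model = model_from_json(architecture_json)
    model.load_weights(weights_path)

    # Same call as in the question, applied to the whole partition at once.
    preds = model.predict_proba(np.array(rows), verbose=0)
    return iter(preds)
```

With an RDD of already padded sequences, this could be applied as `rdd.mapPartitions(lambda rows: predict_partition(rows, architecture_json, weights_path))`. Loading the model once per partition rather than once per row keeps the deserialization cost down.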

Solution 1
2, accepted, 2017-04-05 14:52:53
