
Results of training a Keras model differ on Google Cloud

Original post: 2018-06-26 15:55:56 | Tags: python, tensorflow, keras

I wrote a script to train a Keras neural net and ran it successfully on my machine (validation accuracy is roughly 0.8 at the end of training). However, when I run the exact same code on the same data on a Google Cloud VM instance, I get drastically worse results (~0.2 validation accuracy).

Running git status confirms that the repo on the VM is up to date with master (as is my local copy), and I have verified that its versions of TensorFlow and Keras are up to date and match those on my local machine. I also set the NumPy and TensorFlow random seeds before importing Keras.
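For anyone comparing two environments in the same way, the version check described above can be scripted so the output from both machines can be diffed directly. This is only a minimal sketch: the function name environment_report and the default package list are illustrative assumptions, not part of the original script, and it is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import sys


def environment_report(packages=("numpy", "tensorflow", "keras")):
    """Return a dict with the interpreter version and each package's version.

    The package names are an assumption for illustration; pass whatever
    libraries the training script actually imports.
    """
    report = {"python": "%d.%d.%d" % tuple(sys.version_info[:3])}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Record the absence instead of failing, so the report is complete.
            report[name] = "not installed"
        else:
            # Some modules do not expose __version__; fall back to "unknown".
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


def format_report(report):
    """Render the report as sorted 'name: version' lines for easy diffing."""
    return "\n".join("%s: %s" % (key, report[key]) for key in sorted(report))
```

Printing format_report(environment_report()) on the local machine and on the VM, then diffing the two outputs, makes any mismatch in the interpreter or library versions visible at a glance.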

Has anyone run into a problem like this before? I'm at a loss for what could be causing this. The only difference I can think of is that my machine is running Python 3.6 whereas the VM is running Python 2.7. Could that account for the vast difference in training results?
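One well-known way a Python 2.7 / 3.6 difference can silently change numeric results is integer division: on Python 2, the / operator floors when both operands are integers, unless the file has from __future__ import division. Whether this affects the script in question is unknown, since the script is not shown; the sketch below only illustrates the effect, and the function names and the pixel-scaling scenario are hypothetical.

```python
from __future__ import division, print_function


def scale_true_division(pixel, max_value=255):
    """What `pixel / max_value` yields for ints on Python 3.

    The __future__ import above gives the same result on Python 2.
    """
    return pixel / max_value


def scale_floor_division(pixel, max_value=255):
    """What `pixel / max_value` would yield for ints on Python 2
    when `from __future__ import division` is absent (hypothetical case)."""
    return pixel // max_value


# A mid-range pixel value keeps its information under true division...
print(scale_true_division(128))   # a float close to 0.5
# ...but collapses to zero under Python 2 style integer division.
print(scale_floor_division(128))  # 0
```

If preprocessing of this kind operated on integer inputs, most values would become 0 on Python 2.7, which could plausibly reduce a model to near-chance accuracy. Adding from __future__ import division at the top of each module, or dividing by 255.0, removes that particular discrepancy.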

1 answer

I found a buggy interaction between Keras and the Estimator API in TensorFlow 1.10 (the current gcloud version) that does not occur in >=1.11 (what I was using locally).

Not sure if it applies to you (do you use Keras + Estimator, with TensorFlow >=1.11 locally?).

I filed a bug report here: https://github.com/tensorflow/tensorflow/issues/24299