
Structuring a Keras project to achieve reproducible results in GPU

I am writing a tensorflow.Keras wrapper to perform ML experiments.

I need my framework to be able to perform an experiment as specified in a configuration YAML file and run in parallel on a GPU.

I also need a guarantee that, if I ran the experiment again, I would get results that are, if not exactly the same, at least reasonably close.

To try to ensure this, my training script contains these lines at the beginning, following the guidelines in the official documentation:

import random
import numpy as np
import tensorflow as tf

# Set up random seeds
random.seed(seed)
np.random.seed(seed)
tf.set_random_seed(seed)

This has proven not to be enough.

I ran the same configuration 4 times and plotted the results:

[plot: metric curves from the 4 runs of the same configuration]

As you can see, results vary a lot between runs.

How can I set up a training session in Keras to ensure I get reasonably similar results when training on a GPU? Is this even possible?

The full training script can be found here.

Some of my colleagues are using just pure TF, and their results seem far more consistent. What is more, they do not seem to be seeding any randomness except to ensure that the train and validation split is always the same.
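
For comparison, here is a minimal sketch (not taken from my project; it assumes scikit-learn's train_test_split and placeholder arrays) of seeding only the train/validation split, which is all my colleagues appear to do:

import numpy as np
from sklearn.model_selection import train_test_split

features = np.random.rand(1000, 20)          # placeholder data
labels = np.random.randint(0, 2, size=1000)  # placeholder labels

# A fixed random_state makes the split identical on every run,
# while weight initialization and shuffling stay unseeded.
x_train, x_val, y_train, y_val = train_test_split(
    features, labels, test_size=0.2, random_state=42)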

Keras + Tensorflow.

Step 1, disable GPU.

import os
# Hide all GPUs from TensorFlow so it falls back to the CPU,
# avoiding non-deterministic GPU kernels.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

Step 2, seed the libraries used in your code, e.g. tensorflow, numpy and random.

import tensorflow as tf
import numpy as np
import random as rn

sd = 1  # Here sd means seed.
# Seed every source of randomness the script touches.
np.random.seed(sd)
rn.seed(sd)
os.environ['PYTHONHASHSEED'] = str(sd)

from keras import backend as K
# Force single-threaded execution so op scheduling is deterministic.
config = tf.ConfigProto(intra_op_parallelism_threads=1,
                        inter_op_parallelism_threads=1)
tf.set_random_seed(sd)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)

Make sure both pieces of code are included at the very start of your code; the results will then be reproducible.

Try adding seed parameters to the weights/biases initializers. Just to add more specifics to Alexander Ejbekov's comment.

TensorFlow has two kinds of random seeds: graph-level and op-level. If you're using more than one graph, you need to specify a seed in every one of them. You can override the graph-level seed at the op level by setting the seed parameter within the function. And you can make two ops output the same value, even from different graphs, if the same seed is set. Consider this example:

import tensorflow as tf

g1 = tf.Graph()
with g1.as_default():
    tf.set_random_seed(1)  # graph-level seed
    a = tf.get_variable('a', shape=(1,), initializer=tf.keras.initializers.glorot_normal())
    b = tf.get_variable('b', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=2))  # op-level seed
with tf.Session(graph=g1) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a))
    print(sess.run(b))

g2 = tf.Graph()
with g2.as_default():
    a1 = tf.get_variable('a1', shape=(1,), initializer=tf.keras.initializers.glorot_normal(seed=1))  # op-level seed

with tf.Session(graph=g2) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(a1))

In this example, the output of a is the same as that of a1, but b is different.
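
Applied to a Keras layer, a minimal sketch of the same idea (assuming tf.keras; the layer size and seed values are arbitrary placeholders) is to pass explicit seeds to the weight and bias initializers:

import tensorflow as tf

# Seeding both initializers makes the layer start from the same weights
# on every run.
layer = tf.keras.layers.Dense(
    units=64,
    kernel_initializer=tf.keras.initializers.glorot_normal(seed=1),
    bias_initializer=tf.keras.initializers.RandomNormal(seed=2),
)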
