
How to stop CUDA from re-initializing for every subprocess which trains a keras model?

I am using CUDA/cuDNN to train multiple TensorFlow Keras models on my GPU (for an evolutionary algorithm attempting to optimize hyperparameters). Initially, the program would crash with an Out of Memory error after a couple of generations. Eventually, I found that using a new subprocess for every model would clear the GPU memory automatically.

However, each process seems to reinitialize CUDA (loading the dynamic libraries from the .dll files), which is incredibly time-consuming. Is there any method to avoid this?

Code is pasted below. The function fitness_wrapper is called for each individual.

import multiprocessing

import tensorflow as tf


def fitness_wrapper(indiv):
    # Shared double so the subprocess can report the fitness back.
    fit = multiprocessing.Value('d', 0.0)
    if __name__ == '__main__':
        process = multiprocessing.Process(target=fitness, args=(indiv, fit))
        process.start()
        process.join()
    return (fit.value,)


def fitness(indiv, fit):
    model = tf.keras.Sequential.from_config(indiv['architecture'])
    optimizer_dict = indiv['optimizer']
    opt = tf.keras.optimizers.Adam(learning_rate=optimizer_dict['lr'],
                                   beta_1=optimizer_dict['b1'],
                                   beta_2=optimizer_dict['b2'],
                                   epsilon=optimizer_dict['epsilon'])
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.fit(data_split[0], data_split[2], batch_size=32, epochs=5)
    # Write the accuracy into the shared Value so the parent can read it;
    # a plain assignment (fit = ...) would only rebind the local name.
    fit.value = model.evaluate(data_split[1], data_split[3])[1]

Turns out the solution is to just use tf.keras.backend.clear_session() after each model (rather than using subprocesses). I had tried this before and it didn't work, but for some reason this time it fixed everything.

Apparently, you should also delete the model and call tf.compat.v1.reset_default_graph().

def fitness(indiv, fit):
    model = tf.keras.Sequential.from_config(indiv['architecture'])
    optimizer_dict = indiv['optimizer']
    opt = tf.keras.optimizers.Adam(learning_rate=optimizer_dict['lr'],
                                   beta_1=optimizer_dict['b1'],
                                   beta_2=optimizer_dict['b2'],
                                   epsilon=optimizer_dict['epsilon'])
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.fit(data_split[0], data_split[2], batch_size=32, epochs=5)
    fit = model.evaluate(data_split[1], data_split[3])[1]

    # Release the model and the graph state it accumulated, so the next
    # model trained in the same process starts from a clean slate.
    del model
    tf.keras.backend.clear_session()
    tf.compat.v1.reset_default_graph()

    return fit
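If you do still want the memory isolation of subprocesses, one way to pay the CUDA initialization cost only once is a single long-lived worker process fed over a queue, instead of a fresh process per model. The sketch below shows only the pattern; `evaluate_config` is a placeholder (not from the original code) standing in for building, training, and evaluating a model, and the one-time TensorFlow/CUDA setup would go at the top of `worker`:

```python
import multiprocessing


def evaluate_config(config):
    # Placeholder for the real work: build, train, and evaluate a
    # model from `config`, returning its fitness.
    return config * 2


def worker(task_queue, result_queue):
    # One-time expensive setup (importing TensorFlow, CUDA init, ...)
    # would go here; it runs once per worker, not once per model.
    for config in iter(task_queue.get, None):  # None shuts the worker down
        result_queue.put((config, evaluate_config(config)))


def evaluate_all(configs):
    tasks, results = multiprocessing.Queue(), multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(tasks, results))
    proc.start()
    for cfg in configs:  # e.g. one config per individual
        tasks.put(cfg)
    out = [results.get() for _ in configs]
    tasks.put(None)
    proc.join()
    return out


if __name__ == '__main__':
    print(evaluate_all([1, 2, 3]))
```

Note the trade-off: a long-lived worker accumulates state between models, which is exactly the leak the per-model subprocesses avoided, so the clear_session() cleanup above would still be needed inside the worker.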
