
Bug when using TensorFlow-GPU + Python multiprocessing?

I have noticed a strange behavior when I use TensorFlow-GPU + Python multiprocessing.

I have implemented a DCGAN with some customizations and my own dataset. Since I am conditioning the DCGAN on certain features, I have training data and also test data for evaluation.

Due to the size of my datasets, I have written data loaders that run concurrently and preload batches into a queue using Python's multiprocessing.

The structure of the code roughly looks like this:

class ConcurrentLoader:
    def __init__(self, dataset):
        ...

class DCGAN:
    ...

net = DCGAN()
training_data = ConcurrentLoader(path_to_training_data)
test_data = ConcurrentLoader(path_to_test_data)

This code runs fine on TensorFlow-CPU and on TensorFlow-GPU <= 1.3.0 using CUDA 8.0, but when I run the exact same code with TensorFlow-GPU 1.4.1 and CUDA 9 (latest releases of TF & CUDA as of Dec 2017) it crashes:

2017-12-20 01:15:39.524761: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.527795: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.529548: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.535341: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-12-20 01:15:39.535383: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-12-20 01:15:39.535397: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
[1]    32299 abort (core dumped)  python dcgan.py --mode train --save_path ~/tf_run_dir/test --epochs 1

What really confuses me is that if I just remove test_data the error does not occur. Thus, for some strange reason, TensorFlow-GPU 1.4.1 & CUDA 9 work with just a single ConcurrentLoader, but crash when multiple loaders are initialized.

Even more interesting is that (after the exception) I have to manually shut down the Python processes, because the GPU's VRAM and the system's RAM stay allocated and even the Python processes stay alive after the script crashes.

Furthermore, it must have some weird connection to Python's multiprocessing module, because when I implement the same model in Keras (using the TF backend!) the code also runs just fine, with 2 concurrent loaders. I guess Keras is somehow creating a layer of abstraction in between that keeps TF from crashing.

Where could I possibly have screwed up with the multiprocessing module so that it causes crashes like this one?

These are the parts of the code that use multiprocessing inside the ConcurrentLoader:

def __init__(self, dataset):
    ...
    self._q = mp.Queue(64)
    self._file_cycler = cycle(img_files)
    self._worker = mp.Process(target=self._worker_func, daemon=True)
    self._worker.start()

def _worker_func(self):
    while True:
        ... # gets next filepaths from self._file_cycler
        buffer = list()
        for im_path in paths:
            ... # uses OpenCV to load each image & puts it into the buffer
        self._q.put(np.array(buffer).astype(np.float32))

...and this is it.

Where have I written "unstable" or "non-pythonic" multiprocessing code? I thought daemon=True should ensure that every process gets killed as soon as the main process dies? Unfortunately, this is not the case for this specific error.
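
As a sanity check on that daemon=True assumption, here is a minimal sketch (not the DCGAN code) of what I expected to happen; as far as I can tell, multiprocessing only terminates daemonic children via an atexit handler on a normal interpreter exit, which would explain why nothing gets cleaned up after the hard abort / core dump shown in the log above:

import multiprocessing as mp
import time

def worker():
    # long-running child, like the loader worker
    while True:
        time.sleep(1)

if __name__ == "__main__":
    p = mp.Process(target=worker, daemon=True)
    p.start()
    time.sleep(2)
    # main process exits normally here -> the daemonic worker is terminated;
    # an abort()/SIGABRT skips this cleanup path entirely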

Did I misuse the default multiprocessing.Process or multiprocessing.Queue here? I thought simply writing a class that stores batches of images inside a Queue and makes them accessible through methods / instance variables should be just fine.
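
For reference, here is a minimal, self-contained sketch of the pattern I am using. The OpenCV loading is replaced with random arrays, the class and method names (MinimalLoader, next_batch) and the batch shape are placeholders, not the ones from my actual code:

import multiprocessing as mp
import numpy as np

class MinimalLoader:
    """Stripped-down version of the ConcurrentLoader pattern."""

    def __init__(self, batch_shape=(16, 64, 64, 3)):
        self._batch_shape = batch_shape
        self._q = mp.Queue(64)
        self._worker = mp.Process(target=self._worker_func, daemon=True)
        self._worker.start()

    def _worker_func(self):
        # Producer: keeps the queue topped up with preprocessed batches.
        while True:
            batch = np.random.rand(*self._batch_shape)  # stands in for the OpenCV loading
            self._q.put(batch.astype(np.float32))

    def next_batch(self):
        # Consumer side: blocks until a batch is available.
        return self._q.get()

if __name__ == "__main__":
    training_data = MinimalLoader()
    test_data = MinimalLoader()  # the second loader is what triggers the crash for me
    print(training_data.next_batch().shape, test_data.next_batch().shape)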

I am getting the same error when trying to use tensorflow and multiprocessing

E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED

but in a different environment: tf1.4 + cuda 8.0 + cudnn 6.0. matrixMulCUBLAS in the sample codes works fine. I wonder about the correct solution too! And the referenced answer "failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED on a AWS p2.xlarge instance" did not work for me.
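
For anyone else landing here: the workaround that usually comes up for this error (and, I assume, the one suggested in that referenced thread) is to stop TensorFlow from grabbing all GPU memory up front, roughly like this with the TF 1.x session API. As noted, it did not help in my case:

import tensorflow as tf

# Commonly suggested workaround (TF 1.x): allocate GPU memory on demand
# instead of reserving it all when the session is created.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)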
