Bug when using TensorFlow-GPU + Python multiprocessing?
I have noticed a strange behavior when I use TensorFlow-GPU + Python multiprocessing.
I have implemented a DCGAN with some customizations and my own dataset. Since I am conditioning the DCGAN on certain features, I have training data and also test data for evaluation.
Due to the size of my datasets, I have written data loaders that run concurrently and preload into a queue using Python's multiprocessing.
The structure of the code roughly looks like this:
class ConcurrentLoader:
    def __init__(self, dataset):
        ...

class DCGAN:
    ...

net = DCGAN()
training_data = ConcurrentLoader(path_to_training_data)
test_data = ConcurrentLoader(path_to_test_data)
This code runs fine on TensorFlow-CPU and on TensorFlow-GPU <= 1.3.0 using CUDA 8.0, but when I run the exact same code with TensorFlow-GPU 1.4.1 and CUDA 9 (latest releases of TF & CUDA as of Dec 2017) it crashes:
2017-12-20 01:15:39.524761: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.527795: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.529548: E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-12-20 01:15:39.535341: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2017-12-20 01:15:39.535383: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-12-20 01:15:39.535397: F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
[1] 32299 abort (core dumped) python dcgan.py --mode train --save_path ~/tf_run_dir/test --epochs 1
What really confuses me is that if I just remove test_data the error does not occur. Thus, for some strange reason, TensorFlow-GPU 1.4.1 & CUDA 9 work with just a single ConcurrentLoader, but crash when multiple loaders are initialized.
Even more interesting is that (after the exception) I have to manually shut down the python processes, because the GPU's VRAM, the system's RAM and even the python processes stay alive after the script crashes.
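For reference, a minimal sketch of what an explicit cleanup hook on such a loader could look like (class and method names are made up; note that atexit only fires on a normal interpreter shutdown, so it would not help when native code calls abort() as above):

import atexit
import multiprocessing as mp
import time

class CleanedUpLoader:
    def __init__(self):
        self._q = mp.Queue(64)
        self._worker = mp.Process(target=self._worker_func, daemon=True)
        self._worker.start()
        # reap the worker whenever the interpreter shuts down normally
        atexit.register(self.close)

    def _worker_func(self):
        # stand-in for the real preloading loop
        while True:
            self._q.put(0)
            time.sleep(0.1)

    def get(self):
        return self._q.get()

    def close(self):
        if self._worker.is_alive():
            self._worker.terminate()
            self._worker.join()

if __name__ == "__main__":
    loader = CleanedUpLoader()
    print(loader.get())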
Furthermore, it has to have some weird connection to Python's multiprocessing module, because when I implement the same model in Keras (using the TF backend!) the code also runs just fine, with 2 concurrent loaders. I guess Keras is somehow creating a layer of abstraction in between that keeps TF from crashing.

Where could I possibly have screwed up with the multiprocessing module that it causes crashes like this one?
These are the parts of the code that use multiprocessing inside the ConcurrentLoader:
def __init__(self, dataset):
    ...
    self._q = mp.Queue(64)
    self._file_cycler = cycle(img_files)
    self._worker = mp.Process(target=self._worker_func, daemon=True)
    self._worker.start()

def _worker_func(self):
    while True:
        ...  # gets next file paths from self._file_cycler
        buffer = list()
        for im_path in paths:
            ...  # uses OpenCV to load each image & puts it into the buffer
        self._q.put(np.array(buffer).astype(np.float32))
...and this is it.
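For completeness, here is roughly how the same worker setup would look with an explicit spawn start method instead of the platform default fork (it is only a guess that fork interacting with an already-initialized CUDA context plays a role here; the image loading is stubbed out and the names are made up):

import multiprocessing as mp
import numpy as np

def _worker_func(q, img_files):
    # module-level function so the spawn start method can pickle it
    while True:
        for im_path in img_files:
            batch = np.zeros((64, 64, 3), dtype=np.float32)  # placeholder for the real OpenCV loading
            q.put(batch)

class SpawnLoader:
    def __init__(self, img_files):
        ctx = mp.get_context("spawn")  # children start as fresh interpreters, without the parent's CUDA state
        self._q = ctx.Queue(64)
        self._worker = ctx.Process(target=_worker_func, args=(self._q, img_files), daemon=True)
        self._worker.start()

    def get(self):
        return self._q.get()

if __name__ == "__main__":  # required with spawn, since children re-import the main module
    training_data = SpawnLoader(["img_0.png", "img_1.png"])
    test_data = SpawnLoader(["img_2.png"])
    print(training_data.get().shape, test_data.get().shape)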
Where have I written "unstable" or "non-pythonic" multiprocessing code? I thought daemon=True should ensure that every process gets killed as soon as the main process dies? Unfortunately, this is not the case for this specific error (presumably because daemonic children are only terminated by Python's normal shutdown machinery, which never runs when the parent is killed by abort()).
Did I misuse the default multiprocessing.Process or multiprocessing.Queue here? I thought simply writing a class where I store batches of images inside a Queue and make it accessible through methods / instance variables should be just fine.
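A stripped-down, TensorFlow-free version of that pattern looks like this (class name, batch contents and counts are made up for illustration, and it assumes the default fork start method as on Linux):

import multiprocessing as mp
import numpy as np

class MinimalLoader:
    def __init__(self, n_batches):
        self._q = mp.Queue(64)
        self._worker = mp.Process(target=self._worker_func, args=(n_batches,), daemon=True)
        self._worker.start()

    def _worker_func(self, n_batches):
        # preload a few dummy "batches" into the queue
        for i in range(n_batches):
            self._q.put(np.full((2, 2), i, dtype=np.float32))

    def get(self):
        return self._q.get()

if __name__ == "__main__":
    training_data = MinimalLoader(3)  # two loaders, analogous to training_data + test_data
    test_data = MinimalLoader(3)
    print(training_data.get(), test_data.get())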
I am getting the same error when trying to use TensorFlow and multiprocessing:
E tensorflow/stream_executor/cuda/cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
but in a different environment: TF 1.4 + CUDA 8.0 + cuDNN 6.0. matrixMulCUBLAS from the CUDA samples works fine. I am wondering about the correct solution too! And the referenced answer "failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED on a AWS p2.xlarge instance" did not work for me.