
Tensorflow GPU not available in a new process forked off with Python 2.7's multiprocessing.Process call

System specs

  • Ubuntu 18.04 server
  • GPU installed: Nvidia P1000
  • CUDA version: CUDA Version 10.1.243
  • Tensorflow: tensorflow-gpu==1.15

I've noticed a very strange bug where the GPU is only available to Tensorflow in the root process of the Python process tree. If I fork off a process with multiprocessing.Process(), the GPU is no longer available.

Example code:

import tensorflow as tf

import multiprocessing
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main():
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())
    process = multiprocessing.Process(target=run_tensorflow, args=())
    process.daemon = False
    process.start()


def run_tensorflow():
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())

if __name__ == '__main__':
    main()

Output

2020-04-17 05:01:37.834131: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 05:01:37.855703: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-04-17 05:01:37.856170: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bb442b0560 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:01:37.856184: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-17 05:01:37.857492: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-17 05:01:37.940480: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.940856: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bb44337c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:01:37.940872: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660 SUPER, Compute Capability 7.5
2020-04-17 05:01:37.940974: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.941214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-04-17 05:01:37.941410: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:01:37.942234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-04-17 05:01:37.942998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-04-17 05:01:37.943193: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-04-17 05:01:37.944143: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-04-17 05:01:37.944915: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-04-17 05:01:37.947293: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-17 05:01:37.947399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.947708: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.947945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-17 05:01:37.947970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:01:37.948442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 05:01:37.948452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-04-17 05:01:37.948457: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-04-17 05:01:37.948548: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.948813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:01:37.949069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 5450 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:__main__:main(): tf.test.is_gpu_available(): True
2020-04-17 05:01:37.954340: E tensorflow/stream_executor/cuda/cuda_driver.cc:1247] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
2020-04-17 05:01:37.954384: E tensorflow/stream_executor/cuda/cuda_driver.cc:1247] could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED: initialization error
INFO:__main__:run_tensorflow(): tf.test.is_gpu_available(): False

The important part (I think) is

INFO:__main__:main(): tf.test.is_gpu_available(): True

followed by

INFO:__main__:run_tensorflow(): tf.test.is_gpu_available(): False

Why can't I get a handle on the GPU from the forked child process?


Edit: It may be useful to see that I can see the GPU if I wait to import tensorflow until after I have forked off the new process:

import multiprocessing
import os
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def main():
    #logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())
    process = multiprocessing.Process(target=run_tensorflow, args=())
    process.daemon = False
    process.start()


def run_tensorflow():
    import tensorflow as tf
    logger.info("main(): tf.test.is_gpu_available(): %s", tf.test.is_gpu_available())

if __name__ == '__main__':
    main()
Output

2020-04-17 05:08:25.256372: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-17 05:08:25.279630: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3000000000 Hz
2020-04-17 05:08:25.280028: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5606fe0d0170 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:08:25.280047: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-04-17 05:08:25.281970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-17 05:08:25.370354: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.370696: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5606fe157820 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-17 05:08:25.370713: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660 SUPER, Compute Capability 7.5
2020-04-17 05:08:25.370815: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.371047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.785
pciBusID: 0000:01:00.0
2020-04-17 05:08:25.371225: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:08:25.372088: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-04-17 05:08:25.372890: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-04-17 05:08:25.373070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-04-17 05:08:25.374055: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-04-17 05:08:25.374872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-04-17 05:08:25.377440: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-17 05:08:25.377538: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.377835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.378052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-04-17 05:08:25.378082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-04-17 05:08:25.378552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-17 05:08:25.378564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2020-04-17 05:08:25.378569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2020-04-17 05:08:25.378638: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.378883: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-17 05:08:25.379117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 5450 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
INFO:__main__:run_tensorflow(): tf.test.is_gpu_available(): True
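
A related workaround (my sketch, not from the original post) is to use multiprocessing's 'spawn' start method instead of the default fork, so the child is a fresh interpreter that initializes CUDA itself. Note that spawn is only exposed on Unix from Python 3.4 onward via multiprocessing.get_context(); on Python 2.7 (as used here) fork is the only option, so deferring the import as above is the practical fix. A minimal sketch, assuming a move to Python 3 is acceptable:

import logging
import multiprocessing

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_tensorflow():
    # TensorFlow is imported (and CUDA initialized) inside the spawned child,
    # which has no inherited CUDA state from the parent.
    import tensorflow as tf
    logger.info("run_tensorflow(): tf.test.is_gpu_available(): %s",
                tf.test.is_gpu_available())


def main():
    # 'spawn' starts a brand-new interpreter for the child process instead of
    # fork()ing a copy of the parent.
    ctx = multiprocessing.get_context("spawn")
    process = ctx.Process(target=run_tensorflow)
    process.start()
    process.join()


if __name__ == '__main__':
    main()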

Tensorflow's GPU memory allocation is greedy by default. Limiting GPU memory growth describes several ways to restrict GPU allocation; this should allow multiple Tensorflow programs to share one GPU. However, I don't know how Tensorflow handles fork() - particularly when the GPU is already active - and I find it hard to believe that it works correctly. Perhaps fork() before importing Tensorflow (or at least before using it)?
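
As one concrete illustration of the memory-growth option mentioned above (a minimal sketch assuming the TF 1.x session API that tensorflow-gpu==1.15 provides; the 0.4 fraction is just a placeholder value):

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing (nearly) all of it up
# front, so several processes can share the card.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap this process at a fixed share of GPU memory, e.g. 40%:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

with tf.Session(config=config) as sess:
    print(sess.run(tf.reduce_sum(tf.random.uniform([1000, 1000]))))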
