
Using TensorFlow when a session is already running on the GPU

I am training a neural network with TensorFlow 2 (GPU) on my local machine, and I'd like to run some TensorFlow code in parallel (just loading a model and saving its graph).

When loading the model I get a CUDA error. How can I use TensorFlow 2 on the CPU to load and save a model while another instance of TensorFlow is training on the GPU?

    132         self._config = config
    133         self._hyperparams['feature_extractor'] = self._get_feature_extractor(hyperparams['feature_extractor'])
--> 134         self._input_shape_tensor = tf.constant([input_shape[0], input_shape[1]])
    135         self._build(**self._hyperparams)
    136         # save parameter dict for serialization

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in constant(value, dtype, shape, name)
    225   """
    226   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 227                         allow_broadcast=True)
    228 
    229 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    233   ctx = context.context()
    234   if ctx.executing_eagerly():
--> 235     t = convert_to_eager_tensor(value, ctx, dtype)
    236     if shape is None:
    237       return t

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     93     except AttributeError:
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
---> 95   ctx.ensure_initialized()
     96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97 

~/.anaconda3/envs/posenet2/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py in ensure_initialized(self)
    490         if self._default_is_async == ASYNC:
    491           pywrap_tensorflow.TFE_ContextOptionsSetAsync(opts, True)
--> 492         self._context_handle = pywrap_tensorflow.TFE_NewContext(opts)
    493       finally:
    494         pywrap_tensorflow.TFE_DeleteContextOptions(opts)

InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

It took me a while to find this answer:

import os
# Hide all CUDA devices from this process; set this before importing TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf

Starting your script with those lines makes TensorFlow run on the CPU only, because it hides CUDA devices from the process. The variable has to be set before TensorFlow is imported. This lets you run the extra code while a heavy training job keeps the GPU fully loaded.
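If editing the script itself is inconvenient, the same variable can be set for a child process only. The sketch below is one way to do that with the standard library. `run_without_gpu` is a hypothetical helper name, and the child command is just a stand-in for whatever script loads and saves the model.

```python
import os
import subprocess
import sys


def run_without_gpu(args):
    """Run a command in a child process that cannot see any CUDA device."""
    # Copy the current environment and override only CUDA_VISIBLE_DEVICES,
    # so the parent process (and the training job) are left untouched.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="-1")
    return subprocess.run(args, env=env, capture_output=True, text=True, check=True)


# Stand-in child command that only echoes the variable it received.
# In practice this would be something like [sys.executable, "export_graph.py"],
# where export_graph.py is a hypothetical script name.
child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
completed = run_without_gpu(child)
print(completed.stdout.strip())
```

Because the variable is in the child's environment from the start, the order of imports inside the child script no longer matters.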

You are loading the model on the GPU, and since the GPU is already being used for training, it runs out of memory. You need to place the loading on the CPU. Try loading the model inside

with tf.device('/CPU:0'):
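As a minimal sketch of that device scope, the block below pins a small computation to the CPU and checks where the result lives. `run_on_cpu` is a hypothetical helper; a real script would put its load call (for example `tf.keras.models.load_model`) inside the `with` block instead. One caveat, stated as an assumption based on the traceback above: the error is raised while the TensorFlow context initializes, so a device scope alone may not be enough if the GPU is still visible to the process. Combining it with `CUDA_VISIBLE_DEVICES` is the safer option.

```python
import tensorflow as tf


def run_on_cpu():
    """Create a tensor and run a matmul with placement pinned to the CPU."""
    with tf.device("/CPU:0"):
        a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
        b = tf.matmul(a, a)
    return b


result = run_on_cpu()
# The device string of an eager tensor ends with the device it was placed on.
print(result.device)
```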

By default, TensorFlow 2 reserves nearly all of the GPU memory at startup. If you set

import tensorflow as tf
tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0], True)

you'll be able to use your GPU for both tasks (provided, of course, that it has enough memory for both). Memory growth has to be enabled in the training process as well; otherwise that process has already reserved the memory.
If you want more control over GPU memory usage, you can create a virtual GPU with a hard-coded memory size:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 2 GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]) # limit in megabytes
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
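The two options above (memory growth and a hard limit) can be combined into one helper. This is only a sketch: `configure_gpu_memory` is a hypothetical name, and it assumes a recent TensorFlow 2 release in which `tf.config.set_logical_device_configuration` and `tf.config.LogicalDeviceConfiguration` are available as the non-experimental equivalents of the calls used above. It has to run before any GPU is initialized, and on a machine without a GPU it does nothing.

```python
import tensorflow as tf


def configure_gpu_memory(memory_limit_mb=None):
    """Hypothetical helper: cap or lazily grow memory on every visible GPU.

    With memory_limit_mb=None, memory growth is enabled instead of a cap.
    Returns the number of GPUs that were configured successfully.
    """
    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            if memory_limit_mb is None:
                # Allocate GPU memory on demand instead of all at once.
                tf.config.experimental.set_memory_growth(gpu, True)
            else:
                # Expose one logical GPU limited to memory_limit_mb megabytes.
                tf.config.set_logical_device_configuration(
                    gpu,
                    [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
                )
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialized.
            print(err)
    return configured


n = configure_gpu_memory(memory_limit_mb=2048)
print(n, "GPU(s) configured")
```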
