
GPU only being used 1-5% Tensorflow-gpu and Keras

I just installed tensorflow-gpu and am using Keras for my CNN. During training my GPU is only used at about 5%, yet 5 out of the 6 GB of VRAM is in use. Sometimes it glitches, prints 0.000000e+00 in the console, and the GPU goes to 100%, but after a few seconds training slows back down to 5%. My GPU is a Zotac GTX 1060 mini and I am using a Ryzen 5 1600x.

Epoch 1/25
 121/3860 [..............................] - ETA: 31:42 - loss: 3.0575 - acc: 0.0877 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 2/25
 121/3860 [..............................] - ETA: 29:48 - loss: 3.0005 - acc: 0.0994 - val_loss: 0.0000e+00 - val_acc: 0.0000e+00
Epoch 3/25
  36/3860 [..............................] - ETA: 24:47 - loss: 2.9863 - acc: 0.1024

Usually, we want the bottleneck to be on the GPU (hence 100% utilization). If that's not happening, some other part of your code is taking a long time during each batch. It's hard to say what it is (especially because you didn't add any code), but there are a few things you can try:

1. Input data

Make sure the input data for your network is always available. Reading images from disk takes a long time, so use multiple workers and the multiprocessing interface:

model.fit(..., use_multiprocessing=True, workers=8)
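`use_multiprocessing` only helps if the data is served through a `keras.utils.Sequence`-style loader. A minimal sketch of the indexing logic such a loader needs (the `ImageSequence` name and its arguments are illustrative; in real code you would subclass `keras.utils.Sequence` and load/decode the actual images inside `__getitem__`):

```python
import math

class ImageSequence:
    """Sketch of a keras.utils.Sequence-style loader. Subclass
    keras.utils.Sequence in real code so fit() can parallelize it."""

    def __init__(self, paths, labels, batch_size=32):
        self.paths = paths
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.paths) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        hi = lo + self.batch_size
        # in real code: read the image files in this slice and
        # return (np.array of decoded images, np.array of labels)
        return self.paths[lo:hi], self.labels[lo:hi]
```

Because each batch is addressed by index, Keras can hand different indices to different worker processes, keeping the GPU fed.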

2. Force the model onto the GPU

This is hardly ever the problem, because /gpu:0 is the default device, but it's worth making sure you are executing the model on the intended device:

with tf.device('/gpu:0'):
    x = Input(...)
    y = Conv2D(...)(x)  # layers must be called on a tensor
    model = Model(x, y)

3. Check the model's size

If your batch size is large and soft placement is allowed, the parts of your network that didn't fit in the GPU's memory might be placed on the CPU. This considerably slows down the process.

If soft placement is on, try disabling it and check whether a memory error is thrown:

import tensorflow as tf
from keras import backend as K

# make sure soft-placement is off
tf_config = tf.ConfigProto(allow_soft_placement=False)
tf_config.gpu_options.allow_growth = True
s = tf.Session(config=tf_config)
K.set_session(s)

with tf.device(...):
    ...

model.fit(...)

If that's the case, try reducing the batch size until it fits and gives you good GPU usage, then turn soft placement on again.
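The "reduce until it fits" search can be automated. A hedged sketch (the helper name and exception types are assumptions; in TensorFlow an out-of-memory failure surfaces as `tf.errors.ResourceExhaustedError`, which is what you would catch instead):

```python
def fit_with_largest_batch(train_fn, start_batch=256, min_batch=8):
    """Halve the batch size until train_fn(batch_size) succeeds.

    train_fn should attempt one training run and raise on OOM
    (in TF: tf.errors.ResourceExhaustedError; modeled here with
    MemoryError so the sketch stays framework-free).
    """
    bs = start_batch
    while bs >= min_batch:
        try:
            train_fn(bs)
            return bs  # largest batch size that fit in memory
        except (MemoryError, RuntimeError):
            bs //= 2   # did not fit: halve and retry
    raise RuntimeError("no batch size >= min_batch fits in GPU memory")
```

With soft placement off, the OOM error is raised instead of silently spilling layers to the CPU, so this loop finds the largest batch that truly runs on the GPU.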

Some directions you can try:

  1. Double-check your input pipeline; make sure it is not the performance bottleneck.
  2. Increase your batch size or layer width to make sure the GPU gets enough data to consume.
  3. The most effective method is to dump the profile JSON and have a look.
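For point 3: in TF 1.x you can capture a training step with `run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)` and a `tf.RunMetadata`, then write `timeline.Timeline(run_metadata.step_stats).generate_chrome_trace_format()` to a JSON file viewable in chrome://tracing. Once you have that file, a small stdlib script can summarize which ops dominate; a sketch (`summarize_trace` is an illustrative helper, relying on the standard Chrome-trace fields: `"ph": "X"` for complete events, `dur` in microseconds):

```python
import json
from collections import defaultdict

def summarize_trace(trace, top=5):
    """Sum per-op durations in a Chrome trace dict (as dumped by
    tensorflow.python.client.timeline) and return the slowest ops."""
    totals = defaultdict(int)
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") == "X":  # only complete events carry a duration
            totals[ev.get("name", "?")] += ev.get("dur", 0)  # microseconds
    return sorted(totals.items(), key=lambda kv: -kv[1])[:top]

# usage: summarize_trace(json.load(open("timeline.json")))
```

If the top entries are data-copy or CPU-side ops rather than convolutions, the input pipeline (point 1) is the likely culprit.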

In my experience, most of the time low utilization is caused by a lack of data for the GPU to consume.

Some useful links:

* https://www.tensorflow.org/guide/performance/datasets
* https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d
