Low CNN accuracy only on local GPU

Question

For some reason, all my convolutional neural networks have very poor accuracy, regardless of the model compiled. This is in a Jupyter notebook on a local machine using an RTX 3060 Ti GPU with CUDA 11.1.

When I use Google Colab, all my code works fine with high accuracy. Note that this only applies to convolutional neural networks; networks with only densely connected layers work fine.

Some details:

Tensor Flow Version: 2.1.0
Keras Version: 2.2.4-tf

Python 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
Pandas 1.2.0
Scikit-Learn 0.24.0
GPU is available

This is example code (binary classification, 50/50 split):

from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras import optimizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-6),  # decrease learning rate
              metrics=['accuracy'])
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir, target_size=(150, 150), batch_size=20, class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
    validation_dir, target_size=(150, 150), batch_size=20, class_mode='binary')
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)

Results:

WARNING:tensorflow:From <ipython-input-8-f61a1535c537>:6: Model.fit_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
Please use Model.fit, which supports generators.
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to  
  ['...']
Train for 100 steps, validate for 50 steps
Epoch 1/30
100/100 [==============================] - 1158s 12s/step - loss: 0.7020 - accuracy: 0.4945 - val_loss: 0.8541 - val_accuracy: 0.4980
Epoch 2/30
100/100 [==============================] - 5s 47ms/step - loss: 0.6987 - accuracy: 0.5105 - val_loss: 0.6930 - val_accuracy: 0.5000
Epoch 3/30
100/100 [==============================] - 5s 47ms/step - loss: 0.7000 - accuracy: 0.4985 - val_loss: 0.8449 - val_accuracy: 0.5000
Epoch 4/30
100/100 [==============================] - 5s 47ms/step - loss: 0.6967 - accuracy: 0.4975 - val_loss: 0.7162 - val_accuracy: 0.4800
Epoch 5/30
100/100 [==============================] - 5s 47ms/step - loss: 0.6931 - accuracy: 0.4945 - val_loss: 0.8477 - val_accuracy: 0.4990
Epoch 6/30
100/100 [==============================] - 5s 47ms/step - loss: 0.6931 - accuracy: 0.4895 - val_loss: 0.7846 - val_accuracy: 0.5000
Epoch 7/30
100/100 [==============================] - 5s 47ms/step - loss: 0.6933 - accuracy: 0.4860 - val_loss: 0.7468 - val_accuracy: 0.5000

Answer 1

I do not use Colab, so I am not sure why it trains well there; perhaps it uses a different version of TensorFlow.

In your fit call you set values for steps_per_epoch and validation_steps. I have found it best to leave these as None; model.fit will then determine the correct values automatically. Also, model.fit_generator is deprecated, so use model.fit instead.

Your learning rate is rather small; I would try something like 0.001. I ran your code on a data set of images with two classes. The model did train, but it was slow to converge with lr=1e-6. With the learning rate at 0.001 it converged much faster, reaching in 3 epochs the same accuracy (about 81%) that took 30 epochs with the small learning rate.

I am using TensorFlow 2.1 and had no problems getting your code to work apart from the changes mentioned above. I think the problem may be that you are using CUDA 11.1. I believe that for TensorFlow 2.1 you should be using CUDA 10.1: you should have cudatoolkit 10.1.243 and cudnn 7.6.5 installed.
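
To make the version point concrete, here is a minimal sketch of a lookup that compares a TensorFlow release against the CUDA version it was built and tested with. The table is written from memory of the tested build configurations in the TensorFlow install docs and covers only a few releases, so treat it as an assumption and verify it there; TESTED_CONFIGS, major_minor and check_cuda are hypothetical helper names, not part of TensorFlow.

```python
# Tested GPU build configurations for a few TensorFlow releases.
# Assumption: values recalled from the TensorFlow install docs; verify before relying on them.
TESTED_CONFIGS = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def major_minor(version):
    """Reduce a version string such as '10.1.243' to 'major.minor' ('10.1')."""
    return ".".join(version.split(".")[:2])


def check_cuda(tf_version, cuda_version):
    """Return a short message saying whether the CUDA version matches the table."""
    expected = TESTED_CONFIGS.get(major_minor(tf_version))
    if expected is None:
        return f"Unknown: no entry for TensorFlow {tf_version}; check the install docs."
    if major_minor(cuda_version) == expected["cuda"]:
        return f"OK: TensorFlow {tf_version} is tested with CUDA {expected['cuda']}."
    return (
        f"Mismatch: TensorFlow {tf_version} is tested with CUDA {expected['cuda']} "
        f"and cuDNN {expected['cudnn']}, but CUDA {cuda_version} is installed."
    )


# The setup described in the question: TensorFlow 2.1.0 with CUDA 11.1.
print(check_cuda("2.1.0", "11.1"))
```

For the setup in the question this reports a mismatch, which is consistent with the suggestion above to align the CUDA toolkit and the TensorFlow version.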

Answer 2

For some reason, running this code fixes everything. Could someone please explain why? Thanks.

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)

    except RuntimeError as e:
        print(e)
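
The snippet above asks TensorFlow to allocate GPU memory on demand instead of reserving nearly all of it at startup. The TensorFlow GPU guide documents an environment variable, TF_FORCE_GPU_ALLOW_GROWTH, as another way to request the same behavior. A minimal sketch follows, assuming the variable is set before TensorFlow initializes the GPU; whether this setting is what resolves the accuracy problem on this particular machine is not established here.

```python
import os

# Request on-demand GPU memory allocation. This must happen before TensorFlow
# initializes the GPU, so set it before importing tensorflow.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

try:
    import tensorflow as tf

    # Same device query as in the answer above; an empty list means no GPU is visible.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    print("GPUs visible to TensorFlow:", gpus)
except ImportError:
    # Keeps the sketch runnable on machines without TensorFlow.
    print("TensorFlow is not installed in this environment")
```

In a Jupyter notebook this has to run in the first cell after a kernel restart, because the setting has no effect once the GPU has already been initialized.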

2 answers

Answer 1
0 2021-01-13 04:43:16

Answer 2
0 Accepted 2021-01-13 11:14:47
