
How to ensure that Keras is using the GPU with the TensorFlow backend?

I've created a virtual notebook on Paperspace cloud infrastructure with a TensorFlow GPU P5000 virtual instance on the backend. When I start training my network, it works 2x SLOWER than on my MacBook Pro with the pure CPU runtime engine. How can I ensure that the Keras NN is using the GPU instead of the CPU during training?

Please find my code below:

from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd

# Read data
pddata= pd.read_csv('data/data.csv', delimiter=';')

# Helper function (prepare & test data)
def split_to_train_test(data):
    trainLength = len(data) - len(data)//10

    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)

    trainLabels = trainData.loc[:,"Label"].as_matrix()
    testLabels = testData.loc[:,"Label"].as_matrix()

    trainData = trainData.loc[:,"Feature 0":].as_matrix()
    testData  = testData.loc[:,"Feature 0":].as_matrix()

    return (trainData, testData, trainLabels, testLabels)

# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)

# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)

# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model

# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test)) 

You need to run your network with log_device_placement = True set in the TensorFlow session (the second-to-last line in the sample code below). Interestingly enough, if you set that in a session, it will still apply when Keras does the fitting. So the code below (tested) does output the placement for each tensor. Please note, I've short-circuited the data reading because your data wasn't available, so I'm just running the network with random data. This way the code is self-contained and runnable by anyone. Another note: if you run this from a Jupyter Notebook, the output of the log_device_placement will go to the terminal where Jupyter Notebook was started, not to the notebook cell's output.

from tensorflow.contrib.keras.api.keras.models import Sequential
from tensorflow.contrib.keras.api.keras.layers import Dense
from tensorflow.contrib.keras.api.keras.layers import Dropout
from tensorflow.contrib.keras.api.keras import utils as np_utils
import numpy as np
import pandas as pd
import tensorflow as tf

# Read data
#pddata=pd.read_csv('data/data.csv', delimiter=';')
pddata = "foobar"

# Helper function (prepare & test data)
def split_to_train_test(data):

    # Short-circuit: return random data so the example is self-contained
    return (
        np.random.uniform( size = ( 100, 101 ) ),
        np.random.uniform( size = ( 100, 101 ) ),
        np.random.randint( 0, size = ( 100 ), high = 3 ),
        np.random.randint( 0, size = ( 100 ), high = 3 )
    )

    # Original implementation (unreachable, kept for reference)
    trainLength = len(data) - len(data)//10

    trainData = data.loc[:trainLength].sample(frac=1).reset_index(drop=True)
    testData = data.loc[trainLength+1:].sample(frac=1).reset_index(drop=True)

    trainLabels = trainData.loc[:,"Label"].as_matrix()
    testLabels = testData.loc[:,"Label"].as_matrix()

    trainData = trainData.loc[:,"Feature 0":].as_matrix()
    testData  = testData.loc[:,"Feature 0":].as_matrix()

    return (trainData, testData, trainLabels, testLabels)

# prepare train & test data
(X_train, X_test, y_train, y_test) = split_to_train_test(pddata)

# Convert labels to one-hot notation
Y_train = np_utils.to_categorical(y_train, 3)
Y_test  = np_utils.to_categorical(y_test, 3)

# Define model in Keras
def create_model(init):
    model = Sequential()
    model.add(Dense(101, input_shape=(101,), kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(101, kernel_initializer=init, activation='tanh'))
    model.add(Dense(3, kernel_initializer=init, activation='softmax'))
    return model

# Train the model
uniform_model = create_model("glorot_normal")
uniform_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
with tf.Session( config = tf.ConfigProto( log_device_placement = True ) ):
    uniform_model.fit(X_train, Y_train, batch_size=1, epochs=300, verbose=1, validation_data=(X_test, Y_test)) 

Terminal output (partial; it was way too long):

...
VarIsInitializedOp_13: (VarIsInitializedOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485870: I tensorflow/core/common_runtime/placer.cc:884]
VarIsInitializedOp_13: (VarIsInitializedOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485895: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/mul_18/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
2018-04-21 21:54:33.485903: I tensorflow/core/common_runtime/placer.cc:884]
training/SGD/Variable_9/Read/ReadVariableOp: (ReadVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
...

Note the GPU:0 at the end of many lines.

The relevant page in the TensorFlow manual: Using GPUs: Logging Device Placement.
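
For TensorFlow 2.x, which has no Session, the same placement logging can be switched on globally with tf.debugging.set_log_device_placement. A minimal sketch, assuming a TF 2.x install:

import tensorflow as tf

# Log the device every op is placed on
# (TF 2.x replacement for log_device_placement in ConfigProto)
tf.debugging.set_log_device_placement(True)

# Any op executed afterwards reports its placement:
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)  # the log line should end in device:GPU:0 when a GPU is used

This also covers a later model.fit call, since tf.keras ops go through the same placer.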

Put this near the top of your Jupyter notebook. Comment out what you don't need.

# confirm TensorFlow sees the GPU
from tensorflow.python.client import device_lib
assert 'GPU' in str(device_lib.list_local_devices())

# confirm Keras sees the GPU (for TensorFlow 1.X + Keras)
from keras import backend
assert len(backend.tensorflow_backend._get_available_gpus()) > 0

# confirm PyTorch sees the GPU
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
print(cuda.get_device_name(cuda.current_device()))

NOTE: With the release of TensorFlow 2.0, Keras is now included as part of the TF API.
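
Under TF 2.x the TensorFlow check and the Keras check above collapse into one, because tf.keras shares the TensorFlow runtime. A minimal sketch, assuming TF 2.1+ (where tf.config.list_physical_devices is a stable API):

import tensorflow as tf

# confirm TensorFlow 2.x (and therefore tf.keras) sees the GPU
gpus = tf.config.list_physical_devices('GPU')
assert len(gpus) > 0, "TensorFlow does not see a GPU"
print(gpus)  # e.g. [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]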

Originally answered here.

Considering that Keras is built into TensorFlow since version 2.0:

import tensorflow as tf
tf.test.is_built_with_cuda()  
tf.test.is_gpu_available(cuda_only = True)  

NOTE: the latter method may take several minutes to run.
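
To sanity-check the speed complaint itself (GPU training slower than CPU), you can pin the same computation to each device and compare. A rough sketch, assuming TF 2.x eager mode; the matrix size and device names are illustrative:

import time
import tensorflow as tf

x = tf.random.uniform((2000, 2000))

# time the same matmul on CPU and on GPU
for device in ('/CPU:0', '/GPU:0'):
    with tf.device(device):
        start = time.time()
        for _ in range(10):
            result = tf.matmul(x, x)
        _ = result.numpy()  # force execution to finish before stopping the clock
        print(device, time.time() - start)

If the GPU does not win such a benchmark by a wide margin, the installation is suspect; if it does win but training is still slow, the workload is the bottleneck. Note that with batch_size=1, as in the question, a GPU often cannot outrun a CPU because each step is too small to keep the device busy.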
