
The performance of the GPU is still slow even with the Keras fit_generator method

I have a large dataset (5 GB) that I want to use for training a neural network model designed using Keras. Although I am using an Nvidia Tesla P100 GPU, training is really slow (each epoch takes ~60-70 s with batch_size=10000). After reading and searching, I found that I can improve the training speed by using Keras fit_generator instead of the typical fit. To do so, I wrote the following code:

from __future__ import print_function
import numpy as np
from keras import Sequential
from keras.layers import Dense
import keras
from sklearn.model_selection import train_test_split


def generator(C, r, batch_size):
    samples_per_epoch = C.shape[0]
    number_of_batches = int(np.ceil(samples_per_epoch / batch_size))
    counter = 0

    while 1:
        X_batch = np.array(C[batch_size * counter:batch_size * (counter + 1)])
        y_batch = np.array(r[batch_size * counter:batch_size * (counter + 1)])
        counter += 1
        yield X_batch, y_batch

        # restart counter to yield data in the next epoch as well
        if counter >= number_of_batches:
            counter = 0


if __name__ == "__main__":
    X, y = readDatasetFromFile()
    X_tr, X_ts, y_tr, y_ts = train_test_split(X, y, test_size=.2)

    model = Sequential()
    model.add(Dense(16, input_dim=X.shape[1]))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(16))
    model.add(keras.layers.advanced_activations.PReLU())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

    batch_size = 1000
    model.fit_generator(generator(X_tr, y_tr, batch_size), epochs=200,
                        steps_per_epoch=int(np.ceil(X_tr.shape[0] / batch_size)),
                        validation_data=generator(X_ts, y_ts, batch_size * 2),
                        validation_steps=int(np.ceil(X_ts.shape[0] / (batch_size * 2))),
                        verbose=2, use_multiprocessing=True)

    loss, accuracy = model.evaluate(X_ts, y_ts, verbose=0)
    print(loss, accuracy)

After running with fit_generator, the training time improved a little, but it is still slow (each epoch now takes ~40-50 s). When running nvidia-smi in the terminal, I found that GPU utilization is only ~15%, which makes me wonder whether my code is wrong. I am posting my code above to ask whether there is a bug that is slowing down the GPU.

Thank you,
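
For reference, the Keras documentation describes keras.utils.Sequence as a safer way to feed fit_generator when use_multiprocessing=True, because a plain Python generator cannot be safely shared across worker processes. A minimal sketch of the same batching logic as a Sequence (assuming Keras 2.x; the class name here is illustrative) could look like this:

import numpy as np
from keras.utils import Sequence


class BatchSequence(Sequence):
    # Wraps in-memory arrays C (features) and r (labels) into fixed-size batches.
    def __init__(self, C, r, batch_size):
        self.C = C
        self.r = r
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(self.C.shape[0] / self.batch_size))

    def __getitem__(self, idx):
        # return the idx-th batch as (X_batch, y_batch)
        start = idx * self.batch_size
        end = start + self.batch_size
        return self.C[start:end], self.r[start:end]

With a Sequence, steps_per_epoch and validation_steps can be omitted, since len() of the Sequence already defines how many batches make up one epoch, e.g. model.fit_generator(BatchSequence(X_tr, y_tr, batch_size), epochs=200, ...).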

Just try assigning the GPU explicitly:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # to use more than one GPU, set it to e.g. "0,1"
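
If you want to confirm which devices the backend can actually see after setting the variable (assuming a TensorFlow backend), you can list them:

from tensorflow.python.client import device_lib

# prints the CPU and GPU devices visible to TensorFlow;
# the Tesla P100 should appear as a GPU entry if CUDA is configured correctly
print(device_lib.list_local_devices())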

Hope this helps!
