Periodic overhead when training a model on a GPU with a TensorFlow Dataset

Question

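As you can see in the code below, I am trying to train a simple model in TensorFlow using a TensorFlow Dataset. The dataset is quite large; I shuffle, repeat, and batch it in order to train the model with stochastic gradient descent.

However, I observe a periodic overhead in the optimization step (sess.run(train) in my code).

As you can see here, every fifth optimization step takes about 3 seconds instead of about 0.5 seconds: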

Step 105 duration: 3.5233473777770996
Step 106 duration: 0.5653283596038818
Step 107 duration: 0.5391891002655029
Step 108 duration: 0.5480048656463623
Step 109 duration: 0.0415492057800293
Step 110 duration: 3.032115936279297
Step 111 duration: 0.5407207012176514
Step 112 duration: 0.5276811122894287
Step 113 duration: 0.5448746681213379
Step 114 duration: 0.04253268241882324
Step 115 duration: 3.1273345947265625
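To make the pattern in this log explicit, here is a small sketch that flags the slow steps and measures their spacing. The durations are the ones printed above, rounded to three decimals; the threshold of three times the median is an arbitrary assumption, and find_spikes is a name made up for this example.

```python
from statistics import median

# Step durations copied from the log above (steps 105 to 115), rounded.
durations = {
    105: 3.523, 106: 0.565, 107: 0.539, 108: 0.548, 109: 0.042,
    110: 3.032, 111: 0.541, 112: 0.528, 113: 0.545, 114: 0.043,
    115: 3.127,
}


def find_spikes(step_durations, factor=3.0):
    """Return the steps whose duration exceeds factor * median duration."""
    threshold = factor * median(step_durations.values())
    return sorted(s for s, d in step_durations.items() if d > threshold)


spikes = find_spikes(durations)
gaps = [b - a for a, b in zip(spikes, spikes[1:])]
print("slow steps:", spikes)
print("spacing between slow steps:", gaps)
```

On this excerpt the slow steps are 105, 110 and 115, so the spike comes back every 5 steps.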

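Moreover, my GPU is at 0% utilization almost all the time, while about 90% of its memory is in use.

The overhead seems to occur each time the Iterator has finished going through the whole dataset.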
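A quick piece of arithmetic is consistent with this. With the values used in the code below (num_ex = 2000 and batch_size = 480, with batch() applied before repeat()), one pass over the data takes five batches, and the last one is much smaller than the others, which would fit the short step of about 0.04 s seen just before each slow one. A minimal sketch:

```python
import math

num_ex = 2000      # examples written to the TFRecord file
batch_size = 480   # batch size passed to Dataset.batch()

# batch() comes before repeat(), so the last batch of each pass is partial.
steps_per_epoch = math.ceil(num_ex / batch_size)
last_batch = num_ex - (steps_per_epoch - 1) * batch_size

print("steps per epoch:", steps_per_epoch)
print("size of the last batch:", last_batch)
```

This gives 5 steps per pass and a final batch of 80 examples, matching the 5-step period of the slow steps.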

I am using Python 3.6 with TensorFlow 1.4 on Ubuntu 16.04.

Do you know how I can speed up the training?

Best,

import tensorflow as tf
import numpy as np
import os, time, multiprocessing
import matplotlib.pyplot as plt

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value.reshape(-1)))


def parser(record):
    num_features = 2000
    size_group = 300
    num_classes= 10
    class_indice = 0
    keys_to_features={
                'X': tf.FixedLenFeature([size_group*num_features],tf.float32),
                'label' : tf.FixedLenFeature([num_classes],tf.float32)}
    parsed = tf.parse_single_example(record, keys_to_features)

    label = parsed['label']
    label = tf.slice(label,[class_indice],[1])
    label = tf.squeeze(label) # To get a vector one dimension
    X = parsed['X']
    X= tf.reshape(X, [size_group,num_features])
    return X, label


def test_train_w_dataset():

    # Definition of the size 
    num_features = 2000
    num_ex = 2000
    size_group = 300
    num_classes = 10
    batch_size= 480
    max_iters = 300
    buffer_size = 10000

    # Creation of the Dataset
    filename_tfrecords = 'tmp.tfrecords'
    if not(os.path.isfile(filename_tfrecords)): # If the file doesn't exist we will create it
        print("Start creating the Dataset")
        writer = tf.python_io.TFRecordWriter(filename_tfrecords)

        for i in range(num_ex):
            if i % 1000 == 0: print("Step :",i)
            X = np.random.normal(size=(size_group,num_features))
            vectors =  2*np.random.randint(0,2,(num_classes,1))-1
            features=tf.train.Features(feature={
                        'X': _floats_feature(X),
                        'label' : _floats_feature(vectors)})
            example = tf.train.Example(features=features)
            writer.write(example.SerializeToString())
        writer.close()
    else:
        print("The dataset tfrecords already exist")

    # Input pipeline: parse, shuffle, batch, repeat, prefetch
    train_dataset = tf.data.TFRecordDataset(filename_tfrecords)
    num_proc = multiprocessing.cpu_count()
    train_dataset = train_dataset.map(parser,
                                      num_parallel_calls=num_proc)
    dataset_shuffle = train_dataset.shuffle(buffer_size=buffer_size,
                                            reshuffle_each_iteration=True)
    dataset_shuffle = dataset_shuffle.batch(batch_size)
    dataset_shuffle = dataset_shuffle.repeat()
    dataset_shuffle = dataset_shuffle.prefetch(batch_size)
    shuffle_iterator = dataset_shuffle.make_initializable_iterator()
    X_, y_ = shuffle_iterator.get_next()

    # Model and loss
    W=tf.Variable(tf.random_normal([num_features], stddev=1.),name="weights")
    W=tf.reshape(W,(1,1,num_features))
    Prod=tf.reduce_sum(tf.multiply(W,X_),axis=2)
    Max=tf.reduce_max(Prod,axis=1)
    Tan= tf.reduce_sum(tf.multiply(tf.tanh(Max),y_))
    loss= tf.add(Tan,tf.reduce_sum(tf.multiply(W,W)))

    LR = 0.01
    restarts = 1
    optimizer = tf.train.GradientDescentOptimizer(LR)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    train = optimizer.minimize(loss)
    print("The graph is defined")
    sess = tf.Session(config=config)

    durationTab = []

    for essai in range(restarts+1):
        # The variables and the iterator are reinitialized on each restart
        t0 = time.time()
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        sess.run(shuffle_iterator.initializer)
        t1 = time.time()
        duration = t1 - t0
        print('Duration of initialization : ',duration)
        for step in range(max_iters):
            # Time a single optimization step
            t0 = time.time()
            sess.run(train)
            t1 = time.time()
            duration = t1 - t0
            print("Step ",str(step),' duration : ',duration)
            durationTab += [duration]


    plt.plot(durationTab)
    plt.ylabel('Duration')
    plt.xlabel('Iteration')
    plt.show()

if __name__ == '__main__':

    test_train_w_dataset()

Answer 1

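For the GPU utilization, make sure you are using the GPU-optimized binaries. Check where the ops are placed (for example in TensorBoard). Force the ops onto the GPU (see tf.device).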
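Before and after changing the placement, it can help to log the utilization while the training loop runs. The sketch below polls nvidia-smi; it assumes an NVIDIA driver with nvidia-smi on the PATH, and the function names and the sample line are made up for illustration. Some GPUs report "[Not Supported]" for certain fields, which this simple parser does not handle.

```python
import subprocess

# Query utilization (%) and memory (MiB) as bare comma-separated numbers.
QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_line(line):
    """Turn 'util, used, total' (%, MiB, MiB) into a dict of numbers."""
    util, used, total = (float(x) for x in line.split(","))
    return {"util_pct": util, "mem_pct": 100.0 * used / total}


def gpu_stats():
    """Run nvidia-smi and return one dict per visible GPU."""
    out = subprocess.check_output(QUERY).decode()
    return [parse_gpu_line(l) for l in out.splitlines() if l.strip()]


# Hypothetical sample line resembling the situation in the question:
# 0 % utilization with roughly 90 % of the memory in use.
print(parse_gpu_line("0, 10240, 11264"))
```

Calling gpu_stats() between training steps (or from a second terminal) shows whether the GPU ever leaves 0% once ops are placed on it.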

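For the periodic spikes, there are several possible causes:

Other processes are blocking access to the CPU, GPU, RAM, or disk, and you have to wait for them to finish. You can try killing other superfluous tasks that may be running on the system.
You are running out of RAM. Check how much swap space is in use. If it grows while the job runs, the spikes may simply be the system thrashing, although it seems to behave remarkably well for that.
Disk access. You mentioned that the spikes seem to correlate with looping over the data. It may be that the system has to read the data again, so you wait for the disk, although this is usually not noticeable. You can speed this up by making sure the data is contiguous on the hard drive, or by moving it to an SSD or into RAM.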
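For the RAM point, swap usage can be logged next to the step durations. The sketch below assumes a Linux system (the question mentions Ubuntu 16.04) where /proc/meminfo is available; parse_meminfo and swap_used_kb are names made up for this example.

```python
import os

MEMINFO = "/proc/meminfo"


def parse_meminfo(text):
    """Parse the text of /proc/meminfo into a dict of integers (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


if os.path.exists(MEMINFO):  # Linux only
    with open(MEMINFO) as f:
        print("swap used (kB):", swap_used_kb(parse_meminfo(f.read())))
```

If this value keeps growing from step to step, thrashing becomes a likely explanation for the spikes.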

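Since many of these causes are related to RAM, you should probably try a smaller model (smaller batches, fewer layers, fewer nodes per layer) and see whether the spikes go away. If they do, you need more RAM.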
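To judge whether RAM is a plausible culprit here, the sizes involved can be estimated from the constants in the question's code. This is rough arithmetic only: it assumes 4 bytes per float32 value and ignores TFRecord and TensorFlow overhead. The prefetch figure is an upper bound that would only be reached if the input pipeline ran far ahead of training.

```python
BYTES_PER_FLOAT32 = 4

# Constants taken from the question's code.
num_ex = 2000
size_group = 300
num_features = 2000
batch_size = 480
shuffle_buffer_size = 10000
prefetch_buffer_size = 480  # prefetch(batch_size) after batch(): counts batches

example_bytes = size_group * num_features * BYTES_PER_FLOAT32
dataset_bytes = num_ex * example_bytes
batch_bytes = batch_size * example_bytes

# shuffle() holds at most buffer_size examples, and never more than exist.
shuffle_bytes = min(shuffle_buffer_size, num_ex) * example_bytes

# prefetch() is allowed to buffer this many full batches.
prefetch_bound_bytes = prefetch_buffer_size * batch_bytes

GB = 1e9
print("one example:                 %.4f GB" % (example_bytes / GB))
print("whole dataset:               %.2f GB" % (dataset_bytes / GB))
print("one full batch:              %.3f GB" % (batch_bytes / GB))
print("shuffle buffer (at most):    %.2f GB" % (shuffle_bytes / GB))
print("prefetch buffer (at most):   %.2f GB" % (prefetch_bound_bytes / GB))
```

Under these assumptions the shuffle buffer can hold the entire 4.8 GB dataset and a single batch is over 1 GB, so trying a smaller batch size and a smaller prefetch buffer is a cheap experiment.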

Answer 2

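It seems that adding dataset_shuffle = dataset_shuffle.cache() between the batch and repeat calls removes these periodic overheads. However, I am not sure whether the dataset is still read in full with this command.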
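The following is a pure-Python stand-in, not tf.data itself, meant to illustrate what a cache placed after shuffle and batch would be expected to do: the first pass reads every record once, and later epochs replay the same batches in the same order. All function names are made up for this sketch, and the sizes (2000 examples, batches of 480) come from the question.

```python
import random


def read_source(n, counter):
    """Stand-in for reading records from disk; counts every read."""
    for i in range(n):
        counter["reads"] += 1
        yield i


def shuffled_batches(n, batch_size, counter, rng):
    """One pass: read everything, shuffle, then cut into batches."""
    items = list(read_source(n, counter))
    rng.shuffle(items)
    return [items[i:i + batch_size] for i in range(0, n, batch_size)]


def epochs_without_cache(n, batch_size, num_epochs, counter, rng):
    """shuffle -> batch -> repeat: every epoch re-reads and reshuffles."""
    return [shuffled_batches(n, batch_size, counter, rng)
            for _ in range(num_epochs)]


def epochs_with_cache(n, batch_size, num_epochs, counter, rng):
    """shuffle -> batch -> cache -> repeat: the first pass is replayed."""
    cached = shuffled_batches(n, batch_size, counter, rng)
    return [cached for _ in range(num_epochs)]


rng = random.Random(0)
plain_reads = {"reads": 0}
plain = epochs_without_cache(2000, 480, 3, plain_reads, rng)
cached_reads = {"reads": 0}
cached = epochs_with_cache(2000, 480, 3, cached_reads, rng)

print("reads without cache:", plain_reads["reads"])   # 6000
print("reads with cache:   ", cached_reads["reads"])  # 2000
print("examples per cached epoch:", sum(len(b) for b in cached[0]))
print("cached epochs identical:", cached[0] == cached[1] == cached[2])
```

In this model the whole dataset is read during the first pass, so nothing is skipped, but the shuffle is frozen: every later epoch sees identical batches. If reshuffling each epoch matters, calling cache() before shuffle() should keep the read-once benefit while still reshuffling; that ordering is a suggestion and has not been tested on the question's setup.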

Answer 1
0 2018-04-19 13:11:39

Answer 2
0 2018-04-27 11:54:18
