Periodic overhead when using a TensorFlow Dataset for model training on GPU
As you can see in the following code, I am trying to train a simple model in TensorFlow with a TensorFlow Dataset. The dataset is pretty huge, and I shuffle, repeat and batch it in order to do stochastic gradient descent for training my model.
But I observe a periodic overhead in the optimisation step (sess.run(train) in my code).
As you can see here, every 5 steps the optimisation takes about 3 s instead of 0.5 s:
Step 105 duration : 3.5233473777770996
Step 106 duration : 0.5653283596038818
Step 107 duration : 0.5391891002655029
Step 108 duration : 0.5480048656463623
Step 109 duration : 0.0415492057800293
Step 110 duration : 3.032115936279297
Step 111 duration : 0.5407207012176514
Step 112 duration : 0.5276811122894287
Step 113 duration : 0.5448746681213379
Step 114 duration : 0.04253268241882324
Step 115 duration : 3.1273345947265625
Moreover, my GPU is at 0% utilisation almost all the time, with around 90% of its memory in use.
It seems that this overhead occurs whenever the Iterator has finished going through the whole dataset.
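A quick arithmetic check supports this reading. Assuming the sizes used in the code of this question (num_ex = 2000 records, batch_size = 480), one pass over the dataset yields five batches, which matches the five-step period in the log. This is only a sketch in plain Python; the helper name batches_per_epoch is made up for illustration.

```python
import math


def batches_per_epoch(num_examples, batch_size):
    """Return (number of batches, size of the last batch) for one pass."""
    n_batches = math.ceil(num_examples / batch_size)
    last = num_examples - (n_batches - 1) * batch_size
    return n_batches, last


# Sizes taken from the question: 2000 records, batches of 480.
n_batches, last_batch = batches_per_epoch(2000, 480)
print(n_batches, last_batch)  # 5 80
```

The last batch of each pass holds only 80 records, which would be consistent with the much shorter step (about 0.04 s) just before each 3 s step. One possible explanation, not confirmed here, is that the shuffle buffer (buffer_size = 10000, larger than the 2000 records) has to be refilled from the file at the start of every pass.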
I am using Python 3.6 with TensorFlow 1.4 on Ubuntu 16.04.
Do you have any idea how I can speed up my training?
Best,
import tensorflow as tf
import numpy as np
import os, time, multiprocessing
import matplotlib.pyplot as plt


def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value.reshape(-1)))


def parser(record):
    num_features = 2000
    size_group = 300
    num_classes = 10
    class_indice = 0
    keys_to_features = {
        'X': tf.FixedLenFeature([size_group*num_features], tf.float32),
        'label': tf.FixedLenFeature([num_classes], tf.float32)}
    parsed = tf.parse_single_example(record, keys_to_features)

    label = parsed['label']
    label = tf.slice(label, [class_indice], [1])
    label = tf.squeeze(label)  # Drop the size-1 dimension to get a scalar label
    X = parsed['X']
    X = tf.reshape(X, [size_group, num_features])
    return X, label


def test_train_w_dataset():
    # Definition of the sizes
    num_features = 2000
    num_ex = 2000
    size_group = 300
    num_classes = 10
    batch_size = 480
    max_iters = 300
    buffer_size = 10000

    # Creation of the Dataset
    filename_tfrecords = 'tmp.tfrecords'
    if not(os.path.isfile(filename_tfrecords)):  # If the file doesn't exist we will create it
        print("Start creating the Dataset")
        writer = tf.python_io.TFRecordWriter(filename_tfrecords)

        for i in range(num_ex):
            if i % 1000 == 0: print("Step :", i)
            X = np.random.normal(size=(size_group, num_features))
            vectors = 2*np.random.randint(0, 2, (num_classes, 1))-1
            features = tf.train.Features(feature={
                'X': _floats_feature(X),
                'label': _floats_feature(vectors)})
            example = tf.train.Example(features=features)
            writer.write(example.SerializeToString())
        writer.close()
    else:
        print("The dataset tfrecords already exist")

    train_dataset = tf.data.TFRecordDataset(filename_tfrecords)
    num_proc = multiprocessing.cpu_count()
    train_dataset = train_dataset.map(parser,
                                      num_parallel_calls=num_proc)
    dataset_shuffle = train_dataset.shuffle(buffer_size=buffer_size,
                                            reshuffle_each_iteration=True)
    dataset_shuffle = dataset_shuffle.batch(batch_size)
    dataset_shuffle = dataset_shuffle.repeat()
    dataset_shuffle = dataset_shuffle.prefetch(batch_size)
    shuffle_iterator = dataset_shuffle.make_initializable_iterator()
    X_, y_ = shuffle_iterator.get_next()

    W = tf.Variable(tf.random_normal([num_features], stddev=1.), name="weights")
    W = tf.reshape(W, (1, 1, num_features))
    Prod = tf.reduce_sum(tf.multiply(W, X_), axis=2)
    Max = tf.reduce_max(Prod, axis=1)
    Tan = tf.reduce_sum(tf.multiply(tf.tanh(Max), y_))
    loss = tf.add(Tan, tf.reduce_sum(tf.multiply(W, W)))

    LR = 0.01
    restarts = 1
    optimizer = tf.train.GradientDescentOptimizer(LR)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    train = optimizer.minimize(loss)
    print("The graph is defined")
    sess = tf.Session(config=config)

    durationTab = []

    for essai in range(restarts+1):
        # Variables and the iterator must be reinitialized on each restart
        t0 = time.time()
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        sess.run(shuffle_iterator.initializer)
        t1 = time.time()
        duration = t1 - t0
        print('Duration of initialization : ', duration)
        for step in range(max_iters):
            t0 = time.time()
            sess.run(train)
            t1 = time.time()
            duration = t1 - t0
            print("Step ", str(step), ' duration : ', duration)
            durationTab += [duration]

    plt.plot(durationTab)
    plt.ylabel('Duration')
    plt.xlabel('Iteration')
    plt.show()


if __name__ == '__main__':
    test_train_w_dataset()
For GPU utilization, make sure you use the GPU-optimized binary. Check operation placement (in TensorBoard, for example). Force placement of the operations on the GPU (see tf.device).
For the periodic spikes there could be a few reasons:
Since a lot of the reasons have to do with RAM, you should probably try a smaller model (smaller batches, fewer layers, fewer nodes per layer) and see if the spikes go away. If they do, then you need to go out and buy more RAM.
It seems that adding dataset_shuffle = dataset_shuffle.cache() between the batch and repeat calls removes this periodic overhead. Nevertheless, I am not sure that the Dataset is fully read when this command is used.
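On that last point, the tf.data documentation describes cache() as storing the elements produced during the first complete pass and replaying them afterwards. Under that reading, every record is still read once, but the shuffled batches from the first pass are reused in the same order on every later pass. The plain-Python model below only illustrates that behaviour; it does not call TensorFlow, and the function name cached_batches is hypothetical.

```python
import random


def cached_batches(records, batch_size, epochs, seed=0):
    """Toy model of shuffle -> batch -> cache -> repeat.

    Shuffling and batching happen once; later epochs replay the cached batches.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # The "cache": batches produced during the first pass.
    cache = [shuffled[i:i + batch_size]
             for i in range(0, len(shuffled), batch_size)]
    for _ in range(epochs):
        for batch in cache:
            yield batch


batches = list(cached_batches(range(10), batch_size=4, epochs=2))
first, second = batches[:3], batches[3:]
# Every record appears exactly once in the first pass.
print(sorted(x for b in first for x in b) == list(range(10)))  # True
# The second pass replays the same batches in the same order.
print(first == second)  # True
```

If a fresh shuffle on every pass matters, one option would be to call cache() before shuffle() instead, so that only the parsed records are cached; whether they fit in memory for this dataset is a separate question.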