Preprocessing the input data slows down the input pipeline when using the TensorFlow Dataset API to read TFRecords files
I am using the TensorFlow Dataset API to read TFRecords files, but GPU usage is still low (10%). I reckon the cause is that I preprocess the data before it is fed into `sess.run()`. Here is my code below.
1. Create a dataset from 3 separate files.
```python
import tensorflow as tf  # TensorFlow 1.x API

tf.reset_default_graph()

# Each record in the TFRecords files holds one array (one row). Count the total rows.
n_total_row = sum(1 for _ in tf.python_io.tf_record_iterator(epd))

def get_epd_dataset(filename):
    dataset = tf.data.TFRecordDataset(filename)

    def _parse_function(example_proto):
        keys_to_features = {'data': tf.VarLenFeature(tf.int64)}
        parsed_features = tf.parse_single_example(example_proto, keys_to_features)
        return tf.sparse_tensor_to_dense(parsed_features['data'])

    # Parse the record into tensors.
    dataset = dataset.map(_parse_function)
    return dataset

# The input data consists of 3 essential files. Read the 3 separate files
# "epd", "y_id" and "x_feat" into 3 separate datasets, then use
# `Dataset.zip()` to combine them into 1 dataset.
# (get_id_dataset() and get_feat_dataset() are defined elsewhere; not shown.)
epd_ds = get_epd_dataset(epd)
n_lexicon, id_ds = get_id_dataset(y_id)
feat_ds = get_feat_dataset(x_feat)
data_ds = tf.data.Dataset.zip((feat_ds, epd_ds, id_ds))

# Shuffle the dataset
data_ds = data_ds.shuffle(buffer_size=n_total_row, reshuffle_each_iteration=True)
# Repeat the input `epoch` times
data_ds = data_ds.repeat(epoch)
# Generate batches
data_ds = data_ds.batch(1)
# Create a one-shot iterator
iterator = data_ds.make_one_shot_iterator()
data_iter = iterator.get_next()
```
2. Build a TensorFlow graph.
```python
n_input = DIM*(LEFT+1+RIGHT)
n_classes = n_lexicon
mlp = MultiLayerPerceptron.MultiLayerPerceptron(n_input, n_classes)

# tf Graph input
X = tf.placeholder("float", [None, n_input])
Y = tf.placeholder("float", [None, n_classes])

logits = mlp.multilayer_perceptron(X, dropout_mode)
loss_op = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y), name='loss_op')
optimizer = tf.train.AdamOptimizer(learning_rate=lr)
train_op = optimizer.minimize(loss_op, name='train_op')
```
3. Generate data from `data_iter` and run the TF session.
```python
sess = tf.Session()
# Initialization
sess.run(tf.global_variables_initializer())
# data_ds already repeats `epoch` times, so this single loop drains every epoch;
# an outer per-epoch loop would consume all epochs in its first iteration.
while True:
    try:
        # Get data from the dataset iterator
        tmp = sess.run(data_iter)
        # a, b, c are one row each from the 3 separate files.
        a = tmp[0].flatten()
        b = tmp[1].flatten()
        c = tmp[2].flatten()
        # I believe this step slows down my input pipeline.
        x_train, y_train = _data_generate(mlp, b, a, c)
        _, loss = sess.run([train_op, loss_op], feed_dict={X: x_train, Y: y_train})
    except tf.errors.OutOfRangeError:
        break
sess.close()
```
My code only reaches about 10-15% GPU usage. I think the cause is that `_data_generate()` spends too much time processing NumPy arrays, but I don't know how to improve my pipeline. Here are my questions.
… `sess.run()`. I didn't choose the latter solution because this website mentions:

> We found that using tf.FIFOQueue and tf.train.queue_runner could not saturate multiple current generation GPUs when using large inputs and processing with higher samples per second,
I think that putting `_data_generate()` in `_parse_function()` may solve this problem, because then TensorFlow, not Python, would handle the preprocessing. But I don't know how to do this, since `_data_generate()` needs 3 rows from 3 separate files. Does anyone know how to do this?
Are there other methods that could solve my low GPU usage problem?
Thank you.
Can you share the code of the `_data_generate` function? I can't see what it does.
As you pointed out, performance is likely lost because of RAM <-> GPU memory transfers and because TensorFlow ops are mixed with Python ones.
Instead of running the iterator `data_iter` yourself via `sess.run()`, doing the NumPy operations, and then running the training step, pass `data_iter` as the input to your neural network graph; it should replace the placeholders. (Just write a function that constructs the graph with `data_iter` as a parameter.)
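A minimal sketch of that idea, written against `tf.compat.v1` so it runs in graph mode on TensorFlow 2.x as well as on late 1.x releases. The synthetic `make_dataset()` data, the tiny linear model in `build_graph()` and the sizes `N_INPUT`, `N_CLASSES`, `N_ROWS` are hypothetical stand-ins for the question's zipped dataset and `MultiLayerPerceptron`; the point is only that the iterator's `get_next()` tensors feed the loss directly, so each `sess.run` both pulls a batch and trains on it, with no `feed_dict` round trip through Python.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph mode + Session, as in the question

N_INPUT, N_CLASSES, N_ROWS = 4, 3, 32  # hypothetical sizes for the sketch


def make_dataset():
    # Synthetic stand-in for the zipped TFRecord dataset: (features, one-hot labels).
    rng = np.random.default_rng(0)
    x = rng.random((N_ROWS, N_INPUT), dtype=np.float32)
    y = np.eye(N_CLASSES, dtype=np.float32)[rng.integers(0, N_CLASSES, N_ROWS)]
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(8)


def build_graph(x, y):
    # The iterator tensors x and y take the place of the X and Y placeholders.
    w = tf1.get_variable("w", [N_INPUT, N_CLASSES], initializer=tf1.zeros_initializer())
    b = tf1.get_variable("b", [N_CLASSES], initializer=tf1.zeros_initializer())
    logits = tf.matmul(x, w) + b
    loss_op = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf1.train.AdamOptimizer(learning_rate=0.01).minimize(loss_op)
    return train_op, loss_op


x, y = tf1.data.make_one_shot_iterator(make_dataset()).get_next()
train_op, loss_op = build_graph(x, y)

steps = 0
with tf1.Session() as sess:
    sess.run(tf1.global_variables_initializer())
    while True:
        try:
            # One run call fetches the next batch AND trains on it inside the graph.
            _, loss_val = sess.run([train_op, loss_op])
            steps += 1
        except tf.errors.OutOfRangeError:
            break

print("trained for", steps, "steps")  # 32 rows / batch size 8
```

This only pays off if the per-row work done by `_data_generate()` also moves into the dataset; otherwise the Python preprocessing remains the bottleneck.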
> I think that putting `_data_generate()` in `_parse_function()` may solve this problem, because then TensorFlow, not Python, would handle the preprocessing. But I don't know how to do this, since `_data_generate()` needs 3 rows from 3 separate files. Does anyone know how to do this?
The proper way is to create 3 datasets from the files, decode them, zip them, and then pass the iterator over the zipped dataset as the input to the processing graph. You're almost doing that.
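One way to do what the quoted question asks, sketched under assumptions: `map` a preprocessing function over the zipped dataset, so it receives one row from each of the three files at once. Since the real `_data_generate()` isn't shown, `data_generate_np()` below is a hypothetical NumPy stand-in, and the three toy datasets replace `feat_ds`, `epd_ds` and `id_ds`. Wrapping NumPy code in `tf.py_function` is a swapped-in technique not mentioned in this answer; rewriting the logic as native TensorFlow ops is usually faster, because `tf.py_function` holds the Python GIL, which limits how much `num_parallel_calls` can help.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

N_CLASSES = 3  # hypothetical number of label classes

# Three hypothetical row-aligned datasets standing in for feat_ds, epd_ds and id_ds.
feat_ds = tf.data.Dataset.from_tensor_slices(np.arange(12, dtype=np.float32).reshape(6, 2))
epd_ds = tf.data.Dataset.from_tensor_slices(np.arange(6, dtype=np.int64))
id_ds = tf.data.Dataset.from_tensor_slices(np.arange(6, dtype=np.int64) % N_CLASSES)


def data_generate_np(feat, epd, ids):
    # Hypothetical stand-in for _data_generate(): plain NumPy on one row per file.
    feat, epd, ids = feat.numpy(), epd.numpy(), ids.numpy()
    x = (feat * epd).astype(np.float32)
    y = np.eye(N_CLASSES, dtype=np.float32)[ids]
    return x, y


def preprocess(feat, epd, ids):
    # The zipped element arrives as three separate arguments.
    x, y = tf.py_function(data_generate_np, [feat, epd, ids], [tf.float32, tf.float32])
    # py_function drops static shapes; restore them for downstream layers.
    x.set_shape([2])
    y.set_shape([N_CLASSES])
    return x, y


data_ds = (tf.data.Dataset.zip((feat_ds, epd_ds, id_ds))
           .map(preprocess, num_parallel_calls=4)
           .batch(2))

x_batch, y_batch = tf1.data.make_one_shot_iterator(data_ds).get_next()
batches = []
with tf1.Session() as sess:
    while True:
        try:
            batches.append(sess.run([x_batch, y_batch]))
        except tf.errors.OutOfRangeError:
            break
```

`x_batch` and `y_batch` can then be passed straight into the model-building function instead of placeholders.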
Also, try to use multithreading wherever it is possible or needed. Here:
```python
        ...
        return tf.sparse_tensor_to_dense(parsed_features['data'])

    # Parse the record into tensors.
    dataset = dataset.map(_parse_function)
    return dataset
```
You should use:

```python
dataset = dataset.map(_parse_function, num_parallel_calls=<MORE THAN ONE>)
```

where `<MORE THAN ONE>` is an integer greater than one. (In the core `tf.data` API the argument is `num_parallel_calls`; `num_threads` belonged to the older `tf.contrib.data` API.) In your case I would start with 8 threads and see whether GPU usage reaches 100%.
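A small end-to-end sketch of the parallel `map`, assuming TensorFlow 1.14+ or 2.x (through `tf.compat.v1`). It writes a temporary TFRecord file whose records mimic the question's `'data'` VarLen int64 feature (the contents and the batch size of 5 are made up), parses it with `num_parallel_calls=8`, and also appends `prefetch(1)`, which this answer does not mention, so the next batch is prepared while the current one is being consumed.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()

# Write a toy TFRecord file shaped like the question's "epd" file: every record
# holds one variable-length int64 feature named 'data' (contents are made up).
path = os.path.join(tempfile.mkdtemp(), "epd.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(10):
        feature = {"data": tf.train.Feature(int64_list=tf.train.Int64List(value=[i, i + 1]))}
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())


def _parse_function(example_proto):
    parsed = tf.io.parse_single_example(
        example_proto, {"data": tf.io.VarLenFeature(tf.int64)})
    return tf.sparse.to_dense(parsed["data"])


dataset = (tf.data.TFRecordDataset(path)
           .map(_parse_function, num_parallel_calls=8)  # parse up to 8 records concurrently
           .batch(5)
           .prefetch(1))  # prepare the next batch while the current one is consumed

next_batch = tf1.data.make_one_shot_iterator(dataset).get_next()
results = []
with tf1.Session() as sess:
    while True:
        try:
            results.append(sess.run(next_batch))
        except tf.errors.OutOfRangeError:
            break
```

Instead of a fixed 8, `tf.data.experimental.AUTOTUNE` can be passed to both `num_parallel_calls` and `prefetch` to let the runtime choose.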
Check this out and tell me if it's OK.
I'm assuming your example uses a simplified version of your model; otherwise the GPU will almost always finish its work before the next batch is ready.
Each dataset and transformation pipeline has its own specifics, so it's difficult to give a definite answer, but here are some points that might be worth investigating: