Deep learning with Tensorflow: personalized training loop that learns with one element at a time
I need to work with batches whose elements have different sizes, so I am trying to write a personalized training loop. The starting point is the one provided by Keras:
for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
and to add a cycle over the batch size, so that I can feed the network one element at a time and update the weights only after every element of the batch has gone through the network. Something like:
for epoch in range(epochs):
    for training in range(trainingsize):
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this minibatch
                loss_value = loss_fn(y, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
where x and y are the single elements of the batch.
But I noticed that this way only the last element is actually taken into account (because grads gets overwritten at every iteration).
How can I handle this? I don't know how to "merge" the different grads.
One more curiosity: I thought variables created inside a "with" statement were only valid inside the statement, so how can using the tape outside of it possibly work?
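On that last point: Python's `with` statement does not introduce a new variable scope, so the `as` target (and anything else bound inside the block) remains accessible afterwards; exiting the block only runs the context manager's `__exit__`. A minimal framework-free sketch (the `Recorder` class is hypothetical, but `tf.GradientTape` behaves the same way, which is why `tape.gradient` can be called after the block):

```python
# Python's `with` statement does not create a new variable scope:
# names bound inside the block (including the `as` target) stay
# visible afterwards. Only the context manager's __exit__ runs.

class Recorder:
    def __enter__(self):
        self.active = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # the object is "closed" here, but it still exists as a plain object
        self.active = False
        return False

with Recorder() as rec:
    result = 42  # bound inside the block

print(rec.active)  # False: __exit__ already ran
print(result)      # 42: still in scope outside the block
```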
UPDATE
I tried SoheilStar's solution, but tape.gradient returns a vector [None, None, None, None], and apply_gradients fails with "No gradients provided for any variable: ['conv1d/kernel:0', 'conv1d/bias:0', 'dense/kernel:0', 'dense/bias:0']".
I don't know how to debug this case to find the problem.
The main part of the code I am using:
optimizer = keras.optimizers.Adam(learning_rate=0.001, name="Adam")
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

model = keras.Sequential()
model.add(Conv1D(2, ksize, activation='relu', input_shape=ishape))
model.add(GlobalMaxPooling1D(data_format="channels_last"))
model.add(Dense(2, activation='sigmoid'))
for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        for s in range(batchsize):
            X_train, y_train = loadvalue(batchp)  # loading the elements
            with tf.GradientTape() as tape:
                logits = model(X_train, training=True)
                loss_value = loss_fn(y_train, logits)
            mini_batch_losses.append(loss_value)
            batchp += 1
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))
UPDATE 2: I noticed that if I change the training loop in the following way it works, but I don't understand why, or whether it is correct:
for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        with tf.GradientTape() as tape:
            for s in range(batchsize):
                X_train, y_train = loadvalue(batchp)
                logits = model(X_train, training=True)
                tape.watch(X_train)
                loss_value = loss_fn(y_train, logits)
                mini_batch_losses.append(loss_value)
                batchp += 1
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))
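A plausible reading of why this version works: `tf.GradientTape` records operations only while its context is active, so in the earlier version each tape saw only one forward pass, and `reduce_mean` was computed outside every tape, which is consistent with the `None` gradients. A framework-free toy recorder (the `ToyTape` class is a hypothetical illustration, not TensorFlow code) shows the recording-window idea:

```python
# Toy illustration of why the forward passes must run inside the tape:
# a recorder only "sees" operations executed while its context is open.

class ToyTape:
    def __init__(self):
        self.ops = []
        self.recording = False

    def __enter__(self):
        self.recording = True
        return self

    def __exit__(self, *exc):
        self.recording = False
        return False

    def run(self, name):
        # stands in for a model forward pass / loss computation
        if self.recording:
            self.ops.append(name)
        return name

# operations executed inside the context are recorded...
with ToyTape() as tape:
    for i in range(3):
        tape.run(f"forward_{i}")

# ...an operation executed after the context is invisible to the tape
tape.run("forward_late")

print(tape.ops)  # ['forward_0', 'forward_1', 'forward_2']
```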
The grads variable only contains the gradients of the variables. To apply them, you need to move the optimizer into the last for loop. But why not write a normal training loop and just set batch_size to 1?
====== UPDATE
You can compute the loss for each sample in the last for loop, then take a reduce_mean to average the losses, and then compute the gradients. Code updated.
for epoch in range(epochs):
    for training in range(trainingsize):
        mini_batch_losses = []
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this minibatch
                loss_value = loss_fn(y_true, logits)
            mini_batch_losses.append(loss_value)
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
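The `reduce_mean` step rests on the linearity of differentiation: the gradient of the averaged loss equals the average of the per-sample gradients, which is why collecting per-sample losses and differentiating their mean is mathematically equivalent to one regular batched step. A framework-free sketch with a scalar least-squares model (all names and values here are illustrative, not from the question):

```python
# Check that the gradient of the mean loss equals the mean of the
# per-sample gradients (linearity of differentiation).
# Model: a single scalar weight w, with loss_i(w) = (w * x_i - y_i) ** 2,
# so dloss_i/dw = 2 * (w * x_i - y_i) * x_i.

def per_sample_grad(w, x, y):
    return 2.0 * (w * x - y) * x

def grad_of_mean_loss(w, xs, ys, eps=1e-6):
    # central finite difference of the averaged loss
    def mean_loss(w):
        return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return (mean_loss(w + eps) - mean_loss(w - eps)) / (2 * eps)

xs = [0.5, -1.0, 2.0, 3.0]
ys = [1.0, 0.0, -0.5, 2.0]
w = 0.7

mean_of_grads = sum(per_sample_grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)
numeric = grad_of_mean_loss(w, xs, ys)
print(mean_of_grads, numeric)  # the two values agree to numerical precision
```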