
Deep learning with TensorFlow: a personalized training loop that learns one element at a time

I need to work with batches whose elements have different sizes, so I tried to write a custom training loop. The main idea is to start from the one provided by Keras:

for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

and add a loop over the batch size, so that I can feed the network one element at a time and only update the weights after every element of the batch has passed through the network. Something like:

for epoch in range(epochs):
    for training in range(trainingsize):
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # logits for this single element
                loss_value = loss_fn(y, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

where x and y are the single elements of the batch.

But I noticed that this way only the last element is actually taken into account (because grads gets overwritten at every iteration).

How can I handle this? I don't know how to "merge" the different grads.

One more curiosity: I thought a variable created inside a "with" statement was only valid inside the statement, so how can it work to use it (the tape) outside the block?
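On that last point: a Python with statement does not create a new variable scope; it only guarantees that the context manager is entered and exited, so names bound inside the block (including tape itself) stay visible afterwards. Only the recording stops when the block ends, which is why calling tape.gradient after the block is the normal pattern. A minimal standalone illustration:

import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)   # x is a constant, so it must be watched explicitly
    y = x * x
# tape is still a usable name here; leaving the block only stopped the recording
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)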

UPDATE


I tried SoheilStar's solution, but tape.gradient returns [None, None, None, None], and apply_gradients fails with "No gradients provided for any variable: ['conv1d/kernel:0', 'conv1d/bias:0', 'dense/kernel:0', 'dense/bias:0']".

I don't know how to debug a situation like this in order to find the problem.
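One simple way to narrow this kind of problem down (just a sketch, using the grads and model names from the training loop shown below) is to print which gradients come back as None:

# right after grads = tape.gradient(...)
for g, w in zip(grads, model.trainable_weights):
    if g is None:
        print("no gradient for", w.name)

A None entry usually means the forward pass that produced the loss was not recorded on the tape being differentiated, for example because model(...) was called under a different GradientTape whose block has already been exited.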

These are the main parts of the code I'm using:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense

optimizer = keras.optimizers.Adam(learning_rate=0.001, name="Adam")
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

model = keras.Sequential()
model.add(Conv1D(2, ksize, activation='relu', input_shape=ishape))  # ksize and ishape are defined elsewhere
model.add(GlobalMaxPooling1D(data_format="channels_last"))
model.add(Dense(2, activation='sigmoid'))


for epoch in range(epochsize):
    batchp=1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses=[]
        for s in range(batchsize):
            X_train, y_train = loadvalue(batchp)  # load one element of the batch
            with tf.GradientTape() as tape:
                logits = model(X_train , training=True)
                loss_value = loss_fn(y_train, logits)
            mini_batch_losses.append(loss_value)
            batchp += 1
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))

UPDATE 2: I noticed that if I change the training loop in this way it works, but I don't understand why, or whether it is correct:

for epoch in range(epochsize):
    batchp=1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses=[]
        with tf.GradientTape() as tape:
            for s in range(batchsize):
                X_train, y_train = loadvalue(batchp)
                logits = model(X_train , training=True)
                tape.watch(X_train)
                loss_value = loss_fn(y_train, logits)
            mini_batch_losses.append(loss_value)
            batchp += 1
            loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))

The grads variable only contains the gradients of the variables. To apply them, you need to move the optimizer inside the last for loop. But why not write a normal training loop and just set batch_size to 1?
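A minimal sketch of that last suggestion, reusing loadvalue and the epochsize / trainingsize / batchsize names from the question (an assumption about how the data is loaded, not tested code); here every single element triggers its own weight update:

for epoch in range(epochsize):
    batchp = 1
    for step in range(trainingsize * batchsize):
        X_train, y_train = loadvalue(batchp)  # one element at a time
        batchp += 1
        with tf.GradientTape() as tape:
            logits = model(X_train, training=True)
            loss_value = loss_fn(y_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))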

====== UPDATE

You can compute the loss of each sample inside the last for loop, then take a reduce_mean to get the average loss, and compute the gradients from that. Code updated.

for epoch in range(epochs):
    for training in range(trainingsize):
       mini_batch_losses = []
       for batch in range(batchsize):
            with tf.GradientTape() as tape:
               logits = model(x, training=True)  # Logits for this minibatch
               loss_value = loss_fn(y_true, logits)
            mini_batch_losses.append(loss_value)
       loss_avg = tf.reduce_mean(mini_batch_losses)
       grads = tape.gradient(loss_avg , model.trainable_weights)
       optimizer.apply_gradients(zip(grads, model.trainable_weights))
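One caveat with this snippet, as with the first update in the question: loss_avg is computed outside the with block, and the earlier elements were run under tapes whose blocks have already been exited, so the tape that is finally differentiated never recorded a path from the weights to loss_avg. That is consistent with the [None, None, None, None] reported in the question's update. Two ways around it: keep every forward pass (and the reduce_mean) inside one single GradientTape, which is essentially what UPDATE 2 does, or accumulate per-element gradients and average them before applying. A sketch of the second option, again reusing loadvalue and the counters from the question:

for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        # running sum of the gradients over the elements of this batch
        accum_grads = [tf.zeros_like(w) for w in model.trainable_weights]
        for s in range(batchsize):
            X_train, y_train = loadvalue(batchp)
            batchp += 1
            with tf.GradientTape() as tape:
                logits = model(X_train, training=True)
                loss_value = loss_fn(y_train, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
            accum_grads = [a + g for a, g in zip(accum_grads, grads)]
        # apply the mean gradient once per batch
        mean_grads = [a / batchsize for a in accum_grads]
        optimizer.apply_gradients(zip(mean_grads, model.trainable_weights))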
