
Deep learning with TensorFlow: custom training loop that learns one element at a time

I need to use batches whose elements have different sizes, so I am trying to write a custom training loop. The main idea is to start from the one supplied in the Keras guide:

for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)  
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

and add a loop over the batch size. This way I can feed the network one element at a time, and update the weights only after every element of the batch has passed through the network. Something like:

for epoch in range(epochs):
    for training in range(trainingsize):
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this element
                loss_value = loss_fn(y, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

where x and y are a single element of the batch.

But I noticed that this way only the last element of the batch is taken into account (because grads is overwritten at every iteration).

How can I handle this? I don't know how to "merge" the different grads.
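One way to "merge" the per-element gradients (a sketch with a hypothetical toy model and random data, not the Conv1D network from the question) is to accumulate them across the batch and apply their mean once:

```python
import tensorflow as tf

# Toy stand-ins for the question's model, loss, and data.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# A "batch" of three single-element samples.
batch = [(tf.random.normal([1, 4]), tf.constant([[1.0, 0.0]])) for _ in range(3)]

# Accumulate per-sample gradients, then apply their mean once per batch.
accum = None
for x, y in batch:
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    if accum is None:
        accum = [tf.identity(g) for g in grads]
    else:
        accum = [a + g for a, g in zip(accum, grads)]

mean_grads = [a / len(batch) for a in accum]
optimizer.apply_gradients(zip(mean_grads, model.trainable_weights))
```

Since the gradient is linear, averaging the per-sample gradients is equivalent to taking the gradient of the average loss, so this performs exactly one weight update per batch.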

Also, out of curiosity: I thought that variables created inside a "with" statement were valid only inside the statement, so how can using tape outside it work?
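On the `with` question: Python scoping is function-level, not block-level, so a name bound inside a `with` (or `for`/`if`) block stays visible after the block exits. A minimal standalone demonstration (the `Resource` class is a made-up example):

```python
# Names bound inside a `with` block survive after the block exits;
# only the context manager's __exit__ hook runs at the end.
class Resource:
    def __enter__(self):
        return "handle"
    def __exit__(self, exc_type, exc, tb):
        return False  # cleanup runs here, but the bound names remain

with Resource() as r:
    value = r.upper()

# Both names are still bound after the block:
print(r)      # handle
print(value)  # HANDLE
```

In the TensorFlow case, exiting the `with` block only stops the tape from recording; the `tape` object itself still exists, so `tape.gradient(...)` can be called afterwards (once, unless the tape is persistent).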

UPDATE


I tried SoheilStar's solution, but tape.gradient returns [None, None, None, None], and apply_gradients fails with "No gradients provided for any variable: ['conv1d/kernel:0', 'conv1d/bias:0', 'dense/kernel:0', 'dense/bias:0']".

I don't know how to debug this to find the problem.

Here is the main part of the code I use:

optimizer = keras.optimizers.Adam(learning_rate=0.001, name="Adam")
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

model = keras.Sequential()
model.add(Conv1D(2, ksize, activation='relu', input_shape=ishape))
model.add(GlobalMaxPooling1D(data_format="channels_last"))
model.add(Dense(2, activation='sigmoid'))


for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        for s in range(batchsize):
            X_train, y_train = loadvalue(batchp)  # load the elements
            with tf.GradientTape() as tape:
                logits = model(X_train, training=True)
                loss_value = loss_fn(y_train, logits)
            mini_batch_losses.append(loss_value)
            batchp += 1
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))

UPDATE 2: I noticed that if I change the training loop as follows, it works, but I don't understand why, or whether it is correct:

for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        with tf.GradientTape() as tape:
            for s in range(batchsize):
                X_train, y_train = loadvalue(batchp)
                logits = model(X_train, training=True)
                tape.watch(X_train)
                loss_value = loss_fn(y_train, logits)
                mini_batch_losses.append(loss_value)
                batchp += 1
            loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))
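Update 2 works because every operation the gradient depends on — all the forward passes and the `tf.reduce_mean` — now runs while a single tape is recording; in the previous version the mean was computed outside any tape context, which is why `tape.gradient` returned `None`. A sanity check with a hypothetical toy model (not the question's Conv1D network), showing that per-sample averaging under one tape matches an ordinary batched forward pass:

```python
import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

X = tf.random.normal([4, 3])
Y = tf.constant([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

# One tape around per-sample forward passes (the UPDATE 2 pattern):
with tf.GradientTape() as tape:
    losses = [loss_fn(Y[i:i+1], model(X[i:i+1], training=True)) for i in range(4)]
    loss_avg = tf.reduce_mean(losses)
grads_per_sample = tape.gradient(loss_avg, model.trainable_weights)

# Ordinary batched forward pass for comparison:
with tf.GradientTape() as tape:
    loss_batch = loss_fn(Y, model(X, training=True))
grads_batch = tape.gradient(loss_batch, model.trainable_weights)
```

The two gradient lists agree up to floating-point rounding, because the default loss reduction already averages over the batch.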

The grads variable only contains the gradients of the variables; to apply them you need to move the optimizer inside the last for loop. But why not write a normal training loop and set batch_size to one?
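The "normal training loop with batch_size one" suggested above could look like this (a sketch with a hypothetical toy model and random data; the weights are updated after every single element):

```python
import tensorflow as tf

# Toy placeholders for the question's model, loss, and data.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

X = tf.random.normal([8, 3])
Y = tf.one_hot(tf.random.uniform([8], maxval=2, dtype=tf.int32), depth=2)
dataset = tf.data.Dataset.from_tensor_slices((X, Y)).batch(1)

for epoch in range(2):
    for x, y in dataset:  # each x has shape (1, 3): one element per step
        with tf.GradientTape() as tape:
            logits = model(x, training=True)
            loss_value = loss_fn(y, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
```

Note that this updates the weights once per element rather than once per batch, which is a different training dynamic from accumulating a whole batch before applying.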

====== Update

You can calculate the loss for each sample in the last for loop, then do a reduce_mean to get the mean loss, and then calculate the grads. Code updated:

for epoch in range(epochs):
    for training in range(trainingsize):
        mini_batch_losses = []
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this minibatch
                loss_value = loss_fn(y_true, logits)
            mini_batch_losses.append(loss_value)
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
