I need to use batches whose elements have different sizes, so I am trying to write a custom training loop. The main idea is to start from the one supplied by the Keras guide:
for epoch in range(epochs):
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
and add a loop over the batch size. This way I can train the network one element at a time, and update the weights only after every element of the batch has passed through the network. Something like:
for epoch in range(epochs):
    for training in range(trainingsize):
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this minibatch
                loss_value = loss_fn(y, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
            optimizer.apply_gradients(zip(grads, model.trainable_weights))
where x and y are a single element of the batch.
But I noticed that this way only the last iteration is taken into account (because grads is overwritten on every pass).
How can I handle this? I don't know how to "merge" the different grads.
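"Merging" gradients computed one sample at a time usually means summing (or averaging) them and then calling apply_gradients once. A minimal sketch of this accumulation pattern, using a hypothetical toy model and random data purely for illustration:

```python
import tensorflow as tf

# Hypothetical toy model and per-sample data, for illustration only.
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)),
                             tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

samples = [(tf.random.normal((1, 3)), tf.random.normal((1, 1)))
           for _ in range(4)]

# "Merging" the grads: accumulate one gradient per sample,
# average them, and apply a single update.
accum = [tf.zeros_like(w) for w in model.trainable_weights]
for x, y in samples:
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_weights)
    accum = [a + g for a, g in zip(accum, grads)]

accum = [a / len(samples) for a in accum]
optimizer.apply_gradients(zip(accum, model.trainable_weights))
```

This is equivalent in expectation to averaging the per-sample losses under one tape, but it never needs all samples recorded at once.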
Also, out of curiosity: I thought variables created inside a "with" statement were valid only inside that statement, so how is it possible that using them outside works?
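On the "with" question: Python's with statement does not introduce a new variable scope; it only guarantees the context manager's cleanup code runs on exit. Names bound inside the block stay visible afterwards, which is why tape is usable after the block ends. A quick demonstration:

```python
import io

# `with` only calls __enter__/__exit__ on the context manager;
# it does not open a new variable scope (unlike a function body).
with io.StringIO("hello") as f:
    text = f.read()

# Both names bound inside the block survive it; the stream was
# closed by __exit__, but the name `f` is still valid.
print(text)      # hello
print(f.closed)  # True
```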
UPDATE
I tried SoheilStar's solution, but tape.gradient returns [None, None, None, None] and apply_gradients fails with "No gradients provided for any variable: ['conv1d/kernel:0', 'conv1d/bias:0', 'dense/kernel:0', 'dense/bias:0']".
I don't know how to debug this to find the problem.
Here is the main part of the code I use:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1D, Dense

optimizer = keras.optimizers.Adam(learning_rate=0.001, name="Adam")
loss_fn = keras.losses.CategoricalCrossentropy(from_logits=True)

model = keras.Sequential()
model.add(Conv1D(2, ksize, activation='relu', input_shape=ishape))
model.add(GlobalMaxPooling1D(data_format="channels_last"))
model.add(Dense(2, activation='sigmoid'))

for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        for s in range(batchsize):
            X_train, y_train = loadvalue(batchp)  # load the elements
            with tf.GradientTape() as tape:
                logits = model(X_train, training=True)
                loss_value = loss_fn(y_train, logits)
            mini_batch_losses.append(loss_value)
            batchp += 1
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))
UPDATE 2: I noticed that if I change the training loop as follows, it works, but I don't understand why, or whether it is correct:
for epoch in range(epochsize):
    batchp = 1
    for k in range(trainingsize):
        loss_value = tf.constant(0.)
        mini_batch_losses = []
        with tf.GradientTape() as tape:
            for s in range(batchsize):
                X_train, y_train = loadvalue(batchp)
                logits = model(X_train, training=True)
                tape.watch(X_train)
                loss_value = loss_fn(y_train, logits)
                mini_batch_losses.append(loss_value)
                batchp += 1
            loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(grads_and_vars=zip(grads, model.trainable_weights))
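A likely explanation for why UPDATE 2 works: a GradientTape only records operations executed while its context is active. In the first version, each forward pass ran under a separate short-lived tape and the reduce_mean ran outside any tape, so tape.gradient found no recorded path from loss_avg to the weights and returned None for everything. In UPDATE 2, all the forward passes and the mean are recorded by a single tape. (The tape.watch(X_train) call is unnecessary, since trainable variables are watched automatically.) A minimal demonstration of this behaviour:

```python
import tensorflow as tf

v = tf.Variable(2.0)

# persistent=True only so tape.gradient can be called twice in this demo.
with tf.GradientTape(persistent=True) as tape:
    inside = v * 3.0   # executed while the tape is active: recorded

outside = v * 3.0      # executed after the tape exited: NOT recorded

g_in = tape.gradient(inside, v)    # gradient flows: 3.0
g_out = tape.gradient(outside, v)  # None - no recorded path back to v
del tape
```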
The grads variable only contains the gradients of the variables. To apply them you need to move the optimizer call inside the last for loop. But why not write a normal training loop and simply set batch_size to one?
====== Update
You can calculate the loss for each sample in the last for loop, then do a reduce_mean to get the mean loss, and then calculate the grads. Code updated:
for epoch in range(epochs):
    for training in range(trainingsize):
        mini_batch_losses = []
        for batch in range(batchsize):
            with tf.GradientTape() as tape:
                logits = model(x, training=True)  # Logits for this minibatch
                loss_value = loss_fn(y_true, logits)
            mini_batch_losses.append(loss_value)
        loss_avg = tf.reduce_mean(mini_batch_losses)
        grads = tape.gradient(loss_avg, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))