
Can someone please explain the content loss function?

I am currently getting familiar with TensorFlow and machine learning. I am working through some tutorials on style transfer, and there is a part of an example code that I somehow cannot comprehend.

I think I get the main idea: there are three images, the content image, the style image and the mixed image. Let's just talk about the content loss first, because if I can understand that, I will also understand the style loss. So I have the content image and the mixed image (starting from some distribution with some noise), and the VGG16 model.

As far as I can understand, I should feed the content image into the network up to some layer, and see what the output (feature map) of that layer is for the content image input.

After that I should also feed the network with the mixed image up to the same layer as before, and see what the output (feature map) of that layer is for the mixed image input.

I should then calculate the loss function from these two outputs, because I would like the mixed image to have a feature map similar to that of the content image.
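To make this concrete, here is a toy NumPy sketch of that idea. The "layer" here is just a made-up fixed linear transform standing in for a real VGG16 layer, and the images are small random vectors; the point is only that both images pass through the same function and the loss compares the two resulting feature maps:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one VGG16 layer: any fixed function of the input
# (hypothetical toy layer, not the real network).
W = rng.standard_normal((4, 4))

def layer(image):
    # toy "feature map": a fixed linear transform of the flattened image
    return W @ image

content_image = rng.standard_normal(4)
mixed_image = rng.standard_normal(4)

content_features = layer(content_image)  # fixed target
mixed_features = layer(mixed_image)      # depends on the image being optimized

# sum of squared differences between the two feature maps
content_loss = np.sum((mixed_features - content_features) ** 2)
```

In the real setup, `content_features` is computed once and treated as a constant, while `mixed_features` changes as the mixed image is optimized to drive this loss down.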

My problem is that I do not understand how this is done in the example codes that I could find online.

The example code I am referring to is the following: http://gcucurull.github.io/tensorflow/style-transfer/2016/08/18/neural-art-tf/

But nearly all of the examples used the same approach.

The content loss is defined like this:

def content_loss(cont_out, target_out, layer, content_weight):

    # content loss is just the mean square error between the outputs of a given layer
    # in the content image and the target image
    cont_loss = tf.reduce_sum(tf.square(tf.sub(target_out[layer], cont_out)))

    # multiply the loss by its weight
    cont_loss = tf.mul(cont_loss, content_weight, name="cont_loss")

    return cont_loss
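Note that `tf.sub` and `tf.mul` are from the pre-1.0 TensorFlow API (renamed `tf.subtract` and `tf.multiply` later). Assuming the two feature maps have already been extracted as plain arrays (so the `layer` indexing is not needed), the same arithmetic can be sketched in NumPy:

```python
import numpy as np

def content_loss_np(cont_out, target_out, content_weight):
    # sum of squared differences between the two feature maps,
    # scaled by the content weight
    return content_weight * np.sum(np.square(target_out - cont_out))
```

For example, `content_loss_np(np.array([1.0, 0.0]), np.array([3.0, 2.0]), 0.5)` gives `0.5 * (4 + 4) = 4.0`.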

And is called like this:

# compute loss
cont_cost = losses.content_loss(content_out, model, C_LAYER, content_weight)

Where content_out is the output for the content image, model is the used model, C_LAYER is the reference to the layer that we would like to get the output of and content_weight is the weight with which we multiply.

The problem is that I somehow cannot see where this feeds the network with the mixed image. It seems to me that "cont_loss" calculates the squared error between the output for the content image and the layer itself.

The magic should be somewhere here:

cont_loss = tf.reduce_sum(tf.square(tf.sub(target_out[layer], cont_out)))

But I simply cannot see how this produces the squared error between the feature map of the content image and the feature map of the mixed image at the given layer.

I would be very thankful if someone could point out where I am wrong and explain to me, how that content loss is calculated.

Thanks!

The loss forces the network to have similar activations on the layer you have chosen.

Let us call one convolutional map/pixel from target_out[layer] L and the corresponding map from cont_out C. You want their difference L - C to be as small as possible, i.e., you want the absolute value of their difference to be small. For the sake of numerical stability, we use the square function instead of the absolute value, because it is smooth and more tolerant of small errors.

We thus get (L - C)**2, which is: tf.square(tf.sub(target_out[layer], cont_out)).

Finally, we want to minimize the difference for each map and each example in the batch. This is why we sum all the differences into a single scalar using tf.reduce_sum.
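A tiny worked example of the square-then-sum step, using made-up 2x2 feature maps in NumPy (np.square and np.sum mirror tf.square and tf.reduce_sum):

```python
import numpy as np

# Hypothetical 2x2 feature maps for the mixed and content images.
L = np.array([[1.0, 2.0], [3.0, 4.0]])  # target_out[layer]
C = np.array([[1.0, 0.0], [0.0, 4.0]])  # cont_out

squared = np.square(L - C)  # elementwise (L - C)**2 -> [[0, 4], [9, 0]]
loss = np.sum(squared)      # single scalar: 0 + 4 + 9 + 0 = 13.0
```

Every element of the feature maps contributes one squared term, and the reduction collapses them all into the one number the optimizer minimizes.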
