
Tensorflow pretrained models input channel range

I came across this example which implements a pretrained model. It says:

Format the Data

Use the tf.image module to format the images for the task.

Resize the images to a fixed input size, and rescale the input channels to a range of [-1,1]

import tensorflow as tf

IMG_SIZE = 160  # All images will be resized to 160x160

def format_example(image, label):
  image = tf.cast(image, tf.float32)
  image = (image / 127.5) - 1  # rescale pixel values from [0, 255] to [-1, 1]
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  return image, label
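
For context, the tutorial then applies this function to a tf.data.Dataset with map, roughly like this (the cats_vs_dogs dataset and the raw_train / raw_validation names are from the tutorial as I remember them, so treat them as placeholders):

import tensorflow_datasets as tfds

# Load the dataset used in the tutorial, split into train/validation/test.
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,  # yields (image, label) pairs, as format_example expects
)

train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
train_batches = train.shuffle(1000).batch(32)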

I was wondering about this. My understanding is that image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) resizes the images (which can have any size) to one consistent size, and that image = (image/127.5) - 1 does not change the size of the images but rescales the pixel values (which lie between 0 and 255) to a range of [-1,1]. In other examples I have seen normalization/standardization to a range of [0,1], i.e. rescaling by 1.0/255. I do not understand when to use which.

If I train my own model, is it up to me whether to scale to [-1,1] or [0,1]? When I use a pretrained model, however, I need to know what it expects. I googled the MobileNetV2 model but could not find any documentation stating that the required input range is [-1,1]. In this comment it says all pretrained TensorFlow models require an input range of [-1,1]. Is that true? In particular, is it true that all image models on TensorFlow Hub require a range of [-1,1]?

Finally, how do I find out what the required input range is for a given pretrained model? I would not have figured out the [-1,1] for MobileNetV2 on my own; on the TensorFlow MobileNetV2 page I could not find this information.

Furthermore: is there a way to have this done automatically, i.e. a function that looks up the required range for a pretrained TensorFlow model (assuming that information is stored somewhere on the model object) and applies it, given that my input is 0-255? I think tf.keras.applications.mobilenet_v2.preprocess_input does something else (I do not really understand what it does), and it is also specific to MobileNetV2.
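
To illustrate what I mean by "automatically", something along these lines is what I was hoping for. This is just my sketch: I am assuming that the per-model preprocess_input functions in tf.keras.applications are the right hook, and that MobileNetV2's version is equivalent to the manual (image/127.5) - 1 above, which I have not verified:

import tensorflow as tf

# Per-model preprocessing from tf.keras.applications; my understanding is
# that for MobileNetV2 this takes pixel values in [0, 255] and rescales them.
preprocess = tf.keras.applications.mobilenet_v2.preprocess_input

def format_example_auto(image, label):
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
  image = preprocess(image)  # should map [0, 255] to whatever MobileNetV2 expects
  return image, label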

Generally, your question boils down to: "which scaling should I choose, [0, 1] or [-1, 1]?" Since the answer depends on the case, I will go through the cases below.

  1. CNN architectures train better when the input values lie in a small, bounded range, so both [0, 1] and [-1, 1] can be good choices. The better choice can differ from architecture to architecture, so it is worth trying several scales.

  2. Concerning the pre-trained Keras models, I noticed that most models that use residual connections (such as ResNets, MobileNetV2, InceptionResNetV2) use the [-1, 1] scale. Using a [-1, 1] scale allows some connections to be deactivated in some cases. To see why, consider a perceptron y = wx + b with w = 1 and b = 1: the input x = -1 gives y = 0. In other words, with a [-1, 1] scale some input values can be nullified by the bias without setting w = 0 (see the small numeric sketch after this list). This mostly holds for pre-trained models outside of Keras as well.

  3. Almost all of the Keras architectures apply some input scaling, and in some cases I believe it differs from what the original papers suggest. So when you use a Keras pre-trained model, stick to what Keras' documentation says; if the documentation mentions no scaling, do not scale the input.
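
Regarding point 2, here is a minimal numeric sketch (plain TensorFlow, values chosen only for illustration) showing how a bias can zero out a pre-activation when the inputs span [-1, 1]:

import tensorflow as tf

w = tf.constant(1.0)
b = tf.constant(1.0)

# With inputs rescaled to [-1, 1], the extreme value x = -1 is exactly
# cancelled by the bias: y = w * x + b = 1 * (-1) + 1 = 0.
x_neg = tf.constant(-1.0)
print(float(w * x_neg + b))   # 0.0

# With inputs rescaled to [0, 1], the same weight and bias never reach 0:
# the pre-activation stays in [1, 2].
x_zero = tf.constant(0.0)
x_one = tf.constant(1.0)
print(float(w * x_zero + b))  # 1.0
print(float(w * x_one + b))   # 2.0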

Furthermore, it is worth testing different scaling methods when you work with different datasets, although in most cases this will not dramatically change the model's accuracy. Please let me know if you have more questions. Thanks.
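
As for finding out which range a given Keras pre-trained model expects: besides reading the documentation, one quick sanity check is to feed the extremes of the raw [0, 255] pixel range through the model's preprocess_input function and look at what comes out. A minimal sketch (the two models below are just examples):

import numpy as np
import tensorflow as tf

# A 1x1 "image" whose three channels hold the extremes and midpoint of [0, 255].
probe = np.array([[[[0.0, 127.5, 255.0]]]], dtype=np.float32)

for name, fn in [
    ("mobilenet_v2", tf.keras.applications.mobilenet_v2.preprocess_input),
    ("resnet50", tf.keras.applications.resnet50.preprocess_input),
]:
    out = fn(probe.copy())  # copy(): preprocess_input may modify numpy input in place
    print(name, out.reshape(-1))

# mobilenet_v2 maps the probe to roughly [-1, 0, 1], i.e. a [-1, 1] range, while
# resnet50 subtracts per-channel ImageNet means instead, so not every model uses
# the same convention.

So the safest route is still the per-model preprocess_input (or the manual rescaling documented for that model), rather than assuming one global convention.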
