
Keras: What is model.inputs in VGG16

I recently started playing with Keras and VGG16, and I am using keras.applications.vgg16.

I have a question about what model.inputs is, because I saw others using it in https://github.com/keras-team/keras/blob/master/examples/conv_filter_visualization.py even though that script never initializes it:

    ...
    input_img = model.input
    ...
    layer_output = layer_dict[layer_name].output
    if K.image_data_format() == 'channels_first':
        loss = K.mean(layer_output[:, filter_index, :, :])
    else:
        loss = K.mean(layer_output[:, :, :, filter_index])

    # we compute the gradient of the input picture wrt this loss
    grads = K.gradients(loss, input_img)[0]

I checked the Keras site, but it only says that it is an input tensor with shape (1,224,224,3). I still don't understand what exactly that is. Is it an image from ImageNet? Or a default image provided by Keras for Keras models?

I am sorry if I don't have enough understanding of deep learning, but can someone please explain it to me? Thanks

model.input is not an image at all: it is the symbolic input tensor of the network, a placeholder that gets filled with your own data when you run the model. The 4 dimensions of (1,224,224,3) are the batch_size, image_height, image_width and image_channels respectively. (1,224,224,3) means that the VGG16 model accepts a batch of size 1 (one image at a time) of shape 224x224 with three channels (RGB).
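
To make the axis order concrete, here is a small NumPy sketch. It assumes Keras's default channels_last layout (batch, height, width, channels) and uses an all-zeros array as a stand-in for a real image batch; no Keras model is involved.

```python
import numpy as np

# Stand-in for a batch holding one RGB image (all zeros, illustration only).
batch = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Axis order in the channels_last layout: batch, height, width, channels.
batch_size, height, width, channels = batch.shape

first_image = batch[0]           # one image, shape (224, 224, 3)
top_left_pixel = batch[0, 0, 0]  # channel values of one pixel, shape (3,)

print(batch_size, height, width, channels)
print(first_image.shape, top_left_pixel.shape)
```

Indexing the first axis selects an image, the next two select a pixel by row and column, and the last selects a color channel.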

For more information on what a batch, and therefore a batch size, is, you can check this Cross Validated question.

Returning to VGG16, the input of the architecture is (1, 224, 224, 3). What does this mean? It means that in order to input an image into the network, you will need to:

  1. Preprocess it to reach a shape of (224, 224) and 3 channels (RGB)
  2. Convert this to an actual matrix of shape (224, 224, 3)
  3. Group together various images in a batch of the size that the network requires (in this case, the batch size is 1, but you still need to add a dimension to the matrix in order to obtain (1, 224, 224, 3))

After doing this, you can input the image to the model.
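
The batching step can be sketched with NumPy alone. The arrays below are random stand-ins for decoded images (an assumption for illustration, not real data). Whether a batch larger than 1 is accepted depends on how the model was built; Keras models usually leave the batch dimension unspecified.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for four decoded 224x224 RGB images.
images = [
    rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
    for _ in range(4)
]

# A single image still needs a leading batch axis.
single = np.expand_dims(images[0], axis=0)

# Several images are stacked along a new first axis.
batch = np.stack(images, axis=0)

print(single.shape)
print(batch.shape)
```

np.expand_dims covers the one-image case, while np.stack builds a batch from several images of identical shape.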

Keras offers a few utility functions to do these tasks. Below is a modified version of the code snippet shown under "Extract features with VGG16" in the "Usage examples for image classification models" section of the documentation.

For it to actually work, you need a JPG of any size named elephant.jpg. You can obtain one by running this bash command:

wget https://upload.wikimedia.org/wikipedia/commons/f/f9/Zoorashia_elephant.jpg -O elephant.jpg   

For clarity, I will split the code into image preprocessing and model prediction:

Load the image

import numpy as np
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input

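img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))  # PIL image, resized to 224x224
x = image.img_to_array(img)                             # array of shape (224, 224, 3)
x = np.expand_dims(x, axis=0)                           # add batch axis: (1, 224, 224, 3)
x = preprocess_input(x)                                 # RGB -> BGR, zero-center per channel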

You can add prints along the way to see what is going on, but here is a brief summary:

  1. image.load_img() loads a PIL image, already in RGB, and resizes it to (224, 224)
  2. image.img_to_array() translates this image into a matrix of shape (224, 224, 3). If you access x[0, 0, 0], you will get the red component of the first pixel as a number between 0 and 255
  3. np.expand_dims(x, axis=0) adds the first dimension. Afterwards, x has shape (1, 224, 224, 3)
  4. preprocess_input does extra preprocessing required for ImageNet-trained architectures. From its docstring (run help(preprocess_input)) you can see that it:

    will convert the images from RGB to BGR, then will zero-center each color channel with respect to the ImageNet dataset, without scaling

This appears to be the standard input preprocessing for models trained on ImageNet.
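
As a rough illustration of what that docstring describes, the two steps can be written in plain NumPy. This is a sketch, not the Keras implementation: the function name preprocess_sketch is hypothetical, and the per-channel mean values are an assumption (the commonly quoted ImageNet means in BGR order), not something stated in the text above.

```python
import numpy as np

# Assumed per-channel ImageNet means in BGR order (illustrative values).
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_sketch(x_rgb):
    """Approximate the described steps on a (batch, height, width, 3) RGB array."""
    x_bgr = x_rgb[..., ::-1].astype(np.float32)  # reverse channel axis: RGB -> BGR
    return x_bgr - IMAGENET_MEAN_BGR             # zero-center each channel, no scaling

# Toy input: mid-gray image with a saturated red channel.
x = np.full((1, 224, 224, 3), 128.0, dtype=np.float32)
x[..., 0] = 255.0

out = preprocess_sketch(x)
print(out.shape)
print(out[0, 0, 0])
```

After the channel flip, the red value ends up in the last position, which is why pixel values inspected after preprocess_input no longer look like ordinary 0 to 255 RGB values.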

That's it for the preprocessing. Now you can feed the image to the pretrained model and get a prediction.

Predict

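from keras.applications.vgg16 import VGG16
base_model = VGG16(weights='imagenet')  # assumed definition; not shown in the original snippet
y_hat = base_model.predict(x)
print(y_hat.shape)  # (1, 1000)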

y_hat contains the probabilities the model assigned to this image for each of the 1000 ImageNet classes.

To obtain the class names and a readable output, Keras provides a utility function too:

from keras.applications.vgg16 import decode_predictions
decode_predictions(y_hat)

Output for the Zoorashia_elephant.jpg image downloaded earlier:

[[('n02504013', 'Indian_elephant', 0.48041093),
  ('n02504458', 'African_elephant', 0.47474155),
  ('n01871265', 'tusker', 0.03912963),
  ('n02437312', 'Arabian_camel', 0.0038948185),
  ('n01704323', 'triceratops', 0.00062475674)]]

Which seems pretty good!
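
Conceptually, turning a row of probabilities into a readable top-k list is a sort plus a label lookup. The sketch below shows that idea with NumPy; top_k is a hypothetical helper, not the Keras decode_predictions implementation, and the four labels and probabilities are made-up toy values standing in for the 1000 real classes.

```python
import numpy as np

def top_k(probabilities, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]  # indices sorted by descending probability
    return [(labels[i], float(probabilities[i])) for i in order]

# Hypothetical 4-class example; the real model has 1000 classes.
labels = ["Indian_elephant", "African_elephant", "tusker", "Arabian_camel"]
y_hat_row = np.array([0.48, 0.47, 0.04, 0.01])

print(top_k(y_hat_row, labels, k=2))
```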
