
How to extract feature vector from single image in Pytorch?

I am attempting to understand more about computer vision models, and I'm trying to explore how they work. In an attempt to better understand how to interpret feature vectors, I'm trying to use Pytorch to extract a feature vector. Below is the code I've pieced together from various places.

import torch
import torch.nn as nn
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize(256),
        torchvision.transforms.CenterCrop(224),
        torchvision.transforms.ToTensor(),
        torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)

When I do this I get the following error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead

I'm sure this is an elementary error, but I can't seem to figure out how to fix it. It was my impression that the ToTensor transformation would make my data 4-d, but it seems it's either not working correctly or I'm misunderstanding it. I'd appreciate any help or resources I can use to learn more about this!

All the default nn.Modules in pytorch expect an additional batch dimension. If the input to a module is shape (B, ...) then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. To make your code conform, you can just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor.
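
For example, a minimal shape check (a standalone sketch, not part of the updated code below) shows both the batch dimension and the flatten step:

import torch

t_img = torch.zeros(3, 224, 224)    # a single transformed image
batched = t_img.unsqueeze(0)        # add a unitary batch dimension at the front
print(batched.shape)                # torch.Size([1, 3, 224, 224])

# resnet18's 'avgpool' output for one image is (1, 512, 1, 1);
# flatten() turns it into the 1-d tensor the embedding buffer expects.
out = torch.zeros(1, 512, 1, 1)
print(out.flatten().shape)          # torch.Size([512])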

A couple of other things:

  • You should infer within a torch.no_grad() context to avoid computing gradients, since you won't be needing them (note that model.eval() just changes the behavior of certain layers like dropout and batch normalization; it doesn't disable construction of the computation graph, but torch.no_grad() does; see the short sketch after this list).

  • I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.

  • o.data just returns the tensor underlying o, detached from the computation graph (it is not actually a copy). In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated way back in PyTorch 0.4.0 and no longer does anything useful; now its use just creates confusion. Unfortunately, many tutorials are still being written using this old and unnecessary interface.
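
As a quick illustration of the first and third points (a standalone sketch, not from the original answer):

import torch

x = torch.ones(2, requires_grad=True)
y = x * 2
print(y.requires_grad)           # True: autograd tracks this op by default
with torch.no_grad():
    z = x * 2
print(z.requires_grad)           # False: no graph is constructed inside no_grad()

# The modern replacement for `o.data` is `o.detach()`, which returns a view
# of the same data with autograd tracking removed.
print(y.detach().requires_grad)  # False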

Updated code is then as follows:

import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)

    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())                 # <-- flatten

    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                               # <-- no_grad context
        model(t_img.unsqueeze(0))                       # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding


pic_vector = get_vector(img)
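
The returned embedding is a 512-element vector; a quick sanity check (added here for illustration):

print(pic_vector.shape)    # torch.Size([512])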

Instead of this:

model(t_img)

just do:

model(t_img[None])

This will add an extra dimension, hence the image will be of shape [1, 3, 224, 224] and it will work.
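
A quick check (a standalone sketch) confirms that indexing with None is equivalent to unsqueeze(0):

import torch

t_img = torch.zeros(3, 224, 224)
print(t_img[None].shape)                             # torch.Size([1, 3, 224, 224])
print(torch.equal(t_img[None], t_img.unsqueeze(0)))  # True: same operation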
