How can I use a pre-trained neural network with grayscale images?

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).

The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.

I can't use one of those models because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).

Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?

The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.

-- Edit: elaboration suggested by Prune --
CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you are destroying that hierarchy of features because the subsequent layers won't receive the features that they are supposed to receive as their input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training.
--- end edit ---

There is an easy way, though, to make your model work with grayscale images. You just need to make the image appear to be RGB. The easiest way to do so is to repeat the image array 3 times on a new dimension. Because you will have the same image over all 3 channels, the performance of the model should be the same as it was on RGB images.

In numpy this can be easily done like this:

import numpy as np

print(grayscale_batch.shape)  # (64, 224, 224)
# Add a trailing channel axis, then repeat it 3 times
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape)  # (64, 224, 224, 3)

The way this works is that it first creates a new dimension (to place the channels) and then it repeats the existing array 3 times on this new dimension.
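As a quick sanity check of the approach above, the following sketch uses a small random batch as stand-in data (the shapes and variable names are illustrative, not from the original post). It confirms that the three channels are identical copies of the grayscale image and that stacking with np.stack gives the same array.

```python
import numpy as np

# Stand-in data: 4 grayscale images of 8x8 pixels (illustrative shapes).
rng = np.random.default_rng(0)
grayscale_batch = rng.integers(0, 256, size=(4, 8, 8), dtype=np.uint8)

# Approach from the answer: add a trailing axis, then repeat it 3 times.
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, axis=-1)

# Equivalent alternative: stack the same array 3 times on a new last axis.
rgb_stacked = np.stack([grayscale_batch] * 3, axis=-1)

# Every channel should be an exact copy of the grayscale input.
channels_identical = all(
    np.array_equal(rgb_batch[..., c], grayscale_batch) for c in range(3)
)

print(rgb_batch.shape)                          # (4, 8, 8, 3)
print(channels_identical)                       # True
print(np.array_equal(rgb_batch, rgb_stacked))   # True
```

Note that np.repeat makes a real copy, so the converted batch takes three times the memory of the grayscale one.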

I'm also pretty sure that Keras' ImageDataGenerator can load grayscale images as RGB.

Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be fine-tunable. Modifying the weights of the first layer does not render the rest of the weights useless, as suggested by others.

To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning them to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weight tensor is organized varies from framework to framework. The PyTorch default is [out_channels, in_channels, kernel_height, kernel_width]. In TensorFlow I believe it is [kernel_height, kernel_width, in_channels, out_channels].
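The reason summing works is that a convolution is linear in its input channels: when all three channels hold the same grayscale image g, sum_c (w_c * g) equals (sum_c w_c) * g. The NumPy sketch below illustrates this identity with a naive convolution and small random arrays. Everything in it is illustrative: conv2d_valid is a hypothetical helper written for this example, not a library function, and the shapes are arbitrary.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' cross-correlation (hypothetical helper for illustration).

    image:  (in_channels, height, width)
    kernel: (out_channels, in_channels, kh, kw), i.e. the PyTorch layout
    """
    out_channels, _, kh, kw = kernel.shape
    _, height, width = image.shape
    out = np.zeros((out_channels, height - kh + 1, width - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[:, i:i + kh, j:j + kw]            # (in_channels, kh, kw)
            # Contract the channel and both spatial axes against each filter.
            out[:, i, j] = np.tensordot(kernel, patch, axes=3)
    return out

rng = np.random.default_rng(0)
gray = rng.random((1, 12, 12))                  # one 1-channel image
kernel_rgb = rng.standard_normal((4, 3, 3, 3))  # stand-in for pretrained weights

# Route A: replicate the image to 3 channels and use the original kernel.
rgb_like = np.repeat(gray, 3, axis=0)
out_rgb = conv2d_valid(rgb_like, kernel_rgb)

# Route B: keep 1 channel and sum the kernel over its input-channel axis.
kernel_gray = kernel_rgb.sum(axis=1, keepdims=True)    # (4, 1, 3, 3)
out_gray = conv2d_valid(gray, kernel_gray)

max_diff = np.abs(out_rgb - out_gray).max()     # zero up to rounding error

# Same weights in the TensorFlow layout: the input-channel axis is axis 2.
kernel_tf = kernel_rgb.transpose(2, 3, 1, 0)            # (kh, kw, in, out)
kernel_tf_gray = kernel_tf.sum(axis=2, keepdims=True)   # (kh, kw, 1, out)

print(kernel_gray.shape)     # (4, 1, 3, 3)
print(kernel_tf_gray.shape)  # (3, 3, 1, 4)
```

The only framework-specific part is which axis to sum over: axis 1 for the PyTorch layout, axis 2 for the TensorFlow layout.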

Using PyTorch as an example, in a ResNet50 model from Torchvision ( https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py ), the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision, assuming that an argument (inchans) was added to specify a different number of input channels for the model.

To prove this works I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for runs 2 and 3, but it's minimal and should be irrelevant once fine-tuned.

  1. Unmodified ResNet50 w/ RGB Images: Prec @1: 75.6, Prec @5: 92.8
  2. Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec @1: 64.6, Prec @5: 86.4
  3. Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec @1: 63.8, Prec @5: 86.1

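import torch.utils.model_zoo as model_zoo

# Meant to live in torchvision/models/resnet.py, where ResNet, Bottleneck and
# model_urls are defined; the inchans argument is the assumed addition.
def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        # Collapse the RGB kernels into one channel: [64, 3, 7, 7] -> [64, 1, 7, 7]
        conv1_weight = state_dict['conv1.weight']
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        raise ValueError("Invalid number of inchans for pretrained weights")
    model.load_state_dict(state_dict)

def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        inchans (int): Number of input channels (1 or 3 when pretrained)
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model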

A simple way to do this is to add a convolution layer before the base model and then feed the output to the base model. Like this:

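from keras.applications import ResNet50
from keras.layers import Conv2D, Input
from keras.models import Model

IMG_SIZE = 224  # ImageNet weights with include_top=True expect 224x224 inputs

resnet = ResNet50(weights='imagenet', include_top=True)

input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
# Learnable 1 -> 3 channel mapping; 'same' padding keeps the spatial size
x = Conv2D(3, (3, 3), padding='same')(input_tensor)  # (IMG_SIZE, IMG_SIZE, 3)
out = resnet(x)

model = Model(inputs=input_tensor, outputs=out)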


Why not try converting the grayscale image to an RGB image?

tf.image.grayscale_to_rgb(
    images,
    name=None
)

Dropping the input layer will not work. It will cause all of the following layers to suffer.

What you can do is concatenate 3 copies of the black-and-white image together to expand your color dimension.

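import tensorflow as tf
from tensorflow.keras.applications import ResNet50

img_size_target = 224  # include_top=True with ImageNet weights expects 224x224 inputs

img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target, 1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])

model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)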

I faced the same problem while working with VGG16 and gray-scale images. I solved it as follows:

Let's say our training images are in train_gray_images, holding the gray-scale intensities in a single channel (the code below indexes it as [no_of_examples, height, width, 1]). If we pass it directly to the fit function, it will raise an error, because the model expects a 3-channel (RGB) image data set instead of a gray-scale one. So before passing it to the fit function, do the following:

Create a dummy RGB image data set with the same shape as the gray-scale data set (here dummy_RGB_images). The only difference is that the number of channels is 3.

dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype= np.uint8) 

Then just copy the whole data set into each of the 3 channels of dummy_RGB_images. (Here the dimensions are [no_of_examples, height, width, channel].)

dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]

Finally, pass dummy_RGB_images instead of the gray-scale data set, like:

model.fit(dummy_RGB_images,...)
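Copying the whole data set into a 3-channel array triples its memory footprint. If that is a concern, one possible variation is to convert one batch at a time, as sketched below. This is not part of the answer above: gray_batches_as_rgb is a hypothetical helper, the stand-in array is illustrative, and how you feed the batches to training depends on your Keras version.

```python
import numpy as np

def gray_batches_as_rgb(gray_images, batch_size):
    """Yield (batch, height, width, 3) arrays from (N, height, width, 1) input.

    Hypothetical helper: only one converted batch is held in memory at a time.
    """
    for start in range(0, gray_images.shape[0], batch_size):
        batch = gray_images[start:start + batch_size]
        # Repeat the single channel 3 times along the last axis.
        yield np.repeat(batch, 3, axis=-1)

# Stand-in data: 10 gray-scale images of 32x32 pixels with one channel.
train_gray_images = np.zeros((10, 32, 32, 1), dtype=np.uint8)
train_gray_images[3] = 7  # mark one image so the copy can be checked

batches = list(gray_batches_as_rgb(train_gray_images, batch_size=4))
shapes = [b.shape for b in batches]
print(shapes)  # [(4, 32, 32, 3), (4, 32, 32, 3), (2, 32, 32, 3)]
```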

numpy's depth-stack function, np.dstack((img, img, img)), is a natural way to go.
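One caveat worth checking: np.dstack joins arrays along the third axis, so it gives the intended result for a single (height, width) image but not for a batch shaped (N, height, width). The sketch below uses small illustrative arrays to show the difference and uses np.stack for the batch case.

```python
import numpy as np

# A single (height, width) image: dstack adds the channel axis as expected.
img = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgb_img = np.dstack((img, img, img))
print(rgb_img.shape)   # (2, 3, 3)

# A batch of (N, height, width) images: dstack joins along axis 2 (the width).
batch = np.zeros((5, 2, 3), dtype=np.uint8)
wrong = np.dstack((batch, batch, batch))
print(wrong.shape)     # (5, 2, 9)

# For a batch, stack on a new trailing axis instead.
right = np.stack((batch, batch, batch), axis=-1)
print(right.shape)     # (5, 2, 3, 3)
```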

If you're already using scikit-image, you can get the desired result by using gray2rgb.

from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)

I believe you can use a pretrained ResNet with 1-channel gray-scale images without repeating the image 3 times.

What I have done is to replace the first layer (this is PyTorch, not Keras, but the idea might be similar):

(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

With the following layer:

(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

And then copy the sum of the weights (along the channel axis) to the new layer. For example, the shape of the original weights was:

torch.Size([64, 3, 7, 7])

So I did:

resnet18.conv1.weight.data = resnet18.conv1.weight.data.sum(axis=1).reshape(64, 1, 7, 7)

And then check that the output of the new model on the 1-channel image is the same as the output of the original model on the 3-channel gray-scale image:

y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
 tensor(710.8861, grad_fn=<SumBackward0>))

input_image_1: 1-channel image

input_image_3: 3-channel image (gray scale, all channels equal)

model_resnet_1: modified model

model_resnet_3: original ResNet model

It's really easy! Taking 'resnet50' as an example, before the change you should have:

import torch.nn as nn
import torchvision
resnet_50 = torchvision.models.resnet50()
print(resnet_50.conv1)

Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

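Just do this! Keep a copy of the original 3-channel weights, then swap in a 1-channel layer:

old_weight = resnet_50.conv1.weight.data.clone()
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

The final step is to copy the channel-summed weights into the new layer. Assigning into the dictionary returned by state_dict() does not change the model, so write to the parameter directly:

resnet_50.conv1.weight.data = old_weight.sum(dim=1, keepdim=True)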

So if you run the following:

print(resnet_50.conv1)

the result would be:

Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

As you can see, the input layer now takes a single channel, as needed for grayscale images.

What I did was simply expand the grayscale images into RGB images by using the following transform stage:

import torchvision as tv
tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Lambda(lambda x: x.broadcast_to(3, x.shape[1], x.shape[2])),
])

You can use OpenCV to convert grayscale to RGB.

cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

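When you add the ResNet to your model, you can pass input_shape in the ResNet definition. Note that a 1-channel input_shape is only accepted when the ImageNet weights are not loaded (weights=None), because the pretrained weights require 3 input channels, so this model trains from scratch:

model = ResNet50(include_top=True, weights=None, input_shape=(256, 256, 1))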

