How can I use a pre-trained neural network with grayscale images?

Question

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).

The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.

I can't use one of those models because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).

Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?

Answer 1

The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.

-- Edit: elaboration suggested by Prune --
CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you destroy that hierarchy of features, because the subsequent layers no longer receive the features they were trained to take as input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training.
-- End edit --

There is an easy way, though, to make your model work with grayscale images. You just need to make the image appear to be RGB. The easiest way to do so is to repeat the image array 3 times on a new dimension. Because you will have the same image over all 3 channels, the performance of the model should be the same as it was on RGB images.

In numpy this can be easily done like this:

import numpy as np

print(grayscale_batch.shape)  # (64, 224, 224)
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape)  # (64, 224, 224, 3)

This works by first creating a new dimension (to hold the channels) and then repeating the existing array 3 times along that new dimension.

I'm also pretty sure that Keras' ImageDataGenerator can load grayscale images as RGB.

Answer 2

Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be finetunable. Modifying the weights of the first layer does not render the rest of the weights useless as suggested by others.

To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning them to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weights tensor is organized varies from framework to framework. The PyTorch default is [out_channels, in_channels, kernel_height, kernel_width]. In Tensorflow I believe it is [kernel_height, kernel_width, in_channels, out_channels].
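To see why summing over the input channels is the right modification, here is a minimal NumPy sketch (not part of the original answer). It uses a naive convolution and small random weights as a stand-in for a real pretrained conv1; the helper `conv2d_valid` and all array sizes are made up for illustration. It checks that a 1-channel image convolved with the summed kernel gives the same output as the replicated 3-channel image convolved with the original kernel, and shows which axis to sum for each of the two weight layouts mentioned above.

```python
import numpy as np

def conv2d_valid(image, weight):
    """Naive 'valid' convolution (cross-correlation, as in deep learning libraries).

    image:  (in_channels, H, W)
    weight: (out_channels, in_channels, kH, kW), i.e. the PyTorch-style layout
    """
    out_ch, _, kh, kw = weight.shape
    _, h, w = image.shape
    out = np.zeros((out_ch, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[:, i:i + kh, j:j + kw]  # (in_channels, kH, kW)
            # Contract over in_channels, kH and kW for every output channel.
            out[:, i, j] = np.tensordot(weight, patch, axes=3)
    return out

rng = np.random.default_rng(0)
weight_rgb = rng.normal(size=(4, 3, 3, 3))  # hypothetical stand-in for pretrained weights
gray = rng.normal(size=(1, 8, 8))           # one single-channel image

# Option A: replicate the image to 3 channels and keep the original weights.
out_rgb = conv2d_valid(np.repeat(gray, 3, axis=0), weight_rgb)

# Option B: keep the 1-channel image and sum the weights over in_channels (axis 1).
weight_gray = weight_rgb.sum(axis=1, keepdims=True)  # (4, 1, 3, 3)
out_gray = conv2d_valid(gray, weight_gray)

print(np.allclose(out_rgb, out_gray))

# With a [kH, kW, in_channels, out_channels] layout, in_channels is axis 2 instead.
weight_tf = weight_rgb.transpose(2, 3, 1, 0)               # (3, 3, 3, 4)
weight_tf_gray = weight_tf.sum(axis=2, keepdims=True)      # (3, 3, 1, 4)
print(weight_tf_gray.shape)
```

Because convolution is linear in its input channels, the two options agree up to floating-point rounding, which is consistent with the nearly identical numbers reported for the 3-channel and 1-channel grayscale runs.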

Using PyTorch as an example, in a ResNet50 model from Torchvision ( https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py ), the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision, assuming that an argument (inchans) was added to specify a different number of input channels for the model.

To prove this works, I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for runs 2 and 3, but it's minimal and should be irrelevant once finetuned.

Unmodified ResNet50 w/ RGB Images: Prec @1: 75.6, Prec @5: 92.8
Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec @1: 64.6, Prec @5: 86.4
Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec @1: 63.8, Prec @5: 86.1

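import torch.utils.model_zoo as model_zoo

# ResNet, Bottleneck and model_urls come from torchvision/models/resnet.py,
# where ResNet is assumed to have been extended with an `inchans` argument.

def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        # Collapse the RGB filters into one channel by summing over in_channels:
        # [64, 3, 7, 7] -> [64, 1, 7, 7]
        conv1_weight = state_dict['conv1.weight']
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        raise ValueError("Invalid number of inchans for pretrained weights")
    model.load_state_dict(state_dict)

def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        inchans (int): Number of input channels (1 or 3 when pretrained)
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model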

Answer 3

A simple way to do this is to add a convolution layer before the base model and then feed its output to the base model. Like this:

from keras.applications import ResNet50
from keras.layers import Conv2D, Input
from keras.models import Model

IMG_SIZE = 224  # assumed value; the ImageNet weights with include_top=True expect 224x224 inputs

resnet = ResNet50(weights='imagenet', include_top=True)

input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
# Trainable 3x3 convolution mapping 1 channel to 3, so x has shape (IMG_SIZE, IMG_SIZE, 3)
x = Conv2D(3, (3, 3), padding='same')(input_tensor)
out = resnet(x)

model = Model(inputs=input_tensor, outputs=out)

Answer 4

Why not try converting the grayscale image to an RGB image?

tf.image.grayscale_to_rgb(
    images,
    name=None
)

Answer 5

Dropping the input layer will not work; all of the following layers would suffer.

What you can do is concatenate three copies of the black-and-white image to expand the color dimension.

import tensorflow as tf
from tensorflow.keras.applications import ResNet50

img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target, 1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])

model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)

Answer 6

I faced the same problem while working with VGG16 and grayscale images. I solved it as follows:

Let's say our training images are in train_gray_images, holding the grayscale intensities with a trailing channel axis of size 1 (the code below indexes it as [:, :, :, 0]). If we pass it directly to the fit function, it will raise an error, because fit expects a 3-channel (RGB) image data set instead of a grayscale one. So before passing it to fit, do the following:

Create a dummy RGB image data set with the same shape as the grayscale data set (here dummy_RGB_images). The only difference is that the number of channels is 3.

dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype= np.uint8)

Then just copy the whole data set into each of the 3 channels of dummy_RGB_images. (Here the dimensions are [no_of_examples, height, width, channel].)

dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]

Finally, pass dummy_RGB_images instead of the grayscale data set, like:

model.fit(dummy_RGB_images,...)
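The three per-channel assignments above can also be written as a single broadcast assignment. The following is a minimal sketch, assuming train_gray_images has shape [no_of_examples, height, width, 1]; the random data and the 32x32 image size are placeholders, not values from the original answer.

```python
import numpy as np

# Hypothetical stand-in data: 5 grayscale images of 32x32 with a trailing channel axis.
train_gray_images = np.random.randint(0, 256, size=(5, 32, 32, 1), dtype=np.uint8)

# Allocate the 3-channel array, then let broadcasting fill all three channels at once.
rgb_shape = train_gray_images.shape[:3] + (3,)
dummy_RGB_images = np.empty(rgb_shape, dtype=np.uint8)
dummy_RGB_images[...] = train_gray_images  # (5, 32, 32, 1) broadcasts to (5, 32, 32, 3)

# Equivalent one-liner that skips the manual allocation.
repeated = np.repeat(train_gray_images, 3, axis=-1)

print(dummy_RGB_images.shape)
print(np.array_equal(dummy_RGB_images, repeated))
```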

Answer 7

NumPy's depth-stacking function, np.dstack((img, img, img)), is a natural way to do this.
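One caveat worth checking: np.dstack joins arrays along the third axis, so it gives the expected result for a single 2-D image but not for a batch of shape (batch_size, height, width). The sketch below uses zero-filled placeholder arrays with the shapes from the question to show the difference, and np.stack with axis=-1 as a batch-safe alternative.

```python
import numpy as np

# A single 2-D grayscale image: dstack adds the channel axis as expected.
img = np.zeros((224, 224), dtype=np.uint8)
rgb = np.dstack((img, img, img))
print(rgb.shape)

# A batch of grayscale images: dstack concatenates along the width axis instead.
batch = np.zeros((64, 224, 224), dtype=np.uint8)
wrong = np.dstack((batch, batch, batch))
print(wrong.shape)

# Stacking on a new last axis works for the batch.
right = np.stack((batch, batch, batch), axis=-1)
print(right.shape)
```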

Answer 8

If you're already using scikit-image, you can get the desired result by using gray2rgb.

from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)

Answer 9

I believe you can use a pretrained ResNet with 1-channel grayscale images without repeating the image 3 times.

What I have done is replace the first layer (this is PyTorch, not Keras, but the idea should be similar):

(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

with the following layer:

(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

and then copy the sum of the weights (along the channel axis) to the new layer. For example, the shape of the original weights was:

torch.Size([64, 3, 7, 7])

So I did:

resnet18.conv1.weight.data = resnet18.conv1.weight.data.sum(axis=1).reshape(64, 1, 7, 7)

Then check that the output of the new model is the same as the output of the original model on the grayscale image:

y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
 tensor(710.8861, grad_fn=<SumBackward0>))

input_image_1: 1-channel image

input_image_3: 3-channel image (grayscale, all channels equal)

model_resnet_1: modified model

model_resnet_3: original ResNet model

Answer 10

It's really easy! Taking 'resnet50' as an example, before the change you should have:

import torch.nn as nn
import torchvision

resnet_50 = torchvision.models.resnet50()  # add the pretrained-weights argument for your torchvision version
print(resnet_50.conv1)

Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

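Just do this! Keep a reference to the original 3-channel weights, then swap in a 1-channel layer:

old_weight = resnet_50.conv1.weight.data
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

The final step is to copy the summed weights into the new layer. Assigning into resnet_50.state_dict() has no effect on the model, because state_dict() builds a new dictionary on every call, so set the layer's weight data directly:

resnet_50.conv1.weight.data = old_weight.sum(dim=1, keepdim=True)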

So if you run the following:

print(resnet_50.conv1)

the result will be:

Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

As you can see, the input channel count is now 1, as needed for grayscale images.

Answer 11

What I did was simply expand the grayscale images into RGB images by using the following transform stage:

import torchvision as tv
tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Lambda(lambda x: x.broadcast_to(3, x.shape[1], x.shape[2])),
])
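For intuition about what broadcasting does here, the following is a NumPy analogy (not PyTorch code, and not part of the original answer): np.broadcast_to returns a read-only view that reuses the single channel's memory, whereas np.repeat makes a real copy. The array sizes are arbitrary placeholders. If a later step needs to write into the 3-channel array, a copy is the safer choice.

```python
import numpy as np

# Placeholder single-channel image in (channels, height, width) order.
gray = np.arange(20, dtype=np.float32).reshape(1, 4, 5)

rgb_view = np.broadcast_to(gray, (3, 4, 5))  # view: no data is copied
rgb_copy = np.repeat(gray, 3, axis=0)        # copy: three independent channels

print(rgb_view.shape)
print(rgb_view.strides[0])                   # stride of the broadcast channel axis
print(np.shares_memory(rgb_view, gray))
print(np.array_equal(rgb_view, rgb_copy))
```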

Answer 12

You can use OpenCV to convert grayscale to RGB.

cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

Answer 13

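When you add the ResNet to your model, you can pass input_shape in the ResNet definition. Note that Keras only accepts a single-channel input_shape when the ImageNet weights are not loaded, so this builds an untrained network rather than a pretrained one:

model = ResNet50(include_top=True, weights=None, input_shape=(256, 256, 1))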


Answer scores and dates

Answer 1: score 74, accepted, 2018-08-24 00:43:55
Answer 2: score 47, 2019-02-20 01:13:10
Answer 3: score 13, 2020-01-29 01:21:48
Answer 4: score 6, 2018-10-13 11:54:31
Answer 5: score 3, 2020-07-05 09:53:14
Answer 6: score 2, 2020-03-15 07:39:08
Answer 7: score 1, 2020-03-23 20:01:50
Answer 8: score 0, 2020-03-21 02:37:46
Answer 9: score 0, 2020-07-19 06:05:37
Answer 10: score 0, 2022-06-01 15:22:04
Answer 11: score 0, 2022-07-15 08:10:28
Answer 12: score -2, 2020-10-19 09:55:43
Answer 13: score -10, 2018-11-14 07:11:19
