
How to convert RGB images to grayscale in PyTorch dataloader?

I've downloaded some sample images from the MNIST dataset in .jpg format. Now I'm loading those images to test my pre-trained model.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# transforms to apply to the data
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

# MNIST dataset
test_dataset = datasets.ImageFolder(root=DATA_PATH, transform=trans)

# Data loader
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

Here, DATA_PATH contains a subfolder with the sample images.

Here's my network definition:

import torch.nn as nn

# Convolutional neural network (two convolutional layers)
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # Input [N, 1, 28, 28] -> after two 2x2 max-poolings: [N, 64, 7, 7]
        self.network2D = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.network1D = nn.Sequential(
            nn.Dropout(),
            nn.Linear(7 * 7 * 64, 1000),
            nn.Linear(1000, 10))

    def forward(self, x):
        out = self.network2D(x)
        out = out.reshape(out.size(0), -1)  # flatten to [N, 7 * 7 * 64]
        out = self.network1D(out)
        return out
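The weight shape in the error message below comes from the first Conv2d layer: its weight has shape [32, 1, 5, 5], i.e. it was built for exactly one input channel. A small sketch that isolates that layer, with all-zero dummy tensors standing in for real images, reproduces the behaviour:

```python
import torch
import torch.nn as nn

# The first layer of network2D: 1 input channel, 32 output channels
conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)
print(conv1.weight.shape)  # torch.Size([32, 1, 5, 5])

with torch.no_grad():
    # Single-channel input: accepted, padding=2 keeps the 28x28 size
    out = conv1(torch.zeros(1, 1, 28, 28))
    print(out.shape)  # torch.Size([1, 32, 28, 28])

    # 3-channel (RGB) input: rejected with a channel-mismatch RuntimeError
    mismatch_error = None
    try:
        conv1(torch.zeros(1, 3, 28, 28))
    except RuntimeError as err:
        mismatch_error = str(err)
    print(mismatch_error)
```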

And this is my inference part:

import torch

# Test the model
model = torch.load("mnist_weights_5.pth.tar")
model.eval()

with torch.no_grad():  # no gradients needed for inference
    for images, labels in test_loader:
        outputs = model(images.cuda())

When I run this code, I get the following error:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 3, 28, 28] to have 1 channels, but got 3 channels instead

I understand that the images are being loaded with 3 channels (RGB). So how do I convert them to a single channel in the dataloader?

Update: I changed the transforms to include the Grayscale option:

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), transforms.Grayscale(num_output_channels=1)])

But now I get this error:

TypeError: img should be PIL Image. Got <class 'torch.Tensor'>

When you use the ImageFolder class without a custom loader, PyTorch uses PIL to load each image and converts it to RGB. This is the default loader when the torchvision image backend is PIL:

from PIL import Image

def pil_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')  # always 3 channels, even for grayscale files

You can use torchvision's Grayscale transform. It converts a 3-channel RGB image into a 1-channel grayscale image. See the torchvision documentation on Grayscale for more details.

Here is some sample code:

import torchvision as tv
import torch.utils.data as data

dataDir = 'D:\\general\\ML_DL\\datasets\\CIFAR'
# Grayscale is applied to the PIL image, before ToTensor; with a single
# output channel, Normalize needs exactly one mean and one std value
trainTransform = tv.transforms.Compose([tv.transforms.Grayscale(num_output_channels=1),
                                        tv.transforms.ToTensor(),
                                        tv.transforms.Normalize((0.5,), (0.5,))])
trainSet = tv.datasets.CIFAR10(dataDir, train=True, download=False, transform=trainTransform)
dataloader = data.DataLoader(trainSet, batch_size=1, shuffle=False, num_workers=0)
images, labels = next(iter(dataloader))
print(images.size())  # torch.Size([1, 1, 32, 32])

Instead of building the dataset with ImageFolder, you can write your own data generator (a Dataset subclass) that loads the images directly in its __getitem__ function: open each file with PIL.Image.open(".."), convert it to grayscale, then convert it to a NumPy array and finally to a tensor.
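A minimal sketch of that idea, assuming you already have a list of image paths and integer labels. GrayscaleImageDataset is a hypothetical name, the 0.1307/0.3081 normalization is borrowed from the question, and the demo writes one blank JPEG to a temporary file so it runs on its own:

```python
import os
import tempfile

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class GrayscaleImageDataset(Dataset):
    """Hypothetical dataset that loads each image itself in __getitem__."""

    def __init__(self, paths, labels):
        self.paths = paths
        self.labels = labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # PIL.Image.open -> grayscale ("L") -> float NumPy array [H, W] in [0, 1]
        with Image.open(self.paths[idx]) as img:
            array = np.asarray(img.convert("L"), dtype=np.float32) / 255.0
        # NumPy -> tensor, adding the channel dimension: [1, H, W]
        tensor = torch.from_numpy(array).unsqueeze(0)
        tensor = (tensor - 0.1307) / 0.3081
        return tensor, self.labels[idx]


# Demo: one blank 28x28 JPEG
path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
Image.new("RGB", (28, 28)).save(path)

loader = DataLoader(GrayscaleImageDataset([path], [0]), batch_size=1)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([1, 1, 28, 28])
```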

Another option is to compute the grayscale (Y) channel from RGB using the formula Y = 0.299 R + 0.587 G + 0.114 B: slice the array into its R, G, and B channels and combine them into a single channel.
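The weighted sum can be applied directly to a batch of RGB tensors. This sketch assumes the channels are in R, G, B order along dimension 1 (as ToTensor produces from an RGB PIL image); rgb_to_gray is a hypothetical helper name. Slicing with ranges such as 0:1 keeps the channel dimension, so the result already has the [N, 1, H, W] shape the network expects:

```python
import torch


def rgb_to_gray(images):
    # images: [N, 3, H, W] with channels in R, G, B order
    r = images[:, 0:1]  # range slicing keeps the channel dim -> [N, 1, H, W]
    g = images[:, 1:2]
    b = images[:, 2:3]
    return 0.299 * r + 0.587 * g + 0.114 * b


batch = torch.rand(8, 3, 28, 28)
gray = rgb_to_gray(batch)
print(gray.shape)  # torch.Size([8, 1, 28, 28])

# The weights sum to 1, so a pixel with R = G = B = v keeps (approximately) the value v
same = torch.full((1, 3, 2, 2), 0.5)
print(rgb_to_gray(same))
```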

But how did you train your model? Training and test data are usually loaded the same way.

I found an extremely simple solution to this problem. The network needs a tensor of shape [1,1,28,28], whereas the input tensor has shape [1,3,28,28]. So I only need to read one channel from it:

images = images[:,0,:,:]

This gives me a tensor of shape [1,28,28]. Now I need to convert it to a tensor of shape [1,1,28,28], which can be done like this:

images = images.unsqueeze(0)

Putting these two lines together, the prediction part of the code can be written like this:

for images, labels in test_loader:
    # Keep only channel 0, then add a leading dimension to get [1,1,28,28] (relies on batch_size == 1)
    images = images[:,0,:,:].unsqueeze(0)
    outputs = model(images.cuda())
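Taking channel 0 is reasonable here because the MNIST JPEGs are presumably grayscale, so after PIL's convert('RGB') the three channels carry essentially the same values. Note, though, that unsqueeze(0) only lands the channel dimension in the right place because the batch size is 1: with a batch of N images, images[:,0,:,:] has shape [N,28,28] and unsqueeze(0) turns it into [1,N,28,28]. A batch-size-independent variant, sketched with a random tensor standing in for one batch from test_loader, either slices with 0:1 or unsqueezes dimension 1:

```python
import torch

images = torch.rand(16, 3, 28, 28)  # stand-in for a batch of 16 RGB images

# Option 1: slice with a range so the channel dimension is kept
first_channel = images[:, 0:1, :, :]
print(first_channel.shape)  # torch.Size([16, 1, 28, 28])

# Option 2: index channel 0, then re-insert the channel dimension at position 1
same_result = images[:, 0, :, :].unsqueeze(1)
print(torch.equal(first_channel, same_result))  # True

# unsqueeze(0) would put the batch dimension in the channel slot
print(images[:, 0, :, :].unsqueeze(0).shape)  # torch.Size([1, 16, 28, 28])
```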
