
Is this the correct way of whitening an image in Python?

I am trying to zero-center and whiten the CIFAR10 dataset, but the result I get looks like random noise!
The CIFAR10 dataset contains 60,000 color images of size 32x32. The training set contains 50,000 images and the test set contains 10,000.
The following code snippets show the steps I took to whiten the dataset:

# zero-center
mean = np.mean(data_train, axis = (0,2,3)) 
for i in range(data_train.shape[0]):
    for j in range(data_train.shape[1]):
        data_train[i,j,:,:] -= mean[j]

first_dim = data_train.shape[0] #50,000
second_dim = data_train.shape[1] * data_train.shape[2] * data_train.shape[3] # 3*32*32
shape = (first_dim, second_dim) # (50000, 3072) 

# compute the covariance matrix
cov = np.dot(data_train.reshape(shape).T, data_train.reshape(shape)) / data_train.shape[0] 
# compute the SVD factorization of the data covariance matrix
U,S,V = np.linalg.svd(cov)

print 'cov.shape = ',cov.shape
print U.shape, S.shape, V.shape

Xrot = np.dot(data_train.reshape(shape), U) # decorrelate the data
Xwhite = Xrot / np.sqrt(S + 1e-5)

print Xwhite.shape
data_whitened = Xwhite.reshape(-1,32,32,3)
print data_whitened.shape

outputs:

cov.shape =  (3072L, 3072L)
(3072L, 3072L) (3072L,) (3072L, 3072L)
(50000L, 3072L)
(50000L, 32L, 32L, 3L)
(32L, 32L, 3L)

and here is how I try to show the resulting image:

import matplotlib.pyplot as plt
%matplotlib inline
from scipy.misc import imshow
print data_whitened[0].shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])
plt.subplot(222)
plt.imshow(data_whitened[100])
plt.show()

[image: the whitened images, which look like random noise]

By the way, data_train[0].shape is (3,32,32), but if I reshape the whitened image to that shape, I get

TypeError: Invalid dimensions for image data

Could this be only a visualization issue? If so, how can I make sure that's the case?

Update:
Thanks to @AndrasDeak, I fixed the visualization code as follows, but the output still looks random:

data_whitened = Xwhite.reshape(-1,3,32,32).transpose(0,2,3,1)
print data_whitened.shape
fig = plt.figure()
plt.subplot(221)
plt.imshow(data_whitened[0])

[image: the output still looks random]

Update 2:
This is what I get when I run some of the commands given below. As can be seen, toimage shows the image just fine, but trying to reshape it messes up the image:

[image]

# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)

plt.imshow(X[6].reshape(32,32,3))
plt.show()

[image]

For some weird reason, this was what I got at first, but after several tries it changed to the previous image:

[image]

Let's walk through this. As you point out, CIFAR contains images which are stored in a matrix; each image is a row, and each row has 3072 columns of uint8 numbers (0-255). Images are 32x32 pixels and pixels are RGB (three channel colour).

# https://www.cs.toronto.edu/~kriz/cifar.html
# wget https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
# tar xf cifar-10-python.tar.gz
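import numpy as np
import pickle
# open in binary mode; encoding='bytes' lets Python 3 read these Python 2 pickles
# (on Python 2, use cPickle.load(input_file) and the key 'data' instead)
with open('cifar-10-batches-py/data_batch_1', 'rb') as input_file:
    batch = pickle.load(input_file, encoding='bytes')
X = batch[b'data']   # shape is (N, 3072), dtype uint8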

It turns out that the columns are ordered a bit funny: all the red pixel values come first, then all the green pixels, then all the blue pixels. This makes it tricky to have a look at the images. This:

import matplotlib.pyplot as plt
plt.imshow(X[6].reshape(32,32,3))
plt.show()

gives this:

[image: scrambled colour channels]

So, just for ease of viewing, let's shuffle the dimensions of our matrix around with reshape and transpose:

# output is of shape (N, 3, 32, 32)
X = X.reshape((-1,3,32,32))
# output is of shape (N, 32, 32, 3)
X = X.transpose(0,2,3,1)
# put data back into a design matrix (N, 3072)
X = X.reshape(-1, 3072)

Now:

plt.imshow(X[6].reshape(32,32,3))
plt.show()

gives:

[image: a peacock]
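A small sketch that makes the column layout visible. It uses a hypothetical synthetic row (not real CIFAR data) whose value at each column is simply its column index, i.e. channel*1024 + row*32 + col, and pushes it through the same reshape and transpose as above.

```python
import numpy as np

# Hypothetical synthetic "design matrix" holding one image in CIFAR's
# channel-major layout: column index = channel*1024 + row*32 + col.
X = np.arange(3 * 32 * 32).reshape(1, 3072)

# The reshape/transpose from above: (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3)
img = X.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)[0]

# Pixel (row=5, col=7) now holds its red, green and blue values together:
# 5*32+7 = 167, then 167 + 1024 and 167 + 2048.
print(img[5, 7])                      # 167, 1191, 2215

# The naive reshape instead packs three consecutive *red* values into one pixel.
print(X[0].reshape(32, 32, 3)[0, 0])  # 0, 1, 2
```

This is why the naive reshape(32,32,3) gives scrambled colours: every "pixel" it builds mixes neighbouring values from a single colour plane.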

OK, on to ZCA whitening. We're frequently reminded that it's super important to zero-center the data before whitening it. At this point, a remark about the code you included. From what I can tell, computer vision views color channels as just another feature dimension; there's nothing special about the separate RGB values in an image, just as there's nothing special about the separate pixel values. They're all just numeric features. So, whereas you compute the average pixel value per colour channel (i.e., your mean is a tuple of r,g,b values), we'll compute the mean image, i.e. one mean per column of X. Note that X is a big matrix with N rows and 3072 columns. We'll treat every column as being "the same kind of thing" as every other column.

# zero-centre the data (this calculates the mean separately across
# pixels and colour channels)
X = X - X.mean(axis=0)

At this point, let's also do Global Contrast Normalization, which is quite often applied to image data. I'll use the L2 norm, which makes every image have vector magnitude 1:

X = X / np.sqrt((X ** 2).sum(axis=1))[:,None]
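The two preprocessing steps so far can be bundled into one function. This is a sketch on fake data: the name center_and_gcn and the small guard against all-zero rows are additions for illustration, not part of the code above.

```python
import numpy as np

def center_and_gcn(X, min_norm=1e-12):
    """Zero-centre each column, then scale each row (image) to unit L2 norm."""
    X = X.astype(np.float64)                          # uint8 input would overflow otherwise
    X = X - X.mean(axis=0)                            # subtract the mean image
    norms = np.sqrt((X ** 2).sum(axis=1))[:, None]    # one norm per row
    return X / np.maximum(norms, min_norm)            # guard against all-zero rows

rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(5, 3072)).astype(np.uint8)  # fake uint8 "images"
Xn = center_and_gcn(X)
print(np.linalg.norm(Xn, axis=1))   # every row now has norm 1
```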

One could easily use something else, like the standard deviation (X = X / np.std(X, axis=0), which gives each column unit variance) or min-max scaling to some interval like [-1,1].
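Sketches of the two alternatives, with hypothetical helper names. The min-max version rescales each image (row) separately to [-1, 1]; doing it per image rather than over the whole dataset is an assumption, since the text above does not say which.

```python
import numpy as np

def standardize_columns(X, eps=1e-12):
    """The std alternative as written above: divide each column by its std."""
    return X / np.maximum(np.std(X, axis=0), eps)

def minmax_rows(X, lo=-1.0, hi=1.0):
    """Linearly map each row's min to lo and its max to hi."""
    mn = X.min(axis=1, keepdims=True)
    mx = X.max(axis=1, keepdims=True)
    span = np.where(mx > mn, mx - mn, 1.0)   # avoid dividing by zero for flat rows
    return lo + (X - mn) * (hi - lo) / span

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3072))
Z = standardize_columns(X - X.mean(axis=0))
Y = minmax_rows(X)
print(Y.min(axis=1), Y.max(axis=1))   # each row now spans [-1, 1]
```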

Nearly there. At this point, we haven't greatly modified our data, since we've just shifted and scaled it (an affine transform of each image). To display it, we need to get the image data back into the range [0,1], so let's use a helper function:

def show(i):
    i = i.reshape((32,32,3))
    m,M = i.min(), i.max()
    plt.imshow((i - m) / (M - m))
    plt.show()

show(X[6])

The peacock looks slightly brighter here, but that's just because we've stretched its pixel values to fill the interval [0,1]:

[image: a slightly brighter peacock]

ZCA whitening:

# compute the covariance of the image data
cov = np.cov(X, rowvar=True)   # cov is (N, N)
# singular value decomposition
U,S,V = np.linalg.svd(cov)     # U is (N, N), S is (N,)
# build the ZCA matrix
epsilon = 1e-5
zca_matrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T))
# transform the image data (zca_matrix is (N, N))
zca = np.dot(zca_matrix, X)    # zca is (N, 3072)

Taking a look (show(zca[6])):

[image: the "whitened" peacock]

Now the peacock definitely looks different. You can see that the ZCA has rotated the image through colour space, so it looks like a picture on an old TV with the Tone setting out of whack. Still recognisable, though.

Presumably because of the epsilon value I used, the covariance of my transformed data isn't exactly the identity, but it's fairly close:

>>> (np.cov(zca, rowvar=True).argmax(axis=1) == np.arange(zca.shape[0])).all()
True
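The argmax check only confirms that each row of the covariance peaks on its diagonal. A stricter sketch measures the largest deviation from the identity. The helpers zca_rows and identity_deviation are hypothetical names, and the data is random rather than CIFAR. The sketch also illustrates a caveat of the (N, N) formulation: np.cov(X, rowvar=True) centres each row, so its rank is at most (number of columns - 1). When there are more rows than columns (e.g. 10000 images with 3072 columns each), the whitened covariance therefore cannot be close to the identity.

```python
import numpy as np

def zca_rows(X, epsilon=1e-5):
    """ZCA as in the answer: each row (image) is a variable, so cov is (N, N)."""
    cov = np.cov(X, rowvar=True)
    U, S, _ = np.linalg.svd(cov)   # cov is symmetric PSD, so U holds its eigenvectors
    zca_matrix = np.dot(U, np.dot(np.diag(1.0 / np.sqrt(S + epsilon)), U.T))
    return np.dot(zca_matrix, X)

def identity_deviation(Z):
    """Largest absolute entry of cov(Z) - I (0 would mean perfectly white)."""
    c = np.cov(Z, rowvar=True)
    return np.abs(c - np.eye(c.shape[0])).max()

rng = np.random.default_rng(0)
few_rows = rng.normal(size=(20, 500))   # fewer rows than columns
many_rows = rng.normal(size=(50, 30))   # more rows than columns: cov has rank <= 29

print(identity_deviation(zca_rows(few_rows)))    # tiny
print(identity_deviation(zca_rows(many_rows)))   # large, despite the "whitening"
```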

Update 29 January

I'm not entirely sure how to sort out the issues you're having; your trouble seems to lie in the shape of your raw data at the moment, so I would advise you to sort that out first before you try to move on to zero-centring and ZCA.

On the one hand, the first of the four plots in your update looks good, suggesting that you've loaded the CIFAR data correctly. The second plot is produced by toimage, I think, which will automagically figure out which dimension holds the colour data, which is a nice trick. On the other hand, the plots that come after that look weird, so something seems to be going wrong somewhere. I confess I can't quite follow the state of your script, because I suspect you're working interactively (in a notebook), retrying things when they don't work (more on this in a second), and using code that you haven't shown in your question. In particular, I'm not sure how you're loading the CIFAR data; your screenshot shows output from some print statements (Reading training data..., etc.), and then when you copy train_data into X and print the shape of X, it has already been reshaped into (N, 3, 32, 32). Like I say, plot 1 of your update suggests that the reshape has happened correctly. From plots 3 and 4, I think you're getting mixed up about matrix dimensions somewhere, so I'm not sure how you're doing the reshape and transpose.

Note that it's important to be careful with the reshape and transpose, for the following reason. The X = X.reshape(...) and X = X.transpose(...) lines overwrite X with the reshaped or transposed matrix. If you run them multiple times (for example, by accidentally re-running a cell in a Jupyter notebook), you will shuffle the axes of your matrix over and over, and plots of the data will start to look really weird. This image shows the progression as we iterate the reshape and transpose operations:

[image: increasing numbers of reshape-and-transpose iterations]

This progression does not cycle back, or at least, it doesn't cycle quickly. Because of periodic regularities in the data (like the 32-pixel row structure of the images), you tend to get banding in these improperly reshape-transposed images. I'm wondering if that's what's going on in the third of the four plots in your update, which looks a lot less random than the images in the original version of your question.
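One way to check the claim about cycling is to apply the same reshape and transpose repeatedly to a row of column indices and count how many runs it takes to restore the original order. reshape_transpose is a hypothetical helper wrapping the three lines from earlier.

```python
import numpy as np

def reshape_transpose(X):
    """One run of the cell: (N, 3072) -> (N, 3, 32, 32) -> (N, 32, 32, 3) -> (N, 3072)."""
    return X.reshape((-1, 3, 32, 32)).transpose(0, 2, 3, 1).reshape(-1, 3072)

# Track where each of the 3072 columns ends up by re-running the cell
# on a row that contains the column indices themselves.
start = np.arange(3072).reshape(1, 3072)
X = reshape_transpose(start)
runs = 1
while not np.array_equal(X, start):
    X = reshape_transpose(X)
    runs += 1
print(runs)   # 738
```

Each run is a fixed permutation of the 3072 columns, so the original order must come back eventually, but only after several hundred runs, which matches "it doesn't cycle quickly".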

The fourth plot of your update is a colour negative of the peacock. I'm not sure how you're getting that, but I can reproduce your output with:

plt.imshow(255 - X[6].reshape(32,32,3))
plt.show()

which gives:

[image: a colour negative of the peacock]

One way you could get this is if you were using my show helper function and mixed up m and M, like this:

def show(i):
    i = i.reshape((32,32,3))
    m,M = i.min(), i.max()
    plt.imshow((i - M) / (m - M))  # this will produce a negative img
    plt.show()
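The swap flips the image exactly: since (i - M) / (m - M) = (M - i) / (M - m) = 1 - (i - m) / (M - m), the result is the colour negative of the correctly rescaled image. A quick numerical check on a random array (not real CIFAR data):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))      # any float image
m, M = img.min(), img.max()

correct = (img - m) / (M - m)      # the original show() scaling, in [0, 1]
swapped = (img - M) / (m - M)      # m and M mixed up

# Swapping m and M gives exactly the colour negative of the correct scaling.
print(np.allclose(swapped, 1.0 - correct))   # True
```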

I had the same issue: the resulting projected values are off.

Each value in a float image is supposed to be in the range [0, 1.0]:

def toimage(data):
    min_ = np.min(data)
    max_ = np.max(data)
    return (data-min_)/(max_ - min_)

NOTICE: use this function only for visualization!
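A slightly more defensive variant of the same min-max rescaling, as a sketch: the hypothetical to_unit_range below returns zeros instead of dividing by zero when the input is constant (for example, an all-black image).

```python
import numpy as np

def to_unit_range(data):
    """Min-max rescale to [0, 1] for display; a constant array maps to zeros."""
    data = np.asarray(data, dtype=np.float64)
    min_, max_ = data.min(), data.max()
    if max_ == min_:
        return np.zeros_like(data)     # avoid 0/0 for flat images
    return (data - min_) / (max_ - min_)

print(to_unit_range([2, 4, 6]))        # 0.0, 0.5, 1.0
```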

However, notice how @wildwilhelm computes the "decorrelation" or "whitening" matrix:

zca_matrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T))

This is because U is the matrix of eigenvectors of the covariance matrix: if SVD(X) = U S V^T, then U is the eigenbasis of X X^T, not of X (see https://en.wikipedia.org/wiki/Singular-value_decomposition).
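A small numerical illustration of that point on random data: the columns of U from SVD(X) are eigenvectors of X X^T with eigenvalues S², and for a symmetric positive semi-definite matrix such as a covariance, the singular values coincide with the eigenvalues. That is why taking the SVD of cov yields its eigen-decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 40))            # 6 variables (rows), 40 observations

# SVD of the data itself: X = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(X, full_matrices=False)
G = np.dot(X, X.T)
# Columns of U are eigenvectors of X X^T, with eigenvalues S**2
print(np.allclose(np.dot(G, U), U * S ** 2))              # True

# For a symmetric PSD matrix such as a covariance, the singular values are
# the eigenvalues (eigvalsh returns them in ascending order, svd descending).
cov = np.cov(X, rowvar=True)
_, S_cov, _ = np.linalg.svd(cov)
print(np.allclose(S_cov[::-1], np.linalg.eigvalsh(cov)))  # True
```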

As a final note, I would rather treat the pixels as the statistical units and the RGB channels as their modalities, instead of treating images as statistical units and pixels as modalities. I've tried this on the CIFAR-10 database and it works quite nicely.
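One reading of this suggestion, sketched under that assumption: treat every pixel as one sample of a 3-dimensional RGB vector, estimate a 3 x 3 channel covariance over all pixels of all images, and apply ZCA to the channels only. zca_rgb is a hypothetical name, and the input is assumed to be in (N, H, W, 3) layout.

```python
import numpy as np

def zca_rgb(images, epsilon=1e-5):
    """ZCA over colour channels only; images has shape (N, H, W, 3)."""
    pixels = images.reshape(-1, 3).astype(np.float64)   # one row per pixel
    pixels = pixels - pixels.mean(axis=0)               # zero-centre each channel
    cov = np.dot(pixels.T, pixels) / pixels.shape[0]    # (3, 3) channel covariance
    U, S, _ = np.linalg.svd(cov)
    W = np.dot(U, np.dot(np.diag(1.0 / np.sqrt(S + epsilon)), U.T))
    return np.dot(pixels, W).reshape(images.shape)      # W is symmetric

rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.float64)
imgs[..., 1] = 0.5 * imgs[..., 0] + 0.5 * imgs[..., 1]  # correlate R and G
out = zca_rgb(imgs)
```

Because the covariance is only 3 x 3, this stays cheap no matter how many images there are, unlike the (N, N) formulation.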

IMAGE EXAMPLE: the top image has "whitened" RGB values; the bottom one is the original.

[image]

IMAGE EXAMPLE 2: training performance and loss without the ZCA transform

[image]

IMAGE EXAMPLE 3: training performance and loss with the ZCA transform

[image]

If you want to linearly scale each image to have zero mean and unit variance, you can do the same image whitening as TensorFlow's tf.image.per_image_standardization. According to the documentation, you need to use the following formula to normalize each image independently:

(image - image_mean) / max(image_stddev, 1.0/sqrt(image_num_elements))

Keep in mind that the mean and the standard deviation should be computed over all values in the image. This means that we don't need to specify the axis/axes along which they are computed.

One way to implement that without TensorFlow is with NumPy, as follows:

import math
import numpy as np
from PIL import Image

# open image
image = Image.open("your_image.jpg")
image = np.array(image)

# standardize image
mean = image.mean()
stddev = image.std()
adjusted_stddev = max(stddev, 1.0/math.sqrt(image.size))
standardized_image = (image - mean) / adjusted_stddev
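For a whole batch, the same formula can be vectorized. This is a sketch assuming an (N, H, W, C) array; the function name mirrors TensorFlow's, but it is a hypothetical NumPy helper, not TensorFlow's implementation.

```python
import numpy as np

def per_image_standardization(batch):
    """Apply the formula above to every image in an (N, H, W, C) batch at once."""
    batch = np.asarray(batch, dtype=np.float64)
    num_elements = np.prod(batch.shape[1:])               # H * W * C per image
    mean = batch.mean(axis=(1, 2, 3), keepdims=True)      # one mean per image
    stddev = batch.std(axis=(1, 2, 3), keepdims=True)     # one stddev per image
    adjusted_stddev = np.maximum(stddev, 1.0 / np.sqrt(num_elements))
    return (batch - mean) / adjusted_stddev

rng = np.random.default_rng(0)
b = rng.integers(0, 256, size=(4, 32, 32, 3))
out = per_image_standardization(b)
```

The lower bound on the standard deviation matters for flat images: a constant image has stddev 0, and the formula then maps it to all zeros instead of dividing by zero.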
