
Keras autoencoder

I worked with neural networks in Java a long time ago, and now I'm trying to learn to use TFLearn and Keras in Python.

I'm trying to build an autoencoder, but since I'm running into problems, the code I show here doesn't have the bottleneck characteristic (this should make the problem even easier).

In the following code I create the network and the dataset (two random variables), and after training it plots the correlation between each predicted variable and its input.

What the network should learn is to output the same input that it receives.

import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def buildMyNetwork(inputs, bottleNeck):
    inputLayer = Input(shape=(inputs,))
    autoencoder = Dense(inputs*2, activation='relu')(inputLayer)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(bottleNeck, activation='relu')(autoencoder)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(inputs*2, activation='relu')(autoencoder)
    autoencoder = Dense(inputs, activation='sigmoid')(autoencoder)
    autoencoder = Model(inputs=inputLayer, outputs=autoencoder)
    autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
    return autoencoder


dataSize = 1000
variables = 2
data = np.zeros((dataSize,variables))
data[:, 0] = np.random.uniform(0, 0.8, size=dataSize)
data[:, 1] = np.random.uniform(0, 0.1, size=dataSize)

trainData, testData = data[:900], data[900:]

model = buildMyNetwork(variables,2)
model.fit(trainData, trainData, epochs=2000)
predictions = model.predict(testData)

for x in range(variables):
    plt.scatter(testData[:, x], predictions[:, x])
    plt.show()
    plt.close()

Even though the result is sometimes acceptable, many other times it isn't. I know neural networks use random weight initialization and may therefore converge to different solutions, but I think this is too much and there may be a mistake in my code.

Sometimes the correlation is acceptable

Other times it is quite lost

UPDATE:

Thanks Marcin Możejko!

Indeed that was the problem. My original question arose because I was trying to build an autoencoder, so to be coherent with the title, here is an example of an autoencoder (just with a more complex dataset and different activation functions):

import matplotlib.pyplot as plt
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def buildMyNetwork(inputs, bottleNeck):
    inputLayer = Input(shape=(inputs,))
    autoencoder = Dense(inputs*2, activation='tanh')(inputLayer)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(bottleNeck, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs*2, activation='tanh')(autoencoder)
    autoencoder = Dense(inputs, activation='tanh')(autoencoder)
    autoencoder = Model(inputs=inputLayer, outputs=autoencoder)
    autoencoder.compile(optimizer='adadelta', loss='mean_squared_error')
    return autoencoder


dataSize = 1000
variables = 6
data = np.zeros((dataSize,variables))
data[:, 0] = np.random.uniform(0, 0.5, size=dataSize)
data[:, 1] = np.random.uniform(0, 0.5, size=dataSize)
data[:, 2] = data[:, 0] + data[:, 1]
data[:, 3] = data[:, 0] * data[:, 1]
data[:, 4] = data[:, 0] / data[:, 1]
data[:, 5] = data[:, 0] ** data[:, 1]

trainData, testData = data[:900], data[900:]

model = buildMyNetwork(variables,2)
model.fit(trainData, trainData, epochs=2000)
predictions = model.predict(testData)

for x in range(variables):
    plt.scatter(testData[:, x], predictions[:, x])
    plt.show()
    plt.close()

For this example I used the tanh activation function, but I tried others and they worked as well. The dataset now has 6 variables but the autoencoder has a bottleneck of 2 neurons; since variables 2 to 5 are formed by combining variables 0 and 1, the autoencoder only needs to pass the information of those two and learn the functions that generate the other variables in the decoding phase. The example above shows that all the functions are learnt except one, the division... I don't know why yet.
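One possible explanation for the division, offered here as an assumption and not something stated in the original post: a tanh output layer can only produce values between -1 and 1, while the ratio of two uniform variables is unbounded. A minimal NumPy sketch that regenerates the four derived columns (with a fixed seed, which the original code does not use) and measures how much of each column falls outside that range:

```python
import numpy as np

# Fixed seed is an assumption for reproducibility; the post's code is unseeded.
rng = np.random.default_rng(0)
x0 = rng.uniform(0, 0.5, size=1000)
x1 = rng.uniform(0, 0.5, size=1000)

# The four derived variables from the example above.
columns = {
    "sum": x0 + x1,
    "product": x0 * x1,
    "ratio": x0 / x1,
    "power": x0 ** x1,
}

# Fraction of each target that a tanh output (bounded by 1) cannot reach.
out_of_range = {name: float(np.mean(v > 1.0)) for name, v in columns.items()}

for name, frac in out_of_range.items():
    print(f"{name}: {frac:.1%} of samples above 1")
```

Only the ratio column has targets above 1 (roughly half of them, since x0 > x1 about half the time), so under this reading no amount of training would let a tanh output match them. Rescaling the targets or using a linear output layer would be the natural things to try.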

I think it is relatively easy to explain why your network might fail to learn an identity function. Let's go through your example:

  1. Your input comes from a 2D space and, because of the uniform distribution, it does not lie on a 1D or 0D submanifold. From this it is easy to see that, in order to get an identity function from your autoencoder, every layer must be able to represent a function whose range is at least two-dimensional, because the output of your last layer should also lie on a 2D manifold.
  2. Let's go through your network and check whether it satisfies this condition:

     inputLayer = Input(shape=(2,))
     autoencoder = Dense(4, activation='relu')(inputLayer)
     autoencoder = Dense(4, activation='relu')(autoencoder)
     autoencoder = Dense(2, activation='relu')(autoencoder)  # Possible problems here

    You may see that the bottleneck might cause a problem: for this layer it might be hard to satisfy the condition from the first point. To get a 2-dimensional output range, this layer needs weights that keep the examples out of the saturation region of relu (otherwise all the samples are squashed to 0 in one of the units, which makes it impossible for the range to be "fully" 2D). So basically, the probability that this will not happen is relatively small. The probability that backpropagation itself moves a unit into this region cannot be neglected either.
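A minimal NumPy sketch of the saturation described above. The weights `W` and `b` are hypothetical, chosen by hand so that the second unit is saturated for every sample, and the layer is applied directly to the 2D input for simplicity (in the network it sits after two hidden layers):

```python
import numpy as np

rng = np.random.default_rng(0)
# Same distribution as the question's data: two independent uniform variables.
x = np.column_stack([rng.uniform(0, 0.8, 1000), rng.uniform(0, 0.1, 1000)])

# Hypothetical weights for a 2-unit relu layer (illustration only).
W = np.array([[1.0, -1.0],
              [0.5, -2.0]])
b = np.array([0.0, 0.0])

h = np.maximum(x @ W + b, 0.0)  # relu activation

# A unit is saturated for the whole dataset if it outputs 0 for every sample.
dead_units = np.all(h == 0.0, axis=0)
# Number of independent directions in which the layer output actually varies.
rank = np.linalg.matrix_rank(h - h.mean(axis=0))

print("saturated units:", dead_units)
print("dimension of output range:", rank)
```

With these weights the second unit outputs 0 for every sample, so the layer's output varies along a single direction and the following layers cannot recover both input variables.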

UPDATE:

In a comment, the question was asked why the optimizer fails to prevent or undo the saturation. This is an example of one of the important downsides of relu: once an example falls into a relu saturation region, it does not directly take part in the learning of that unit. It can still affect the unit by influencing previous units, but because the derivative is 0 this influence is not direct. So basically, unsaturating an example comes from a side effect, not from the direct action of the optimizer.
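The zero-derivative argument can be sketched for a single relu unit with one weight. All numbers below are made up for illustration; `upstream` stands for whatever gradient arrives from the loss:

```python
import numpy as np

def relu_grad(z):
    """Derivative of relu with respect to its pre-activation."""
    return (z > 0).astype(float)

z = np.array([-0.3, -0.1, 0.2])        # pre-activations for three samples
x = np.array([0.5, 0.2, 0.4])          # inputs feeding the unit's weight
upstream = np.array([1.0, 1.0, 1.0])   # hypothetical gradient from the loss

# Chain rule: d(loss)/d(weight) per sample = upstream * relu'(z) * x
per_sample_grad = upstream * relu_grad(z) * x
print(per_sample_grad)
```

The two samples with negative pre-activation contribute exactly zero to the weight update, whatever the upstream gradient is. If every sample is in that state, the unit's weights receive no gradient at all.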

This is a really cool use case for seeing and studying the difficulties of training a neural net.

Indeed, I see several possibilities:

1) If the results differ greatly between two runs, that can come from the initialization. But it can also come from your dataset, which is not the same on every run.

2) Something that will make it difficult for the network to learn the correlation is your activations, more precisely the sigmoid. I would change that non-linearity to another 'relu'. There is no reason to use a sigmoid here; I know it looks linear around 0, but it isn't really linear. To produce a 0.7, the raw output will have to be quite high. The "easy" relation that you have in mind for this network isn't so easy, since you constrain it.

3) If it is sometimes not good, maybe it needs more epochs to converge?

4) Maybe you need a bigger dataset? In theory you're fine, since there are 900 samples for fewer than 200 parameters in your network, but who knows...
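To put numbers on point 2, the pre-activation a sigmoid output needs in order to produce a value p is log(p / (1 - p)). A small sketch using only the standard library; the helper name `logit` and the chosen targets are mine, picked to cover the ranges of the two variables in the question:

```python
import math

def logit(p):
    """Inverse of the sigmoid: the pre-activation needed to output p."""
    return math.log(p / (1.0 - p))

# Targets spanning the question's data ranges (0 to 0.8 and 0 to 0.1).
targets = [0.01, 0.05, 0.5, 0.7]
pre_activations = {p: logit(p) for p in targets}

for p, z in pre_activations.items():
    print(f"sigmoid output {p} needs pre-activation {z:.2f}")
```

The required pre-activation is a non-linear function of the target and grows without bound as the target approaches 0, which is where the whole second variable lives.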

I can't try all of this since I'm on my phone right now, but feel free to try and troubleshoot with the hints I gave you :) I hope this helps.

EDIT:

After some trials, as Marcin Możejko was saying, the issue comes from the activations. You can read his answer for more information on what is going wrong. The way to fix it is to change your activations. If you are a fan of 'relu', you can use a special version of this activation. For this, use a LeakyReLU layer and do not set an activation on the preceding Dense layer, like this:

    from keras.layers import LeakyReLU

    autoencoder = Dense(inputs*2)(inputLayer)  # no activation on the Dense layer
    autoencoder = LeakyReLU(alpha=0.3)(autoencoder)

This will solve the case where you get stuck in a non-optimal solution. Otherwise, as I said above, you can try not to use any non-linearities. Your loss will go down much faster and won't get stuck.

Keep in mind that non-linearities were introduced to let networks find more complex patterns in the data. In your case you have the simplest linear pattern.
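To see why the leaky variant avoids the stuck state, here is a plain NumPy sketch of the activation and its derivative, using the same slope of 0.3 as the snippet above. The function names are mine and this is not the Keras implementation:

```python
import numpy as np

def leaky_relu(z, alpha=0.3):
    """Identity for positive inputs, slope alpha for negative ones."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.3):
    """Derivative of leaky_relu with respect to z."""
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
out = leaky_relu(z)
grad = leaky_relu_grad(z)

print("output:  ", out)
print("gradient:", grad)
```

Unlike plain relu, the gradient is never zero, so a sample with a negative pre-activation still sends a (smaller) update to the unit's weights and can be pulled back out of that region.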
