

How to train and fine-tune fully unsupervised deep neural networks?

In scenario 1, I have a multi-layer sparse autoencoder that tries to reproduce its input, so all the layers are trained together from randomly initialized weights. Without a supervised layer, this didn't learn any relevant information on my data (the code itself works fine; I've already verified it on many other deep neural network problems).
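
For concreteness, here is a minimal sketch of what I mean by scenario 1, written in PyTorch with an L1 sparsity penalty on the hidden code (the layer sizes, penalty weight, optimizer settings, and function names are placeholders, not my exact code):

    import torch
    import torch.nn as nn

    # Scenario 1 (sketch): a multi-layer sparse autoencoder trained end-to-end
    # from random initialization. The loss is reconstruction error plus an
    # L1 penalty on the hidden code to encourage sparsity.
    class MultiLayerSparseAE(nn.Module):
        def __init__(self, sizes=(784, 256, 64)):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(sizes[0], sizes[1]), nn.Sigmoid(),
                nn.Linear(sizes[1], sizes[2]), nn.Sigmoid(),
            )
            self.decoder = nn.Sequential(
                nn.Linear(sizes[2], sizes[1]), nn.Sigmoid(),
                nn.Linear(sizes[1], sizes[0]), nn.Sigmoid(),
            )

        def forward(self, x):
            code = self.encoder(x)
            return self.decoder(code), code

    def train_end_to_end(model, batches, epochs=10, l1_weight=1e-4):
        # `batches` is any iterable of input tensors of shape (batch, sizes[0]).
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        mse = nn.MSELoss()
        for _ in range(epochs):
            for x in batches:
                recon, code = model(x)
                loss = mse(recon, x) + l1_weight * code.abs().mean()
                opt.zero_grad()
                loss.backward()
                opt.step()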

In scenario 2, I simply train multiple autoencoders with greedy layer-wise training, similar to deep learning pretraining (but without a supervised step at the end), each autoencoder trained on the hidden-layer activations of the previous one. They now learn some patterns separately (as I can see from the visualized weights), but nothing impressive, which is about what I'd expect from single-layer AEs.
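
A sketch of this greedy layer-wise phase, again in PyTorch with illustrative names and hyperparameters (the one-hidden-layer AE building block below is an assumption about my setup, not the exact code):

    import torch
    import torch.nn as nn

    class AE(nn.Module):
        # One-hidden-layer autoencoder used as the layer-wise building block.
        def __init__(self, n_in, n_hidden):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
            self.dec = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())

        def forward(self, x):
            return self.dec(self.enc(x))

    def train_ae(ae, data, epochs=100, lr=1e-3):
        # Plain full-batch reconstruction training with MSE.
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        mse = nn.MSELoss()
        for _ in range(epochs):
            loss = mse(ae(data), data)
            opt.zero_grad()
            loss.backward()
            opt.step()

    def greedy_layerwise(data, layer_sizes=(784, 256, 64)):
        # Train each AE on the hidden activations of the previous one.
        aes, current = [], data
        for n_in, n_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
            ae = AE(n_in, n_hidden)
            train_ae(ae, current)
            with torch.no_grad():
                current = ae.enc(current)   # feed activations to the next AE
            aes.append(ae)
        return aes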

So I decided to test whether the pretrained layers, connected into one multi-layer AE, could perform better than the randomly initialized version. As you can see, this is the same idea as the fine-tuning step in deep neural networks.
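
One common way to wire the pretrained layers together is to stack the encoders in order and the decoders in reverse, then fine-tune the whole network on reconstruction of the original input; whether this mirrored stacking matches my exact architecture is left open here. The helper below just continues the sketch above and reuses its AE objects and train_ae function:

    import torch.nn as nn

    def stack_for_finetuning(aes):
        # Connect the pretrained encoders (in order) and decoders (in reverse)
        # into one multi-layer autoencoder for fine-tuning.
        encoder = nn.Sequential(*[ae.enc for ae in aes])
        decoder = nn.Sequential(*[ae.dec for ae in reversed(aes)])
        return nn.Sequential(encoder, decoder)

    # Fine-tuning then reuses the same reconstruction objective:
    # stacked = stack_for_finetuning(aes)
    # train_ae(stacked, data)   # minimize MSE(stacked(data), data)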

But during my fine-tuning, instead of improving, the neurons of all the layers seem to quickly converge towards one and the same pattern and end up learning nothing.

Question: What is the best configuration to train a fully unsupervised multi-layer reconstructive neural network? Layer-wise first and then some kind of fine-tuning? Why is my configuration not working?

After some tests I've come up with a method that seems to give very good results, and, as you'd expect from 'fine-tuning', it improves the performance of all the layers:

Just as usual, during the greedy layer-wise learning phase each new autoencoder tries to reconstruct the activations of the previous autoencoder's hidden layer. However, the last autoencoder (which will become the last layer of our multi-layer autoencoder during fine-tuning) is different: it takes the activations of the previous layer but tries to reconstruct the 'global' input (i.e. the original input that was fed to the first layer).

This way, when I connect all the layers and train them together, the multi-layer autoencoder really reconstructs the original image in its final output. I found a huge improvement in the learned features, even without a supervised step.
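
To make the trick concrete, here is a self-contained sketch of both phases in PyTorch (layer sizes, epochs, learning rate, and the assumption that inputs are scaled to [0, 1] are all illustrative, not my exact settings):

    import torch
    import torch.nn as nn

    class AE(nn.Module):
        # One-hidden-layer autoencoder; n_out lets the last AE decode to a
        # different dimension than its input.
        def __init__(self, n_in, n_hidden, n_out=None):
            super().__init__()
            n_out = n_in if n_out is None else n_out
            self.enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
            self.dec = nn.Sequential(nn.Linear(n_hidden, n_out), nn.Sigmoid())

        def forward(self, x):
            return self.dec(self.enc(x))

    def train_to_target(ae, inputs, target, epochs=100, lr=1e-3):
        # Train one AE to map `inputs` to `target` (target == inputs for all
        # but the last layer).
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        mse = nn.MSELoss()
        for _ in range(epochs):
            loss = mse(ae(inputs), target)
            opt.zero_grad()
            loss.backward()
            opt.step()

    def pretrain_with_global_last_layer(data, layer_sizes=(784, 256, 64)):
        # Greedy layer-wise pretraining where every AE reconstructs the
        # previous hidden activations, except the LAST one, which decodes
        # straight back to the original ('global') input.
        pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
        aes, current = [], data
        for i, (n_in, n_hidden) in enumerate(pairs):
            last = (i == len(pairs) - 1)
            ae = AE(n_in, n_hidden, n_out=data.shape[1] if last else n_in)
            train_to_target(ae, current, data if last else current)
            with torch.no_grad():
                current = ae.enc(current)
            aes.append(ae)
        return aes

    def finetune(aes, data, epochs=100, lr=1e-3):
        # Connect all the encoders plus the last AE's decoder, so the final
        # output lives in the original input space, and train the whole
        # network to reconstruct the original input.
        net = nn.Sequential(*[ae.enc for ae in aes], aes[-1].dec)
        opt = torch.optim.Adam(net.parameters(), lr=lr)
        mse = nn.MSELoss()
        for _ in range(epochs):
            loss = mse(net(data), data)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return net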

I don't know whether this corresponds to some standard implementation, but I haven't come across this trick anywhere before.
