
Stacking RBMs to create Deep belief network in sklearn

According to this website, a deep belief network is just multiple RBMs stacked together, using the output of the previous RBM as the input of the next RBM.
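The stacking described above can be sketched by hand, without a pipeline. This is only a minimal illustration, assuming the small 8x8 digits data set bundled with scikit-learn and hypothetical layer sizes (64 and 32) that are not taken from the question; each RBM is trained on the hidden-unit activations of the one before it.

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# Small 8x8 digits data; pixel values range from 0 to 16, so dividing by 16
# scales them to [0, 1], which is the range BernoulliRBM expects.
X = load_digits().data / 16.0

# Hypothetical layer sizes, chosen only for illustration.
layer_sizes = [64, 32]

rbms = []
layer_input = X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    # fit_transform trains this RBM (unsupervised) and returns the hidden-unit
    # activation probabilities, which become the input of the next RBM.
    layer_input = rbm.fit_transform(layer_input)
    rbms.append(rbm)

print(layer_input.shape)  # (1797, 32)
```

A Pipeline made only of BernoulliRBM steps performs the same sequence of fit_transform calls, which is why stacking RBMs in a pipeline amounts to greedy layer-wise training.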

In the scikit-learn documentation, there is an example of using an RBM to classify the MNIST dataset. They put an RBM and a LogisticRegression in a pipeline to achieve better accuracy.

Therefore I wonder if I can add multiple RBMs to that pipeline to create a deep belief network, as shown in the following code.

from sklearn.neural_network import BernoulliRBM
import numpy as np
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
X = np.asarray(digits.data, 'float32')
Y = digits.target
X = (X - np.min(X, 0)) / (np.max(X, 0) + 0.0001)  # 0-1 scaling

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2,
                                                    random_state=0)

logistic = linear_model.LogisticRegression(C=100)
rbm1 = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm2 = BernoulliRBM(n_components=80, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
rbm3 = BernoulliRBM(n_components=60, learning_rate=0.06, n_iter=100, verbose=1, random_state=101)
DBN3 = Pipeline(steps=[('rbm1', rbm1),('rbm2', rbm2), ('rbm3', rbm3), ('logistic', logistic)])

DBN3.fit(X_train, Y_train)

print("Logistic regression using RBM features:\n%s\n" % (
    metrics.classification_report(
        Y_test,
        DBN3.predict(X_test))))

However, I discovered that the more RBMs I add to the pipeline, the lower the accuracy.

1 RBM in pipeline --> 95%

2 RBMs in pipeline --> 93%

3 RBMs in pipeline --> 89%
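A comparison like the one above can be reproduced with a small loop over pipeline depths. The following is only a sketch: `make_stack` is a hypothetical helper, every RBM layer uses the same size (100 units) instead of the 100/80/60 layers in the question, and `n_iter` is reduced to 20 so that it runs quickly. The accuracies it prints will therefore not match the figures quoted above.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline


def make_stack(n_rbms, rbm, classifier):
    """Hypothetical helper: n_rbms cloned RBMs followed by a classifier."""
    steps = [('rbm%d' % (i + 1), clone(rbm)) for i in range(n_rbms)]
    steps.append(('logistic', clone(classifier)))
    return Pipeline(steps=steps)


X, Y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

# Assumed settings for illustration; not tuned.
rbm = BernoulliRBM(n_components=100, learning_rate=0.06, n_iter=20,
                   random_state=101)
logistic = LogisticRegression(C=100, max_iter=1000)

scores = {}
for depth in (0, 1, 2, 3):  # depth 0 is logistic regression by itself
    model = make_stack(depth, rbm, logistic)
    model.fit(X_train, Y_train)
    scores[depth] = model.score(X_test, Y_test)
    print("%d RBM(s): test accuracy %.3f" % (depth, scores[depth]))
```

Including depth 0 gives a baseline, which makes it easier to see whether any RBM layer helps at all.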

The training curve below shows that 100 iterations is just right for convergence. More iterations cause over-fitting, and the likelihood goes down again.

Batch size = 10

[Figure: training curves, batch size = 10]

Batch size = 256 or above

I have noticed one interesting thing. If I use a larger batch size, the performance of the network deteriorates a lot. When the batch size is above 256, the accuracy drops to less than 10%. The training curve somehow doesn't make sense to me: the first and second RBMs don't learn much, but the third RBM suddenly learns quickly.

[Figure: training curves, batch size = 256 or above]
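The effect of the batch size can be checked in isolation on a single RBM. The sketch below assumes the same digits data and a reduced `n_iter` of 20; it trains one RBM per batch size and reports the mean pseudo-likelihood from `score_samples`. With a fixed number of epochs, a larger batch size means fewer parameter updates per epoch (roughly 180 versus 8 on 1797 samples), which is one plausible, but here unverified, contributor to the weaker learning described above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X = load_digits().data / 16.0  # scale pixel values to [0, 1]

pseudo_likelihood = {}
for batch_size in (10, 256):
    rbm = BernoulliRBM(n_components=100, learning_rate=0.06,
                       batch_size=batch_size, n_iter=20, random_state=101)
    rbm.fit(X)
    # score_samples returns a stochastic pseudo-likelihood estimate per sample;
    # higher (less negative) values indicate a better fit.
    pseudo_likelihood[batch_size] = rbm.score_samples(X).mean()
    updates_per_epoch = int(np.ceil(X.shape[0] / batch_size))
    print("batch_size=%d: %d updates/epoch, mean pseudo-likelihood %.2f"
          % (batch_size, updates_per_epoch, pseudo_likelihood[batch_size]))
```

If the larger batch size scores clearly worse here, increasing `n_iter` or the learning rate for that setting would be a natural next experiment.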

It looks like 89% is somehow the bottleneck for a network with 3 RBMs.

I wonder if I am doing anything wrong here. Is my understanding of deep belief networks correct?

The following is not quite a definitive answer, as it lacks any statistical rigor. However, the necessary parameter optimization and evaluation will still take several days of CPU time. Until then, I submit the following proof of principle as an answer.

Tl;dr

With larger layers and much longer training, performance ranks as follows: logistic regression by itself < logistic regression + 1 RBM layer < logistic regression + RBM stack / DBN

Introduction

As I stated in one of my comments on OP's post, the use of stacked RBMs / DBNs for unsupervised pre-training has been systematically explored in Erhan et al. (2010). To be precise, their setup differs from OP's setup insofar as, after training the DBN, they add a final layer of output neurons and fine-tune the complete network using backprop. OP instead evaluates the benefit of adding one or more RBM layers using the performance of logistic regression on the output of the final layer. Furthermore, Erhan et al. do not use the 64-pixel digits data set in scikit-learn but the 784-pixel MNIST images (and variants thereof).

That being said, the similarities are substantial enough to take their findings as the starting point for the evaluation of a scikit-learn implementation of a DBN, which is precisely what I have done: I also use the MNIST data set, and I use the optimal parameters (where reported) by Erhan et al. These parameters differ substantially from the ones given in the example by OP and are likely the source of the poor performance of OP's model: in particular, the layer sizes are much larger and the number of training samples is orders of magnitude larger. However, like OP, I use logistic regression in the final step of the pipeline to evaluate whether the image transformations by an RBM or by a stack of RBMs / a DBN improve classification.

Incidentally, having (roughly) as many units in the RBM layers (800 units) as pixels in the original image (784 pixels) also makes pure logistic regression on the raw image pixels a suitable benchmark model.

I hence compare the following 3 models:

  1. logistic regression by itself (i.e. the baseline / benchmark model),

  2. logistic regression on outputs of an RBM, and

  3. logistic regression on outputs of a stack of RBMs / a DBN.

Results

Consistent with the previous literature, my preliminary results indicate that using the output of an RBM for logistic regression improves performance compared to using the raw pixel values alone, and the DBN transformation improves on the single RBM yet again, although the improvement is smaller.

Logistic regression by itself:

Model performance:
             precision    recall  f1-score   support

        0.0       0.95      0.97      0.96       995
        1.0       0.96      0.98      0.97      1121
        2.0       0.91      0.90      0.90      1015
        3.0       0.90      0.89      0.89      1033
        4.0       0.93      0.92      0.92       976
        5.0       0.90      0.88      0.89       884
        6.0       0.94      0.94      0.94       999
        7.0       0.92      0.93      0.93      1034
        8.0       0.89      0.87      0.88       923
        9.0       0.89      0.90      0.89      1020

avg / total       0.92      0.92      0.92     10000

Logistic regression on outputs of an RBM:

Model performance:
             precision    recall  f1-score   support

        0.0       0.98      0.98      0.98       995
        1.0       0.98      0.99      0.99      1121
        2.0       0.95      0.97      0.96      1015
        3.0       0.97      0.96      0.96      1033
        4.0       0.98      0.97      0.97       976
        5.0       0.97      0.96      0.96       884
        6.0       0.98      0.98      0.98       999
        7.0       0.96      0.97      0.97      1034
        8.0       0.96      0.94      0.95       923
        9.0       0.96      0.96      0.96      1020

avg / total       0.97      0.97      0.97     10000

Logistic regression on outputs of a stack of RBMs / a DBN:

Model performance:
             precision    recall  f1-score   support

        0.0       0.99      0.99      0.99       995
        1.0       0.99      0.99      0.99      1121
        2.0       0.97      0.98      0.98      1015
        3.0       0.98      0.97      0.97      1033
        4.0       0.98      0.97      0.98       976
        5.0       0.96      0.97      0.97       884
        6.0       0.99      0.98      0.98       999
        7.0       0.98      0.98      0.98      1034
        8.0       0.98      0.97      0.97       923
        9.0       0.96      0.97      0.96      1020

avg / total       0.98      0.98      0.98     10000

Code

#!/usr/bin/env python

"""
Using MNIST, compare classification performance of:
1) logistic regression by itself,
2) logistic regression on outputs of an RBM, and
3) logistic regression on outputs of a stack of RBMs / a DBN.
"""

import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.base import clone
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report


def norm(arr):
    # rescale to the 0-1 range (np.float was removed from NumPy, so use np.float64)
    arr = arr.astype(np.float64)
    arr -= arr.min()
    arr /= arr.max()
    return arr


if __name__ == '__main__':

    # load MNIST data set
    # (fetch_mldata was removed from scikit-learn; fetch_openml replaces it
    # and returns the targets as strings '0'..'9')
    mnist = fetch_openml('mnist_784', version=1, as_frame=False)
    X, Y = mnist.data, mnist.target

    # normalize inputs to 0-1 range
    X = norm(X)

    # split into train, validation, and test data sets
    X_train, X_test, Y_train, Y_test = train_test_split(X,       Y,       test_size=10000, random_state=0)
    X_train, X_val,  Y_train, Y_val  = train_test_split(X_train, Y_train, test_size=10000, random_state=0)

    # --------------------------------------------------------------------------------
    # set hyperparameters

    learning_rate = 0.02 # from Erhan et al. (2010): median value in grid-search
    total_units   =  800 # from Erhan et al. (2010): optimal for MNIST / only slightly worse than 1200 units when using InfiniteMNIST
    total_epochs  =   50 # from Erhan et al. (2010): optimal for MNIST
    batch_size    =  128 # seems like a representative sample; backprop literature often uses 256 or 512 samples

    C = 100. # optimum for benchmark model according to sklearn docs: https://scikit-learn.org/stable/auto_examples/neural_networks/plot_rbm_logistic_classification.html#sphx-glr-auto-examples-neural-networks-plot-rbm-logistic-classification-py

    # TODO optimize using grid search, etc

    # --------------------------------------------------------------------------------
    # construct models

    # RBM
    rbm = BernoulliRBM(n_components=total_units, learning_rate=learning_rate, batch_size=batch_size, n_iter=total_epochs, verbose=1)

    # "output layer"
    # (multi_class is deprecated in recent scikit-learn; lbfgs already fits a multinomial model for multiclass targets)
    logistic = LogisticRegression(C=C, solver='lbfgs', max_iter=200, verbose=1)

    models = []
    models.append(Pipeline(steps=[('logistic', clone(logistic))]))                                              # base model / benchmark
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('logistic', clone(logistic))]))                        # single RBM
    models.append(Pipeline(steps=[('rbm1', clone(rbm)), ('rbm2', clone(rbm)), ('logistic', clone(logistic))]))  # RBM stack / DBN

    # --------------------------------------------------------------------------------
    # train and evaluate models

    for model in models:
        # train
        model.fit(X_train, Y_train)

        # evaluate using validation set
        print("Model performance:\n%s\n" % (
            classification_report(Y_val, model.predict(X_val))))

    # TODO: after parameter optimization, evaluate on test set
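The two TODOs in the script (hyperparameter optimization, then evaluation on the test set) could be approached with GridSearchCV. The following is only a sketch under assumptions: it uses the small digits data set instead of MNIST so that it runs quickly, and the grid values are hypothetical placeholders rather than tuned or recommended settings. Parameters of a pipeline step are addressed as `<step name>__<parameter>`.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, Y = load_digits(return_X_y=True)
X = X / 16.0  # scale pixel values to [0, 1]
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

pipeline = Pipeline(steps=[
    ('rbm1', BernoulliRBM(n_iter=10, random_state=0)),
    ('logistic', LogisticRegression(C=100, max_iter=500)),
])

# Hypothetical grid, kept tiny so the search finishes quickly.
param_grid = {
    'rbm1__n_components': [50, 100],
    'rbm1__learning_rate': [0.02, 0.06],
}

# 3-fold cross-validation on the training data only; the test set stays
# untouched until the best parameters have been chosen.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_train, Y_train)

print(search.best_params_)
print("held-out test accuracy: %.3f" % search.score(X_test, Y_test))
```

On full MNIST with 800-unit layers the same search would be far more expensive, which is consistent with the several days of CPU time mentioned at the start of this answer.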
