Sparse autoencoder cost function in TensorFlow

I've been going through a variety of TensorFlow tutorials to try to familiarize myself with how it works, and I've become interested in utilizing autoencoders.

I started with the autoencoder model in TensorFlow's models repository:

https://github.com/tensorflow/models/tree/master/autoencoder

I got it working, and while visualizing the weights I expected to see something like this:

[image: expected learned feature weights]

However, my autoencoder gives me garbage-looking weights (despite accurately recreating the input image):

[image: my actual learned feature weights]

Further reading suggests that what I'm missing is that my autoencoder is not sparse, so I need to enforce a sparsity cost on the weights.

I've tried to add a sparsity cost to the original code (based off of this example), but it doesn't seem to change the weights to look like the model ones.
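My understanding of the penalty in that example (which I may have wrong, and that could be part of my problem) is the KL divergence between a target activation level and each hidden unit's mean activation over the batch, roughly like the sketch below, where hidden stands in for the hidden-layer activation tensor:

rho = 0.05  # target sparsity level
rho_hat = tf.reduce_mean(hidden, axis=0)  # mean activation of each hidden unit over the batch
kl = rho * tf.log(rho / rho_hat) + (1 - rho) * tf.log((1 - rho) / (1 - rho_hat))
sparsity_penalty = tf.reduce_sum(kl)  # added to the reconstruction cost, scaled by a weight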

How can I properly change the cost to get features that look like the ones typically found when autoencoding the MNIST dataset? My modified model is here:

import numpy as np
import random
import math
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import matplotlib.pyplot as plt

def xavier_init(fan_in, fan_out, constant = 1):
    low = -constant * np.sqrt(6.0 / (fan_in + fan_out))
    high = constant * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform((fan_in, fan_out), minval = low, maxval = high, dtype = tf.float32)

class AdditiveGaussianNoiseAutoencoder(object):
    def __init__(self, n_input, n_hidden, transfer_function = tf.nn.sigmoid, optimizer = tf.train.AdamOptimizer(),
                 scale = 0.1):
        self.n_input = n_input
        self.n_hidden = n_hidden
        self.transfer = transfer_function
        self.scale = tf.placeholder(tf.float32)
        self.training_scale = scale
        network_weights = self._initialize_weights()
        self.weights = network_weights
        self.sparsity_level = 0.1  # target mean activation; could also be per-unit, e.g. np.repeat([0.05], self.n_hidden).astype(np.float32)
        self.sparse_reg = 10  # weight of the sparsity penalty in the total cost

        # model: hidden layer on noise-corrupted input. Note the use of the
        # self.scale placeholder so the noise level fed at run time takes effect.
        self.x = tf.placeholder(tf.float32, [None, self.n_input])
        self.hidden = self.transfer(tf.add(tf.matmul(self.x + self.scale * tf.random_normal((n_input,)),
                self.weights['w1']),
                self.weights['b1']))
        self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2'])

        # cost: squared reconstruction error plus the weighted sparsity penalty
        self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0)) + self.sparse_reg \
                        * self.kl_divergence(self.sparsity_level, self.hidden)

        self.optimizer = optimizer.minimize(self.cost)

        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)

    def _initialize_weights(self):
        all_weights = dict()
        all_weights['w1'] = tf.Variable(xavier_init(self.n_input, self.n_hidden))
        all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype = tf.float32))
        all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype = tf.float32))
        all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype = tf.float32))
        return all_weights

    def partial_fit(self, X):
        cost, opt = self.sess.run((self.cost, self.optimizer), feed_dict = {self.x: X,
                                                                            self.scale: self.training_scale
                                                                            })
        return cost

    def kl_divergence(self, p, p_hat):
        # Bernoulli KL(p || p_hat), averaged over every individual activation;
        # note that p_hat here is the raw hidden activation tensor rather than
        # the mean activation of each unit over the batch
        return tf.reduce_mean(p * tf.log(p) - p * tf.log(p_hat) + (1 - p) * tf.log(1 - p) - (1 - p) * tf.log(1 - p_hat))

    def calc_total_cost(self, X):
        return self.sess.run(self.cost, feed_dict = {self.x: X,
                                                     self.scale: self.training_scale
                                                     })

    def transform(self, X):
        return self.sess.run(self.hidden, feed_dict = {self.x: X,
                                                       self.scale: self.training_scale
                                                       })

    def generate(self, hidden = None):
        if hidden is None:
            # sample a single random hidden activation vector; shape must be
            # [batch, n_hidden] to be fed in place of self.hidden
            hidden = np.random.normal(size = (1, self.n_hidden))
        return self.sess.run(self.reconstruction, feed_dict = {self.hidden: hidden})

    def reconstruct(self, X):
        return self.sess.run(self.reconstruction, feed_dict = {self.x: X,
                                                               self.scale: self.training_scale
                                                               })

    def getWeights(self):
        return self.sess.run(self.weights['w1'])

    def getBiases(self):
        return self.sess.run(self.weights['b1'])


mnist = input_data.read_data_sets('MNIST_data', one_hot = True)

def get_random_block_from_data(data, batch_size):
    start_index = np.random.randint(0, len(data) - batch_size)
    return data[start_index:(start_index + batch_size)]

X_train = mnist.train.images
X_test = mnist.test.images

n_samples = int(mnist.train.num_examples)
training_epochs = 50
batch_size = 128
display_step = 1

autoencoder = AdditiveGaussianNoiseAutoencoder(n_input = 784,
                                               n_hidden = 200,
                                               transfer_function = tf.nn.sigmoid,
                                               optimizer = tf.train.GradientDescentOptimizer(learning_rate = 0.01),
                                               scale = 0.01)

for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(n_samples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_xs = get_random_block_from_data(X_train, batch_size)

        # Fit training using batch data
        cost = autoencoder.partial_fit(batch_xs)
        # Compute average loss
        avg_cost += cost / n_samples * batch_size

    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch + 1), "cost=", avg_cost)

print("Total cost: " + str(autoencoder.calc_total_cost(X_test)))

imageToUse = random.choice(mnist.test.images)

plt.imshow(np.reshape(imageToUse,[28,28]), interpolation="nearest", cmap="gray", clim=(0, 1.0))
plt.show()

# input weights
wts = autoencoder.getWeights()
dim = math.ceil(math.sqrt(autoencoder.n_hidden))
plt.figure(1, figsize=(dim, dim))
for i in range(0,autoencoder.n_hidden):
    im = wts.flatten()[i::autoencoder.n_hidden].reshape((28,28))
    plt.subplot(dim, dim, i+1)
    #plt.title('Feature Weights ' + str(i))
    plt.imshow(im, cmap="gray", clim=(-1.0, 1.0))
    plt.colorbar()
plt.show()

predicted_imgs = autoencoder.reconstruct(X_test[:100])

# plot the reconstructed images
plt.figure(1, figsize=(10, 10))
plt.title('Autoencoded Images')
for i in range(0,100):
    im = predicted_imgs[i].reshape((28,28))
    plt.subplot(10, 10, i+1)
    plt.imshow(im, cmap="gray", clim=(0.0, 1.0))
plt.show()

I don't know that this will work for you, but I have seen it promote some sparsity in my own networks. I would recommend modifying your loss to use a combination of softmax cross entropy (or KL divergence, if you wish) and an l2 regularization loss on the weights. I calculate the l2 loss with:

l2 = sum(tf.nn.l2_loss(var) for var in tf.trainable_variables() if not 'biases' in var.name)

This makes me regularize only the weights, not the biases, assuming that you have "biases" in the names of your bias tensors (much of the tf.contrib.rnn library names its bias tensors this way, so the filter works there). The overall cost function I use is then:

cost = tf.nn.softmax_or_kl_divergence_or_whatever(labels=labels, logits=logits)
cost = tf.reduce_mean(cost)
cost = cost + beta * l2

where beta is a hyperparameter of the network that I then vary when exploring my hyperparameter space.
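For a concrete version of that snippet, one choice is tf.nn.softmax_cross_entropy_with_logits in place of the placeholder loss (the labels placeholder and n_classes here are illustrative):

labels = tf.placeholder(tf.float32, [None, n_classes])  # illustrative one-hot targets
cost = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
cost = tf.reduce_mean(cost)
cost = cost + beta * l2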

Another option, very similar to this, is to use l1 regularization instead. This is supposed to promote sparsity more than l2 regularization. In my own examples I was not explicitly trying to promote sparsity, but saw it as a consequence of l2 regularization; maybe l1 will give you more luck. You can implement l1 regularization with something like:

l1 = sum(tf.reduce_sum(tf.abs(var)) for var in tf.trainable_variables() if not 'biases' in var.name)

followed by the cost definition above, substituting l1 for l2.
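Tying this back to your autoencoder, a minimal sketch of the combined cost (the beta value is illustrative, and the name filter only spares the biases if you create those variables with "biases" in their names):

l1 = sum(tf.reduce_sum(tf.abs(var)) for var in tf.trainable_variables()
         if not 'biases' in var.name)
reconstruction = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0))
beta = 1e-4  # illustrative; tune as a hyperparameter
self.cost = reconstruction + beta * l1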
