
Simple gradient descent using mxnet

I'm trying to use MXNet's gradient descent optimizers to minimize a function. The equivalent example in TensorFlow would be:

import tensorflow as tf

x = tf.Variable(2, name='x', dtype=tf.float32)
log_x = tf.log(x)
log_x_squared = tf.square(log_x)

optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(log_x_squared)

init = tf.initialize_all_variables()

def optimize():
  with tf.Session() as session:
    session.run(init)
    print("starting at", "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))
    for step in range(10):  
      session.run(train)
      print("step", step, "x:", session.run(x), "log(x)^2:", session.run(log_x_squared))

I am not sure how to accomplish the same in MXNet. The optimizer API documentation does not appear to have an equivalent method. Here's what I've tried so far. The main confusion has been around the need to pass training data:

import mxnet as mx

x = mx.sym.Variable('data')
log_x = mx.sym.log(x)
log_x_squared = mx.sym.square(log_x)

mod = mx.mod.Module(log_x_squared)  # Create a module where the loss function
                                    # is the one we want to optimize
mod.bind(data_shapes=[('data', (1,1))])  # ?? not sure if this is correct - we
                                         # are saying our input is a scalar
mod.init_params()
mod.init_optimizer()  # SGD is default

mod.fit()  # ?? must pass data_iter to fit

It seems like the x variable should somehow be fed back in as the data_iter, but I don't know how to accomplish this.

Update: thanks to kevinthesun for their excellent answer! Here is a working minimization routine built on top of a single hidden-layer neural net:

import mxnet as mx
import numpy as np


def minimize(objective_function,
             initial_params,
             max_iters=1000,
             optimizer='sgd',
             optimizer_params=(('learning_rate', 0.1),),
             tol=1e-8):

    class InitialParam(mx.init.Initializer):

        def __init__(self, vals):
            super(InitialParam, self).__init__()
            self._vals = vals

        def _init_weight(self, _, arr):
            arr[:] = self._vals.asnumpy()[:, np.newaxis]


    x = mx.sym.Variable('data')
    params_len = initial_params.shape[0]
    fc = mx.sym.FullyConnected(data=x, name='fc1',
                               num_hidden=params_len,
                               no_bias=True)

    # Passing the FullyConnected layer into the objective function
    # is difficult to manipulate. If the fully connected layer represents
    # [x, y] for optimizing a 2 dimensional function f(x, y) it is easier
    # to work with x, and y. So we split the fully connected layer into a
    # number of symbols for each parameter:
    param_syms = []
    for i in range(params_len):
        ps = mx.sym.slice(fc, begin=(0, i), end=(1, i + 1))
        param_syms.append(ps)

    # The loss function for the network is our objective function.
    loss = mx.sym.MakeLoss(objective_function(param_syms))
    mod = mx.mod.Module(loss)

    mod.bind(data_shapes=[('data', (1,))])
    mod.init_params(InitialParam(initial_params))
    mod.init_optimizer(optimizer=optimizer,
                       optimizer_params=optimizer_params)

    (o_name, o_shape), = mod.output_shapes

    i = 0
    params = initial_params
    old_val = np.full(o_shape, np.nan)
    while i < max_iters:
        mod.forward_backward(mx.io.DataBatch(
            data=[mx.nd.ones((1,))])) 
        mod.update()
        params = mod.get_params()[0]['fc1_weight']
        val = mod.get_outputs()[0].asnumpy()
        if np.allclose(old_val, val, atol=tol):
            print('Function value: {}'.format(val))
            print('Iterations: {}'.format(i))
            return params

        old_val = val
        i += 1

    return params

and using it:

def my_func(x):
    return (x[0] + 1) ** 2

p = minimize(my_func, mx.nd.array([1.0]))
p.asnumpy()

>>> array([[-0.99999988]], dtype=float32)

and another:

def my_func(x):
    return (x[0] + 1) ** 2 + (x[1] - 2) ** 2 + (x[2] + 3) ** 2

p = minimize(my_func, mx.nd.array([1.0, 1.5, 2.0]))
p.asnumpy()

>>> array([[-0.99996436],
           [ 1.99999106],
           [-2.99991083]], dtype=float32)

Currently, optimizing a simple function with MXNet is not as easy as with TensorFlow, due to the lack of support in the frontend.

First, you need a loss function as the last layer of your network. Here it is log_x_squared. Use MakeLoss to create the loss function.
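For illustration, a minimal sketch of wrapping the question's objective in MakeLoss, using the same MXNet 1.x symbol API as the code above:

import mxnet as mx

x = mx.sym.Variable('data')
log_x_squared = mx.sym.square(mx.sym.log(x))
loss = mx.sym.MakeLoss(log_x_squared)  # gradients are computed from this output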

Second is the input and the weights. Since a Variable is currently not counted as a trainable weight in MXNet, you need to make x a weight. Here is a workaround: set a 'fake' input variable that is always 1, then add a FullyConnected layer with 1 hidden unit and no bias. This gives us "1 * x", so x is now a weight.
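A minimal sketch of that workaround (MXNet 1.x symbol API; the layer name 'fc1' is just an illustrative choice). The data input will always be fed a constant 1, so the FullyConnected weight plays the role of x:

import mxnet as mx

data = mx.sym.Variable('data')                       # will always be fed the constant 1
x = mx.sym.FullyConnected(data=data, num_hidden=1,   # output = weight * 1, so the weight is x
                          no_bias=True, name='fc1')
loss = mx.sym.MakeLoss(mx.sym.square(mx.sym.log(x)))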

Third, if you would like to optimize multiple times on a single data sample, module.fit might not be the best choice. After initializing the optimizer, you just need to call module.forward_backward() and module.update() repeatedly. forward_backward takes a DataBatch, which is a simpler interface than a DataIter; here we just pass a constant ndarray of 1 every time.
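Putting the pieces together, a minimal sketch of such a loop for log(1 * x)^2; the constant initializer (starting x at 2, to mirror the TensorFlow example) and the learning rate of 0.5 are my own assumptions:

import mxnet as mx

data = mx.sym.Variable('data')
x = mx.sym.FullyConnected(data=data, num_hidden=1, no_bias=True, name='fc1')
loss = mx.sym.MakeLoss(mx.sym.square(mx.sym.log(x)))

mod = mx.mod.Module(loss, data_names=['data'], label_names=None)
mod.bind(data_shapes=[('data', (1, 1))])
mod.init_params(initializer=mx.init.Constant(2.0))   # start with x = 2, as in the TF example
mod.init_optimizer(optimizer='sgd',
                   optimizer_params=(('learning_rate', 0.5),))

batch = mx.io.DataBatch(data=[mx.nd.ones((1, 1))])   # the constant 1 input
for step in range(10):
    mod.forward_backward(batch)   # forward pass plus gradient computation
    mod.update()                  # one SGD step on the fc1 weight, i.e. on x
    print(step, mod.get_outputs()[0].asnumpy())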

In effect, we construct a computation graph of log(1 * x)^2, and x becomes a weight instead of a variable.

Anyway, we should consider providing an interface similar to TensorFlow's for optimizing variables.

Hope this is useful info!

MXNet is different from TensorFlow. You need to read this tutorial.
