
MLP in tensorflow for regression… not converging

Hello, this is my first time working with TensorFlow. I tried to adapt the example here TensorFlow-Examples to use this code for regression problems with the Boston database. Basically, I only changed the cost function, the database, the number of inputs, and the number of targets, but when I run it the MLP doesn't converge (I use a very low learning rate). I tested it with the Adam optimizer and with gradient descent optimization, but I get the same behavior. I appreciate your suggestions and ideas!

Observation: When I ran this program without the modifications described above, the cost function value always decreased.

Here is the evolution when I run the model; the cost function oscillates even with a very low learning rate. In the worst case, I would expect the model to converge to a value; for example, epoch 944 shows a value of 0.2267548, and if no better value is found, that value should hold until the optimization finishes.

Epoch: 0942 cost= 0.445707272
Epoch: 0943 cost= 0.389314095
Epoch: 0944 cost= 0.226754842
Epoch: 0945 cost= 0.404150135
Epoch: 0946 cost= 0.382190095
Epoch: 0947 cost= 0.897880572
Epoch: 0948 cost= 0.481954243
Epoch: 0949 cost= 0.269408980
Epoch: 0950 cost= 0.427961614
Epoch: 0951 cost= 1.206053280
Epoch: 0952 cost= 0.834200084

from __future__ import print_function

# Import MNIST data
#from tensorflow.examples.tutorials.mnist import input_data
#mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
import ToolInputData as input_data

ALL_DATA_FILE_NAME = "boston_normalized.csv"



## Load the complete database, then split it into training, validation and test sets
completedDatabase = input_data.Databases(databaseFileName=ALL_DATA_FILE_NAME, targetLabel="MEDV", trainPercentage=0.70, valPercentage=0.20, testPercentage=0.10,
                  randomState=42, inputdataShuffle=True, batchDataShuffle=True)


# Parameters
learning_rate = 0.0001
training_epochs = 1000
batch_size = 5
display_step = 1

# Network Parameters
n_hidden_1 = 10 # 1st layer number of neurons
n_hidden_2 = 10 # 2nd layer number of neurons

n_input = 13 # number of features of my database
n_classes = 1 # one target value (float)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.square(pred-y))
#cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer =  tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(completedDatabase.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = completedDatabase.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                      y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")    

A couple of points.

Your model is quite shallow, being only two layers. Granted, you'll need more data to train a larger model, so I don't know how much data you have in the Boston data set.

What are your labels? That would better inform whether squared error is appropriate for your model.

Also, your learning rate is quite low.
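
A minimal sketch of these points, assuming the same 13 input features and single regression target as in the question: one extra hidden layer and a somewhat higher learning rate. The layer size of 10 and the rate of 0.001 are illustrative guesses, not values tuned on the Boston data.

import tensorflow as tf

n_input, n_classes = 13, 1                    # features / single target, as in the question
n_hidden_1 = n_hidden_2 = n_hidden_3 = 10     # three hidden layers instead of two
learning_rate = 0.001                         # higher than the 0.0001 used above (illustrative)

x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])

weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'h3': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_hidden_3, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def deeper_perceptron(x, weights, biases):
    # Three ReLU hidden layers, linear output for regression
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    layer_3 = tf.nn.relu(tf.add(tf.matmul(layer_2, weights['h3']), biases['b3']))
    return tf.matmul(layer_3, weights['out']) + biases['out']

pred = deeper_perceptron(x, weights, biases)
cost = tf.reduce_mean(tf.square(pred - y))    # mean squared error, as in the question
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)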

You stated that your labels are in the range [0,1], but I cannot see that the predictions are in the same range. In order to make them comparable to the labels, you should transform them to the same range before returning, for example using the sigmoid function:

out_layer = tf.matmul(...)
out = tf.sigmoid(out_layer)
return out

Maybe this fixes the problem with the stability. You might also want to increase the batch size a bit, for example 20 examples per batch. If this improves the performance, you can probably increase the learning rate a bit.
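
Putting these suggestions together, here is a minimal sketch that reuses the placeholders and the weights/biases dictionaries defined in the question. The sigmoid output assumes the MEDV targets really are scaled to [0, 1], and batch_size = 20 and learning_rate = 0.001 are illustrative, untested values.

def multilayer_perceptron_sigmoid(x, weights, biases):
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    # Squash the prediction into [0, 1] so it lives in the same range as the labels
    return tf.sigmoid(out_layer)

batch_size = 20         # larger batches give a less noisy gradient estimate
learning_rate = 0.001   # can be raised a bit once the cost stops oscillating

pred = multilayer_perceptron_sigmoid(x, weights, biases)
cost = tf.reduce_mean(tf.square(pred - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

The rest of the training loop from the question can stay the same.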
