
Tensorflow Linear Regression not converging to correct cost

I am trying to implement multivariate linear regression in TensorFlow (using the Boston Housing dataset), but my cost function seems to converge to the wrong value (about 24000 in my case). I tried scaling the features, but it still hasn't worked. Any ideas as to what I'm doing wrong? Here is the code:

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.preprocessing import MinMaxScaler

rate = 0.000000011
epochs = 100
errors = []

def load_data():
    boston = load_boston()

    bos = pd.DataFrame(boston.data)

    output = pd.DataFrame(boston.target)

    return [bos, output]

xS, yS = load_data()

m = len(yS)

x_train, x_test, y_train, y_test = train_test_split(xS, yS, test_size=0.2)

scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

theta = tf.Variable(tf.zeros([len(xS.columns), 1]))
X = tf.placeholder(tf.float32, shape=[m, len(xS.columns)])
y = tf.placeholder(tf.float32, shape=[m, 1])
b = tf.Variable(tf.zeros([m, 1]))

model = tf.matmul(tf.transpose(theta), tf.transpose(X)) + b

cost = tf.reduce_sum(tf.square(y-model))/(2*m)

optimizer = tf.train.GradientDescentOptimizer(rate).minimize(cost)

init = [tf.global_variables_initializer(), tf.local_variables_initializer()]

with tf.Session() as sess:
    sess.run(init)
    for e in range(epochs):
        sess.run(optimizer, feed_dict={X:xS, y:yS})
        loss = sess.run(cost, feed_dict={X:xS, y:yS})
        print("cost at step", e, loss)
        errors.append(loss)
        if errors[len(errors)-1] > errors[len(errors)-2]:
            break

    theta_temp = np.array(sess.run(theta))
    b_temp = np.array(sess.run(b))

plt.plot(list(range(len(errors))), errors)
plt.show()
h = np.transpose(np.add(np.matmul(np.transpose(theta_temp), np.transpose(xS)), np.transpose(b_temp)))
print(r2_score(h, yS))

You are doing most of the things correctly. I suggest the following changes to your code.

model = tf.matmul(X, theta) + b

Try this with a learning rate of 0.001 and 1000 epochs, and please report the result.

In your case, where you are doing

model = tf.matmul(tf.transpose(theta), tf.transpose(X)) + b

you are making a mistake. The first term on the right-hand side has shape (1, m) and the second term has shape (m, 1), so broadcasting gives you a result you are not expecting. That is why you see very bad results with a learning rate of 0.01 or 0.1.
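
To see what the broadcast does, here is a small NumPy sketch with the same shapes as your code (m samples, p features; 506 samples and 13 features matches the Boston data):

import numpy as np

m, p = 506, 13
X = np.random.rand(m, p)
theta = np.zeros((p, 1))
b = np.zeros((m, 1))

wrong = theta.T @ X.T + b   # (1, m) + (m, 1) broadcasts to (m, m)
right = X @ theta + b       # (m, 1) + (m, 1) stays (m, 1)

print(wrong.shape)          # (506, 506) -- one value per pair of samples
print(right.shape)          # (506, 1)   -- one prediction per sample, as intended

With the (m, m) result, y - model is also (m, m), so the cost sums over m*m residuals instead of m, which helps explain why it settles at such a large value.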

My second suggestion is to remove the break criterion.

if errors[len(errors)-1] > errors[len(errors)-2]: break

Stochastic gradients are noisy. There is no guarantee that the cost decreases at every step when you move in the negative-gradient direction (it may hold for this convex problem, but I would have to think about it). Here is the full script with these changes; a patience-based stopping rule is sketched after it.

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.cross_validation import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.preprocessing import MinMaxScaler

rate = 0.1
epochs = 100
errors = []

def load_data():
    boston = load_boston()

    bos = pd.DataFrame(boston.data)

    output = pd.DataFrame(boston.target)

    return [bos, output]

xS, yS = load_data()


x_train, x_test, y_train, y_test = train_test_split(xS, yS, test_size=0.2)
m = len(y_train)

scaler = MinMaxScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)

theta = tf.Variable(tf.zeros([len(xS.columns), 1]))
X = tf.placeholder(tf.float32, shape=[m, len(xS.columns)])
y = tf.placeholder(tf.float32, shape=[m, 1])
b = tf.Variable(tf.zeros([1]))

model = tf.matmul(X, theta) + b

cost = tf.reduce_sum(tf.square(y-model))/(2*m)

optimizer = tf.train.GradientDescentOptimizer(rate).minimize(cost)

init = [tf.global_variables_initializer(), tf.local_variables_initializer()]

with tf.Session() as sess:
    sess.run(init)
    for e in range(epochs):
        sess.run(optimizer, feed_dict={X:x_train, y:y_train})
        loss = sess.run(cost, feed_dict={X:x_train, y:y_train})
        print("cost at step", e, loss)
        errors.append(loss)
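
If you still want an automatic stopping rule, comparing only the last two losses is fragile, because the loss can tick up for one epoch and then keep improving. A minimal patience-based sketch of the loop above (the patience value of 10 is an arbitrary choice):

best_loss = float("inf")
patience = 10
bad_epochs = 0

with tf.Session() as sess:
    sess.run(init)
    for e in range(epochs):
        sess.run(optimizer, feed_dict={X: x_train, y: y_train})
        loss = sess.run(cost, feed_dict={X: x_train, y: y_train})
        print("cost at step", e, loss)
        errors.append(loss)
        if loss < best_loss:
            best_loss = loss        # new best cost seen so far
            bad_epochs = 0
        else:
            bad_epochs += 1         # no improvement this epoch
        if bad_epochs >= patience:  # stop only after `patience` epochs without improvement
            print("stopping early at epoch", e)
            break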

