Why do I get different weights when using TensorFlow for multiple linear regression?

Question

I have two implementations of multiple linear regression, one using tensorflow and one using only numpy. I generate a dummy data set and try to recover the weights I used to create it. The numpy version returns the original weights, but the tensorflow version always returns different weights (which also sort of work).

The numpy implementation is here, and here's the TF implementation:

import numpy as np
import tensorflow as tf

x = np.array([[i, i + 10] for i in range(100)]).astype(np.float32)
y = np.array([i * 0.4 + j * 0.9 + 1 for i, j in x]).astype(np.float32)

# Add bias
x = np.hstack((x, np.ones((x.shape[0], 1)))).astype(np.float32)

# Create variable for weights
n_features = x.shape[1]
np.random.rand(n_features)
w = tf.Variable(tf.random_normal([n_features, 1]))
w = tf.Print(w, [w])

# Loss function
y_hat = tf.matmul(x, w)
loss = tf.reduce_mean(tf.square(tf.sub(y, y_hat)))

operation = tf.train.GradientDescentOptimizer(learning_rate=0.000001).minimize(loss)

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for iteration in range(5000):
        session.run(operation)
    weights = w.eval()
    print(weights)

Running the script gets me weights around [-0.481, 1.403, 0.701], while running the numpy version gets me weights around [0.392, 0.907, 0.9288], which are much closer to the weights I used to generate the data: [0.4, 0.9, 1].

The learning rate and number of epochs are the same in both, and both initialise the weights randomly. I don't normalize the data in either implementation, and I've run them multiple times.

Why are the results different? I also tried to initialise the weights in the TF version using w = tf.Variable(np.random.rand(n_features).reshape(n_features,1).astype(np.float32)) but that didn't fix it either. Is there something wrong with the TF implementation?

Answer 1

The problem appears to be with broadcasting. The shape of y_hat in the above is (100,1), while y is (100,). So when you do tf.sub(y, y_hat), the two operands are broadcast against each other and you end up with a (100,100) matrix containing every pairwise difference between the elements of the two vectors, rather than the 100 element-wise residuals. The loss is then the mean over all of those pairs. I don't know, but I guess that you managed to avoid this in the numpy code.
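The shape mismatch can be reproduced without TensorFlow. The following is a minimal sketch in NumPy, whose broadcasting rules TensorFlow follows; the arrays are made-up stand-ins for y and y_hat (not the question's data), chosen so that the intended element-wise loss is exactly zero.

```python
import numpy as np

# Illustrative stand-ins: identical values, but different shapes.
y = np.arange(100, dtype=np.float32)                      # shape (100,)
y_hat = np.arange(100, dtype=np.float32).reshape(-1, 1)   # shape (100, 1)

# (100,) minus (100, 1) broadcasts to (100, 100): every pairwise difference.
diff = y - y_hat
print(diff.shape)

# Loss computed on the broadcast matrix versus the intended element-wise loss.
loss_broadcast = np.mean(np.square(diff))
loss_intended = np.mean(np.square(y - y_hat[:, 0]))
print(loss_broadcast, loss_intended)
```

Even though the predictions match the targets exactly here, the broadcast loss is nonzero, so an optimizer minimizing it is solving a different problem from the one intended.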

Two ways to fix your code:

y = np.array([[i * 0.4 + j * 0.9 + 1 for i, j in x]]).astype(np.float32).T

or

y_hat = tf.squeeze(tf.matmul(x, w))

That said, when I run this it still doesn't converge to the weights you want, but at least it is now able to minimize the loss function.
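One possible reason the weights still differ after the shape fix, offered here as an observation and not something the answer itself states: in the question's data the second feature is always the first plus 10, so with the bias column the design matrix is rank-deficient and many weight vectors give identical predictions. The sketch below checks this with NumPy; w_a and w_b are illustrative names, and w_b is just one hand-picked alternative.

```python
import numpy as np

# Rebuild the question's features: columns are i, i + 10, and a bias of ones.
x = np.array([[i, i + 10] for i in range(100)], dtype=np.float64)
x = np.hstack((x, np.ones((x.shape[0], 1))))

# Column 2 equals column 1 plus 10 times the bias column,
# so only two of the three columns are linearly independent.
rank = np.linalg.matrix_rank(x)
print(rank)

# The generating weights and a hand-picked alternative (assumption for illustration).
w_a = np.array([0.4, 0.9, 1.0])
w_b = np.array([1.3, 0.0, 10.0])

# Both produce y = 1.3 * i + 10 for every row.
same_predictions = np.allclose(x @ w_a, x @ w_b)
print(same_predictions)
```

Under this reading, any weights with w[0] + w[1] = 1.3 and 10 * w[1] + w[2] = 10 fit the data exactly, so where gradient descent ends up depends on the random initialisation.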
