Linear Regression with gradient descent: two questions

I'm trying to understand Linear Regression with Gradient Descent, and I do not understand this part in my loss_gradients function below.

import numpy as np

def forward_linear_regression(X, y, weights):

    # dot product weights * inputs
    N = np.dot(X, weights['W'])

    # add bias
    P = N + weights['B']

    # compute loss with MSE
    loss = np.mean(np.power(y - P, 2))

    forward_info = {}
    forward_info['X'] = X
    forward_info['N'] = N
    forward_info['P'] = P
    forward_info['y'] = y

    return loss, forward_info
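
For context, here is a minimal way to call this function; the data and weight values below are made up purely for illustration (a batch of 3 examples with 2 features):

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                    # shape (3, 2)
y = np.array([[1.0], [2.0], [3.0]])           # shape (3, 1)
weights = {'W': np.array([[0.5], [-0.5]]),    # shape (2, 1)
           'B': np.array([0.1])}              # bias, broadcast over the batch

loss, forward_info = forward_linear_regression(X, y, weights)
print(loss)                     # scalar MSE averaged over the batch
print(forward_info['P'].shape)  # (3, 1): one prediction per example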

Here is where I'm stuck in my understanding; I have commented my questions below:

def loss_gradients(forward_info, weights):

    # to update weights, we need: dLdW = dLdP * dPdN * dNdW
    dLdP = -2 * (forward_info['y'] - forward_info['P'])
    dPdN = np.ones_like(forward_info['N'])
    dNdW = np.transpose(forward_info['X'], (1, 0))

    dLdW = np.dot(dNdW, dLdP * dPdN)
    # why do we mix matrix multiplication and dot product like this?
    # Why not dLdP * dPdN * dNdW instead?

    # to update biases, we need: dLdB = dLdP * dPdB
    dPdB = np.ones_like(weights['B'])
    dLdB = np.sum(dLdP * dPdB, axis=0)
    # why do we sum those values along axis 0?
    # why not just dLdP * dPdB ?

It looks to me like this code is expecting a 'batch' of data. What I mean by that is, it's expecting that when you call forward_linear_regression and loss_gradients, you're actually passing a bunch of (X, y) pairs together. Let's say you pass B such pairs. The first dimension of everything in your forward info will have size B.
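
To make that batch dimension concrete, here is a rough shape sketch (assuming B examples, F features, and a single output):

# X:            (B, F)
# weights['W']: (F, 1)
# N = X . W:    (B, 1)
# P = N + bias: (B, 1)   the bias broadcasts across the batch
# y:            (B, 1)
# dLdP:         (B, 1)   one gradient entry per example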

Now, the answers to both of your questions are the same: essentially, these lines compute the gradients (using the formulas you predicted) for each of the B terms, and then sum up all of the gradients so you get one gradient update. I encourage you to work out the logic behind the dot product yourself, because this is a very common pattern in ML, but it's a little tricky to get the hang of at first.
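
As a quick sanity check, here is a small numpy sketch (with made-up numbers) showing that the dot product in dLdW is exactly the per-example gradients summed over the batch, and likewise that summing dLdP along axis 0 gives the bias gradient:

# Hypothetical batch: 3 examples, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
dLdP = np.array([[0.5], [-1.0], [2.0]])   # one dL/dP value per example

# Vectorized weight gradient: one dot product over the batch dimension
dLdW = np.dot(X.T, dLdP)                  # shape (2, 1)

# The same quantity written as an explicit sum of per-example gradients
per_example = [X[i].reshape(-1, 1) * dLdP[i] for i in range(3)]
assert np.allclose(dLdW, sum(per_example))

# The bias gradient is likewise the per-example gradients summed along axis 0
assert np.allclose(np.sum(dLdP, axis=0), dLdP[0] + dLdP[1] + dLdP[2])

In other words, np.dot(dNdW, dLdP) already collapses the batch dimension for you, which is why no extra sum is needed for the weight gradient, while the bias gradient needs an explicit sum along axis 0.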
