
TensorFlow Hessian matrix is not updated after training session

I am trying to obtain the Hessian matrix using the tf.hessians function. The loss value and the variables are updated after each training step, but the Hessian matrix values stay the same. Moreover, they do not depend on the initial variable values, which can be set manually. In fact, my question is similar to an earlier one that has not received any answers. Here is the code I used for testing:

import tensorflow as tf

# Model parameters
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)

# Model input and output
x = tf.placeholder(tf.float32)
linear_model = W*x + b
y = tf.placeholder(tf.float32)

# loss
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares

# optimizer
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

# training data
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

hess = tf.hessians(loss, [W, b])

# training loop
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(10):
    sess.run(train, {x: x_train, y: y_train})
    cur_hess, curr_W, curr_b, curr_loss = sess.run([hess, W, b, loss], {x: x_train, y: y_train})
    print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
    print('cur_hess', cur_hess)

Here is the printed output:

W: [-0.21999997] b: [-0.456] loss: 4.0181446
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.39679998] b: [-0.49552] loss: 1.8198745
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.459616] b: [-0.4965184] loss: 1.5448234
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.48454273] b: [-0.48487374] loss: 1.4825068
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.49684232] b: [-0.4691753] loss: 1.444397
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5049019] b: [-0.45227283] loss: 1.409699
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.5115062] b: [-0.43511063] loss: 1.3761029
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.51758033] b: [-0.41800055] loss: 1.3433373
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.523432] b: [-0.40104443] loss: 1.3113549
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]
W: [-0.52916396] b: [-0.38427448] loss: 1.2801344
cur_hess [array([[60.]], dtype=float32), array([[8.]], dtype=float32)]

So cur_hess is not updated, and, by the way, it contains only 2 elements instead of 4. How can this be fixed? I also tried applying tf.gradients twice, as suggested here, but the values do not update either, just as with tf.hessians. Meanwhile, tf.gradients computes the first derivatives correctly, and they do change after each training loop. Thanks.

Having a constant Hessian is normal in this case, because

loss = Σ [(Wx + b - y)^2]

is quadratic in W and b, and the second derivative of a quadratic function is a constant:

∂²(loss)/∂W² = Σ 2x² = 2 * (1 + 4 + 9 + 16) = 60 ; (x = [1, 2, 3, 4])

∂²(loss)/∂b² = Σ 2 = 2 + 2 + 2 + 2 = 8 ; (4 samples with constant derivative)
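To double-check this numerically, here is a minimal NumPy sketch (independent of TensorFlow) that evaluates those closed-form second derivatives on the training data from the question. The cross term ∂²(loss)/∂W∂b is also computed; tf.hessians(loss, [W, b]) returns one Hessian block per variable (∂²/∂W² and ∂²/∂b²), which is why only 2 elements appear instead of the full 2×2 Hessian that would include the cross terms.

```python
import numpy as np

# Training data from the question.
x = np.array([1.0, 2.0, 3.0, 4.0])

# loss = sum((W*x + b - y)**2) is quadratic in (W, b), so its second
# derivatives are constants that do not depend on W, b, or y:
d2_dW2 = 2.0 * np.sum(x ** 2)   # d²loss/dW²  = 2 * Σ x²
d2_db2 = 2.0 * len(x)           # d²loss/db²  = 2 * n
d2_dWdb = 2.0 * np.sum(x)       # d²loss/dWdb = 2 * Σ x (cross term,
                                # not returned by tf.hessians per-variable)

print(d2_dW2, d2_db2, d2_dWdb)  # 60.0 8.0 20.0
```

This matches the constant [[60.]] and [[8.]] blocks printed in every iteration of the training loop above.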
