Should TensorFlow tf.GradientTape() only be used with tf.Variables?

Question

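I am trying to write a reinforcement learning agent in TensorFlow. I would like to know whether the states need to be tf.Variables, or whether they can be numpy arrays, for backpropagation with a gradient tape. I am not sure the gradients will be correct if my state/action arrays are numpy arrays rather than TensorFlow tensors, although I do know that the loss function returns a tf.Variable. Thanks, I am still a beginner with TensorFlow, so any explanation or suggestion would help a lot.

In a very simplified form (not verbatim), my code looks like: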

with tf.GradientTape() as tape:

   # actions/states are both lists of np arrays
   action = model.call(state)
   states.append(state)
   actions.append(action)

   loss = model.loss(states, actions)  # loss returns tf.Variable

# apply_gradients expects (gradient, variable) pairs
grads = tape.gradient(loss, model.variables)
model.optimizer.apply_gradients(zip(grads, model.variables))

Answer 1

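Hi Noob :) The optimizer.apply_gradients operation will only update the model's tf.Variables that have a non-zero gradient (see the input argument model.variables).

Reference: https://www.tensorflow.org/api_docs/python/tf/GradientTape

Trainable variables (created by tf.Variable or tf.compat.v1.get_variable, where trainable=True is the default in both cases) are automatically watched. Tensors can be manually watched by invoking the watch method on this context manager.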
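To make the quoted behaviour concrete, here is a minimal sketch. The weight `w` and the numpy `state` are hypothetical stand-ins, not the asker's model. It suggests that gradients with respect to a trainable variable come out fine even when the input started as a numpy array, and that `tape.watch` only matters if you want the gradient with respect to the input itself:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for a model weight and an environment state.
w = tf.Variable([[1.0], [2.0]])                   # trainable, watched automatically
state = np.array([[3.0, 4.0]], dtype=np.float32)  # plain numpy input
x = tf.constant(state)                            # constant tensor, not watched

# No watch call: the gradient reaches w, but x is treated as a constant.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.matmul(x, w))
grad_w, grad_x_unwatched = tape.gradient(loss, [w, x])
# grad_w is [[3.], [4.]]; grad_x_unwatched is None

# Watching x is only needed if d(loss)/d(x) is wanted.
with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(tf.matmul(x, w))
grad_x = tape.gradient(loss, x)
# grad_x is [[1., 2.]]
```

For training, only the gradients with respect to the model's variables are needed, so the states do not have to be tf.Variables.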

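Edit: If you want to call the model to make a prediction for a given numpy array, that is possible. According to the documentation, the input to model.call() should be a tensor object. You can simply get a tensor from a numpy array as follows: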

state  # numpy array
tf_state = tf.constant(state)
model.call(tf_state)
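One detail worth checking when converting: numpy creates float64 arrays by default and tf.constant keeps that dtype, whereas Keras layers use float32 by default. A minimal sketch, assuming a hypothetical state of shape (1, 4):

```python
import numpy as np
import tensorflow as tf

state = np.zeros((1, 4))           # numpy default dtype is float64
t64 = tf.constant(state)           # dtype follows numpy: tf.float64
t32 = tf.cast(state, tf.float32)   # explicit float32 tensor

print(t64.dtype, t32.dtype)
```

If the environment already returns float32 arrays, the plain tf.constant(state) call is enough.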

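Of course, instead of creating a new tf.constant on every iteration of the training loop, you can first initialize a (non-trainable) tf.Variable and then update it with the values of the numpy array. Something like the following should work: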

tf_state = tf.Variable(np.zeros_like(state), dtype=tf.float32, trainable=False)
for _ in range(n_train_iterations):
    state = get_new_numpy_state()
    tf_state.assign(state)
    model.call(tf_state)
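Putting the pieces together, a complete training step might look like the sketch below. The tiny Sequential model, the squared-output loss, and the body of get_new_numpy_state are all placeholders invented for illustration, since the asker's real model and loss are not shown. The points it tries to illustrate are that the forward pass and the loss are computed inside the tape, and that apply_gradients receives (gradient, variable) pairs:

```python
import numpy as np
import tensorflow as tf

# Hypothetical tiny policy network; the real model is not shown in the thread.
model = tf.keras.Sequential([tf.keras.layers.Dense(2)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

def get_new_numpy_state():
    # Placeholder for an environment observation (assumed shape: 1 x 4).
    return np.random.rand(1, 4).astype(np.float32)

for _ in range(3):
    state = tf.constant(get_new_numpy_state())    # numpy -> constant tensor
    with tf.GradientTape() as tape:
        output = model(state)                     # forward pass inside the tape
        loss = tf.reduce_mean(tf.square(output))  # placeholder loss
    # Gradients are taken w.r.t. the model's trainable variables only.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

This sketch uses tf.constant for brevity; the non-trainable tf.Variable with assign shown above would slot into the same place.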
