

Tensorflow: How to copy conv layer weights to another variable for use in reinforcement learning?

I'm not sure if this is possible in Tensorflow and I'm concerned I may have to switch over to PyTorch.

Basically, I have this layer:

self.policy_conv1 = tf.layers.conv2d(inputs=self.policy_s, filters=16, kernel_size=(8,8),strides=(4,4), padding = 'valid',activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer = tf.glorot_uniform_initializer)

Which I'm trying to copy into another layer every 100 iterations of training or so:

self.eval_conv1 = tf.layers.conv2d(inputs=self.s, filters=16, kernel_size=(8,8),strides=(4,4), padding = 'valid', activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer = tf.glorot_uniform_initializer)

tf.assign doesn't seem to be the right tool, and the following doesn't seem to work:

self.policy_conv1 = tf.stop_gradient(tf.identity(self.eval_conv1))

Essentially, I am looking to copy the eval conv layer over into the policy conv layer, without having the two tied together every time the graph runs one or the other (which is what happens with the identity snippet above). If someone can point me to the needed code, I would appreciate it.

import numpy as np
import tensorflow as tf

# I'm using placeholders, but it'll work for other inputs as well
ph1 = tf.placeholder(tf.float32, [None, 32, 32, 3])
ph2 = tf.placeholder(tf.float32, [None, 32, 32, 3])

l1 = tf.layers.conv2d(inputs=ph1, filters=16, kernel_size=(8,8),strides=(4,4), padding = 'valid',activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer = tf.glorot_uniform_initializer, name="layer_1")
l2 = tf.layers.conv2d(inputs=ph2, filters=16, kernel_size=(8,8),strides=(4,4), padding = 'valid',activation=tf.nn.relu, kernel_initializer=tf.glorot_uniform_initializer, bias_initializer = tf.glorot_uniform_initializer, name="layer_2")

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# fetch the kernel tensors of both layers by their variable names
w1 = tf.get_default_graph().get_tensor_by_name("layer_1/kernel:0")
w2 = tf.get_default_graph().get_tensor_by_name("layer_2/kernel:0")

w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

# copy layer_1's kernel into layer_2's kernel
sess.run(tf.assign(w2, w1))
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # 0

# modifying w1 afterwards does not affect w2 -- the copy is a one-time snapshot
sess.run(tf.assign(w1, w1 * 2 + 1))
w1_r = sess.run(w1)
w2_r = sess.run(w2)
print(np.sum(w1_r - w2_r)) # non-zero

layer_1/bias:0 should work for getting the bias terms.
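
For completeness, here is a minimal sketch (reusing the layer_1/layer_2 names and the sess from the example above) that copies the kernel and the bias together in a single grouped op:

g = tf.get_default_graph()
copy_ops = []
for suffix in ["kernel:0", "bias:0"]:
    src = g.get_tensor_by_name("layer_1/" + suffix)
    dst = g.get_tensor_by_name("layer_2/" + suffix)
    copy_ops.append(tf.assign(dst, src))
copy_op = tf.group(*copy_ops)  # one op that copies both the weights and the biases
sess.run(copy_op)              # run this every ~100 training iterations

Building copy_op once and re-running it avoids adding new assign ops to the graph on every update.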

UPDATE:

I found an easier way:

update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'), tf.trainable_variables('old_scope'))]

Doing a sess.run on update_weights should copy the weights from one network to the other. Just remember to build the two networks under separate name scopes.
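
A minimal, self-contained sketch of that pattern (the scope names old_scope/new_scope and the build_net helper are illustrative, not from the original post):

import tensorflow as tf

def build_net(x, scope):
    # illustrative helper: one conv layer built under the given variable scope
    with tf.variable_scope(scope):
        return tf.layers.conv2d(inputs=x, filters=16, kernel_size=(8, 8), strides=(4, 4),
                                padding='valid', activation=tf.nn.relu)

s_old = tf.placeholder(tf.float32, [None, 32, 32, 3])
s_new = tf.placeholder(tf.float32, [None, 32, 32, 3])

old_out = build_net(s_old, 'old_scope')  # network being trained
new_out = build_net(s_new, 'new_scope')  # network that periodically receives a copy

update_weights = [tf.assign(new, old) for (new, old) in
                  zip(tf.trainable_variables('new_scope'),
                      tf.trainable_variables('old_scope'))]

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(update_weights)  # run every ~100 training iterations to sync the copy

Since zip pairs variables by creation order, both networks should be built identically; if in doubt, sort each list by variable name before zipping.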
