
Inconsistencies between tf.contrib.layer.fully_connected, tf.layers.dense, tf.contrib.slim.fully_connected, tf.keras.layers.Dense

I am trying to implement a policy gradient for a contextual bandit problem (https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-1-5-contextual-bandits-bff01d1aad9c).

I am defining a model in TensorFlow that solves this problem with a single fully-connected layer.

I am trying out different TensorFlow APIs, but want to avoid the contrib package since it is not officially supported. I am interested in the keras API since I am already familiar with its functional interface and it is now available as tf.keras. However, I can only seem to get working results with tf.contrib.slim.fully_connected or tf.contrib.layers.fully_connected (the former calls the latter).

The following two snippets work correctly (one_hot_encoded_state_input and num_actions both adhere to the tensor shapes expected by the layers).

import tensorflow.contrib.slim as slim
action_probability_distribution = slim.fully_connected(
    one_hot_encoded_state_input,
    num_actions,
    biases_initializer=None,
    activation_fn=tf.nn.sigmoid,
    weights_initializer=tf.ones_initializer())

and

from tensorflow.contrib.layers import fully_connected
action_probability_distribution = fully_connected(
    one_hot_encoded_state_input,
    num_actions,
    biases_initializer=None,
    activation_fn=tf.nn.sigmoid,
    weights_initializer=tf.ones_initializer())

On the other hand, neither of the following works:

action_probability_distribution = tf.layers.dense(
    one_hot_encoded_state_input,
    num_actions,
    activation=tf.nn.sigmoid,
    bias_initializer=None,
    kernel_initializer=tf.ones_initializer())

nor

action_probability_distribution = tf.keras.layers.Dense(
    num_actions,
    activation='sigmoid',
    bias_initializer=None,
    kernel_initializer='Ones')(one_hot_encoded_state_input)

The last two cases use TensorFlow's high-level APIs, layers and keras. Ideally, I would like to know whether I am incorrectly reproducing the first two cases with the last two, and whether the only issue is that the latter two are simply not equivalent to the former two.

For completeness, here is the entire code needed to run this (note: Python 3.5.6 and TensorFlow 1.12.0 were used).

import tensorflow as tf
import numpy as np
tf.reset_default_graph()

num_states = 3
num_actions = 4
learning_rate = 1e-3

state_input = tf.placeholder(shape=(None,),dtype=tf.int32, name='state_input')
one_hot_encoded_state_input = tf.one_hot(state_input, num_states)

# DOESN'T WORK
action_probability_distribution = tf.keras.layers.Dense(num_actions, activation='sigmoid', bias_initializer=None, kernel_initializer = 'Ones')(one_hot_encoded_state_input)

# WORKS
# import tensorflow.contrib.slim as slim
# action_probability_distribution = slim.fully_connected(one_hot_encoded_state_input,num_actions,\
#     biases_initializer=None,activation_fn=tf.nn.sigmoid,weights_initializer=tf.ones_initializer())

# WORKS
# from tensorflow.contrib.layers import fully_connected
# action_probability_distribution = fully_connected(one_hot_encoded_state_input,num_actions,\
#     biases_initializer=None,activation_fn=tf.nn.sigmoid,weights_initializer=tf.ones_initializer())

# DOESN'T WORK
# action_probability_distribution = tf.layers.dense(one_hot_encoded_state_input,num_actions, activation=tf.nn.sigmoid, bias_initializer=None, kernel_initializer=tf.ones_initializer())

action_probability_distribution = tf.squeeze(action_probability_distribution)
action_chosen = tf.argmax(action_probability_distribution)

reward_input = tf.placeholder(shape=(None,), dtype=tf.float32, name='reward_input')
action_input = tf.placeholder(shape=(None,), dtype=tf.int32, name='action_input')
responsible_weight = tf.slice(action_probability_distribution, action_input, [1])
loss = -(tf.log(responsible_weight)*reward_input)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
update = optimizer.minimize(loss)


bandits = np.array([[0.2,0,-0.0,-5],
                    [0.1,-5,1,0.25],
                    [-5,5,5,5]])

assert bandits.shape == (num_states, num_actions)

def get_reward(state, action): # the lower the value of bandits[state][action], the higher the likelihood of reward
    if np.random.randn() > bandits[state][action]:
        return 1
    return -1

max_episodes = 10000
epsilon = 0.1

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    rewards = np.zeros(num_states)
    for episode in range(max_episodes):
        state = np.random.randint(0,num_states)
        action = sess.run(action_chosen, feed_dict={state_input:[state]})
        if np.random.rand(1) < epsilon:
            action = np.random.randint(0, num_actions)

        reward = get_reward(state, action)
        sess.run([update, action_probability_distribution, loss], feed_dict = {reward_input: [reward], action_input: [action], state_input: [state]})

        rewards[state] += reward

        if episode%500 == 0:
            print(rewards)

When using the chunks commented # WORKS, the agent learns and maximizes reward across all three states. With the chunks commented # DOESN'T WORK, the agent does not learn and typically converges extremely quickly to choosing a single action. For example, working behaviour prints a rewards list of positive, increasing numbers (good cumulative reward for each state), whereas non-working behaviour produces a rewards list in which only one action accumulates increasing reward, usually at the expense of the others (negative cumulative reward).

For anyone who runs into this issue (especially since TensorFlow has many APIs for the same layer), the difference comes down to bias initialization and defaults. For tf.contrib.layers and tf.contrib.slim, passing biases_initializer=None means that no bias is used. Replicating this with tf.layers and tf.keras requires use_bias=False.
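
Here is a minimal sketch of what the bias-free equivalents look like with the high-level APIs (a sketch based on the explanation above, using the TF 1.12 signatures):

# tf.keras equivalent: disable the bias explicitly with use_bias=False
# instead of passing bias_initializer=None.
action_probability_distribution = tf.keras.layers.Dense(
    num_actions,
    activation='sigmoid',
    use_bias=False,
    kernel_initializer='Ones')(one_hot_encoded_state_input)

# The same fix applied to tf.layers.dense:
# action_probability_distribution = tf.layers.dense(
#     one_hot_encoded_state_input,
#     num_actions,
#     activation=tf.nn.sigmoid,
#     use_bias=False,
#     kernel_initializer=tf.ones_initializer())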
