AttributeError: 'Tensor' object has no attribute 'append'
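I don't know why this code isn't working. When I put the rewards into a list, I get an error telling me the dimensions are incorrect. I don't know what to do.
I am implementing a reinforcement-learning deep Q network. r is a 2D numpy array giving 1 divided by the distance between stops, so that closer stops get a higher reward.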
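For readers who want to reproduce the setup, here is a minimal sketch of such a reward matrix. The question does not show how r is built, so everything here is an assumption: the function name reward_matrix, the sample coordinates, the use of nested lists instead of a numpy array (to keep the sketch dependency-free), and the choice of 0.0 on the diagonal to avoid dividing by zero. It also assumes all stops have distinct coordinates.

```python
import math

def reward_matrix(coords):
    """Return r where r[i][j] = 1 / distance(i, j); the diagonal is set to 0.0.

    Hypothetical helper: assumes every stop has distinct coordinates,
    otherwise the division below would raise ZeroDivisionError.
    """
    n = len(coords)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r[i][j] = 1.0 / math.dist(coords[i], coords[j])
    return r

# Three made-up stops on a line: stop 1 is closer to stop 0 than stop 2 is.
stops = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
r = reward_matrix(stops)
```

With these sample coordinates, r[0][1] is larger than r[0][2], which matches the intent that closer stops earn a higher reward.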
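No matter what I do, I can't get the rewards to work properly. I'm new to TensorFlow, so this may be down to my inexperience with things like TensorFlow placeholders and feed dicts.
Thanks in advance for your help.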
observations = tf.placeholder('float32', shape=[None, num_stops])
# game states : r[stop], r[next_stop], r[third_stop]
actions = tf.placeholder('int32', shape=[None])
rewards = tf.placeholder('float32', shape=[None])  # +1, -1 with discounts

Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, num_stops)

sample_op = tf.random.categorical(logits=Ylogits, num_samples=1)

cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions, num_stops), logits=Ylogits)

loss = tf.reduce_sum(rewards * cross_entropies)
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=.99)
train_op = optimizer.minimize(loss)

visited_stops = []
steps = 0

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Start at a random stop, initialize done to false
    current_stop = random.randint(0, len(r) - 1)
    done = False

    # reset everything
    while not done:  # play a game in x steps
        observations_list = []
        actions_list = []
        rewards_list = []

        # List all stops and their scores
        observation = r[current_stop]

        # Add the stop to a list of non-visited stops if it isn't
        # already there
        if current_stop not in visited_stops:
            visited_stops.append(current_stop)

        # decide where to go
        action = sess.run(sample_op, feed_dict={observations: [observation]})

        # play it, output next state, reward if we got a point, and whether the game is over
        #game_state, reward, done, info = pong_sim.step(action)
        new_stop = int(action)

        reward = r[current_stop][action]

        if len(visited_stops) == num_stops:
            done = True

        if steps >= BATCH_SIZE:
            done = True

        steps += 1

        observations_list.append(observation)
        actions_list.append(action)
        rewards.append(reward)

        #rewards_list = np.reshape(rewards, [-1, 25])
        current_stop = new_stop

    #processed_rewards = discount_rewards(rewards, args.gamma)
    #processed_rewards = normalize_rewards(rewards, args.gamma)

    print(rewards)
    sess.run(train_op, feed_dict={observations: [observations_list],
                                  actions: [actions_list],
                                  rewards: [rewards_list]})
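The line rewards.append(reward) causes the error because your rewards variable is a tensor: you defined it with rewards = tf.placeholder('float32',shape=[None]), and you cannot append values to a tensor like that. You probably meant to call rewards_list.append(reward).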
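The name mix-up can be illustrated without TensorFlow. In the sketch below, Placeholder is a hypothetical stand-in class (not the real tf.placeholder) that, like a symbolic tensor, only describes an input and has no append method; try_append is likewise a made-up helper for the demonstration.

```python
class Placeholder:
    """Hypothetical stand-in for a symbolic tensor: it describes an input
    (dtype and shape) but holds no values and has no append method."""
    def __init__(self, dtype, shape):
        self.dtype = dtype
        self.shape = shape

rewards = Placeholder('float32', [None])  # graph input, analogous to the placeholder
rewards_list = []                         # ordinary Python list for collected values

def try_append(target, value):
    """Return True if target.append(value) works, False on AttributeError."""
    try:
        target.append(value)
        return True
    except AttributeError:
        return False

ok_on_placeholder = try_append(rewards, 0.5)   # fails: no 'append' attribute
ok_on_list = try_append(rewards_list, 0.5)     # succeeds: lists support append
```

The collected values belong in rewards_list during the episode; the placeholder is only the key under which that list is later passed in feed_dict.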
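Also, you are initializing the variables
observations_list = []
actions_list = []
rewards_list = []
inside the loop, so on every iteration the old values are overwritten by empty lists. You probably want to move those 3 lines before the while not done: statement.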
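The effect of where the list is initialized can be seen in a small TensorFlow-free sketch. The function collect and the sample reward values are hypothetical and only mimic the shape of the loop in the question.

```python
def collect(init_inside_loop, rewards_per_step):
    """Accumulate rewards step by step, optionally resetting the list
    on every iteration as the code in the question does."""
    rewards_list = []
    for reward in rewards_per_step:
        if init_inside_loop:
            rewards_list = []  # reset on every step: earlier rewards are lost
        rewards_list.append(reward)
    return rewards_list

sample_rewards = [0.1, 0.2, 0.3]
inside = collect(True, sample_rewards)    # keeps only the last step
outside = collect(False, sample_rewards)  # keeps the whole episode
```

With the reset inside the loop, only the final step's reward survives to the training call; initializing once before the loop keeps one entry per step.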