Q agent is learning not to take any actions

Question

I'm training a deep Q-network to trade stocks. It has two possible actions: 0 (wait) and 1 (buy a stock if one isn't held, sell it if one is). As input, it gets the value of the stock it bought, the current value of the stock, and the values of the stock for the previous 5 time steps relative to it. So something like:

[5.78, 5.93, -0.1, -0.2, -0.4, -0.5, -0.3]

The reward is simply the difference between the sale price and the purchase price. The reward for any other action is 0, though I've tried making it negative or some other value, without results.
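For readers trying to reproduce the setup, here is a minimal sketch of the state and reward as described above. It is not the original code: the function names `make_state` and `step` are hypothetical, and it assumes the previous 5 prices are given relative to the current price, ordered oldest first, with 0.0 standing in for the purchase price when nothing is held.

```python
from typing import List, Optional, Tuple

WAIT, TRADE = 0, 1  # action 1 buys when nothing is held, sells otherwise


def make_state(prices: List[float], t: int, buy_price: Optional[float]) -> List[float]:
    """Build the 7-element input: [buy price, current price, 5 relative past prices].

    Assumptions: past prices are taken relative to the current price, oldest
    first, and 0.0 is used for the buy price when no stock is held. Needs t >= 5.
    """
    current = prices[t]
    history = [prices[t - k] - current for k in range(5, 0, -1)]
    held = buy_price if buy_price is not None else 0.0
    return [held, current] + history


def step(prices: List[float], t: int, buy_price: Optional[float],
         action: int) -> Tuple[float, Optional[float]]:
    """Apply an action at time t and return (reward, new buy price)."""
    if action == TRADE:
        if buy_price is None:
            return 0.0, prices[t]            # buy: no reward yet
        return prices[t] - buy_price, None   # sell: reward is sale minus purchase
    return 0.0, buy_price                    # wait: reward is 0


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
state = make_state(prices, 5, None)
reward_buy, held_price = step(prices, 5, None, TRADE)
reward_sell, after_sale = step(prices, 6, held_price, TRADE)
```

Note that under this reward scheme every transition except a sale yields 0, so the learning signal for action 1 is sparse compared with action 0.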

Simple, right? Unfortunately, the agent always converges on taking the "0" action, even when I magnify the reward for selling at a profit or try any number of other things. I'm really pulling my hair out. Is there something obvious I've missed?

Answer 1

Although something was probably broken with the agent itself, the second agent I wrote exhibited similar behavior. I finally solved the issue by decreasing the learning rate; in the end it had to be about a thousand times lower than it was.
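To illustrate why the learning rate can matter this much, here is a toy sketch, not the actual network: a linear estimate of a single Q-value is nudged toward a fixed target using the example state from the question. The function `fit_q_value`, the target of 0.15 (the profit from selling at 5.93 after buying at 5.78), and the starting rate of 0.1 are all assumptions made for illustration; the answer does not say what the original learning rate was.

```python
def fit_q_value(state, target, lr, steps=200):
    """Repeatedly nudge a linear estimate w . state toward a fixed target.

    Returns the absolute error left after `steps` gradient updates.
    """
    w = [0.0] * len(state)
    for _ in range(steps):
        q = sum(wi * xi for wi, xi in zip(w, state))
        error = target - q
        # Gradient step on the squared error; the step scales with the inputs.
        w = [wi + lr * error * xi for wi, xi in zip(w, state)]
    q = sum(wi * xi for wi, xi in zip(w, state))
    return abs(target - q)


state = [5.78, 5.93, -0.1, -0.2, -0.4, -0.5, -0.3]  # example input from the question
target = 0.15  # hypothetical target: profit from selling now

high_lr_error = fit_q_value(state, target, lr=0.1)         # error grows every step
low_lr_error = fit_q_value(state, target, lr=0.1 / 1000)   # error shrinks every step
```

In this toy, each update multiplies the error by (1 - lr * |state|^2), and |state|^2 is about 69 because the inputs are raw prices. So a rate of 0.1 overshoots and diverges, while a rate a thousand times lower converges. A real deep Q-network is nonlinear and bootstraps its targets, so this is only a rough intuition for why a much smaller step was needed.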
