
How to tune hyperparameters of tf-agents and policies in TensorFlow?

I have set up a Python environment that is wrapped in a TensorFlow class to make it a TensorFlow environment. Then I set up the learning as per the Colab notebooks listed here. Currently, I am using the DQN and REINFORCE agents.

The setup works well and the results are roughly as expected. Now I want to tune the hyperparameters, such as a decaying epsilon-greedy schedule, weights, etc.

I need some pointers on how to use the documentation to access these hyperparameters.

REINFORCE does not support an epsilon-greedy policy; I suggest switching to a DQN or DDQN agent.

To pass a specified Q-Network you can use something like:

from tf_agents.networks import q_network

# Build a QNetwork over the environment's observation and action specs
q_net = q_network.QNetwork(
        environment.time_step_spec().observation['observations'],
        environment.action_spec(),
        fc_layer_params=fc_layer_params)

and pass that to your agent on initialization. For a decaying epsilon-greedy policy you can define your own function decaying_epsilon(train_step, **kwargs) as you prefer. Then initialize your train_step tensor and pass it through functools.partial like this:

from functools import partial

train_step = tf.Variable(0, trainable=False, name='global_step', dtype=tf.int64)
partial_decaying_eps = partial(decaying_epsilon, train_step, **kwargs)
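
For illustration, here is one possible decaying_epsilon (a minimal sketch assuming a simple linear decay; start, end and decay_steps are hypothetical keyword arguments that **kwargs above would carry, and any schedule returning a scalar works):

import tensorflow as tf

# Hypothetical linear decay from `start` to `end` over `decay_steps` steps;
# it reads the current value of the train_step Variable each time it is called.
def decaying_epsilon(train_step, start=1.0, end=0.1, decay_steps=10000):
    fraction = tf.minimum(tf.cast(train_step, tf.float32) / decay_steps, 1.0)
    return start + fraction * (end - start)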

You can now pass partial_decaying_eps to your agent and it will work as expected, updating progressively with your train_step Tensor. Be sure to pass this same train_step Tensor to your agent, though.
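
Putting this together, a minimal sketch of the agent construction (assuming the q_net, train_step and partial_decaying_eps names from above, and a flat observation spec; if your observations are a dict as the QNetwork snippet suggests, you may also need to pass observation_and_action_constraint_splitter). target_update_period, gamma and td_errors_loss_fn are included only as examples of other tunable arguments:

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.utils import common

agent = dqn_agent.DqnAgent(
        environment.time_step_spec(),
        environment.action_spec(),
        q_network=q_net,
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        epsilon_greedy=partial_decaying_eps,  # callable, re-evaluated as train_step advances
        target_update_period=100,             # example: target-network sync frequency
        gamma=0.99,                           # example: discount factor
        td_errors_loss_fn=common.element_wise_squared_loss,
        train_step_counter=train_step)        # the same train_step Variable as above
agent.initialize()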

Other hyperparameters can be modified easily; just look at the arguments of the DQN agent's __init__ function in its documentation.
