What does "soft" in reinforcement learning literature mean?

I have noticed that some papers refer to soft agents, and I thought a soft agent was simply one where the entropy is included in the objective function of the policy network. But now I'm not sure anymore. Can anyone confirm or offer another explanation?

So it seems that this is indeed the case. I asked on ai.stackexchange where the entropy enters SAC and got a good answer, for those interested.
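For reference, here is how I understand the maximum-entropy objective that SAC-style "soft" agents optimize. This is a sketch in the notation commonly used for SAC, not something taken from the answer mentioned above: \alpha is the temperature weighting the entropy term, \rho_\pi is the state-action distribution induced by the policy \pi, and \mathcal{H} is the entropy of the policy at a state.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = - \mathbb{E}_{a \sim \pi(\cdot \mid s)} \big[ \log \pi(a \mid s) \big]
```

With \alpha = 0 this reduces to the usual expected-return objective, so the "soft" part is exactly the added entropy bonus.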

An epsilon-soft policy is a policy that takes every action with probability at least epsilon/|A(s)| in every state s, where |A(s)| is the number of actions available in s (source: http://incompleteideas.net/sutton/book/RLbook2018.pdf, exercise 4.6, page 82, which is page 104 of the PDF). A soft policy is a policy that takes every action with positive probability in every state (page 100, PDF page 122).
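To make the two definitions concrete, here is a minimal Python sketch. It assumes a single state with a finite action set and uses the threshold epsilon/|A(s)| from the book's definition. The function names (`epsilon_greedy_probs`, `is_epsilon_soft`, `is_soft`) are made up for illustration, and epsilon-greedy is just one example of an epsilon-soft policy, not the only one.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy in one state."""
    n = len(q_values)
    # Every action gets an equal share of the exploration mass epsilon.
    probs = [epsilon / n] * n
    # The greedy action receives the remaining 1 - epsilon on top of that.
    greedy = max(range(n), key=lambda a: q_values[a])
    probs[greedy] += 1.0 - epsilon
    return probs


def is_epsilon_soft(probs, epsilon, tol=1e-12):
    """True if every action has probability >= epsilon / |A| (within tol)."""
    threshold = epsilon / len(probs)
    return all(p >= threshold - tol for p in probs)


def is_soft(probs):
    """True if every action has strictly positive probability."""
    return all(p > 0.0 for p in probs)


if __name__ == "__main__":
    probs = epsilon_greedy_probs([1.0, 3.0, 2.0], epsilon=0.3)
    print(probs, is_epsilon_soft(probs, 0.3), is_soft(probs))
    # A deterministic greedy policy puts zero mass on some actions.
    print(is_soft([0.0, 1.0, 0.0]))
```

Note that this sense of "soft" (every action has nonzero probability) is a property of the policy itself, while the entropy-regularized sense from the question is a property of the objective being optimized.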
