
Why is RL called 'reinforcement' learning?

Original post: 2018-05-28 00:03:38 | Tags: machine-learning, deep-learning, reinforcement-learning

I understand why machine learning is so named, and I also understand the nomenclature behind supervised and unsupervised learning. So what is reinforced in reinforcement learning?

3 Answers

The “reinforcement” in reinforcement learning refers to how certain behaviors are encouraged and others discouraged. Behaviors are reinforced through rewards, which are gained through experience with the environment.
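To make "behaviors are reinforced through rewards" concrete, here is a minimal sketch of an agent facing two possible actions, one of which pays off more often. Everything in it is an assumption chosen for illustration rather than something from the answer above: the function name `run_bandit`, the reward probabilities, the epsilon-greedy rule, and the step count.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy bandit problem (illustrative only)."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # estimated payoff of each action
    counts = [0] * len(reward_probs)    # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally try a random action (exploration).
            action = rng.randrange(len(reward_probs))
        else:
            # Otherwise repeat the action that has paid off best so far.
            action = max(range(len(values)), key=values.__getitem__)
        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Running average: nudge the estimate toward the observed reward.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Action 1 is rewarded more often, so the agent ends up choosing it more.
values, counts = run_bandit([0.2, 0.8])
print("estimated values:", values)
print("times chosen:", counts)
```

Nobody tells the agent which action is correct. The better-rewarded action simply gets chosen more and more often, which is the sense in which the behavior is "reinforced".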

Modern reinforcement learning is built upon two main threads. One thread concerns learning by trial and error and originated in the psychology of animal learning. The second thread concerns the problem of optimal control and its solution using value functions and dynamic programming (Sutton and Barto, 2018). Reinforcement learning borrowed its name from the first thread. According to Watkins (1989), in studies of animals' ability to learn, the animals may be automatically provided with reinforcers. In behavioral terms, a positive reinforcer might be a morsel of food for a hungry animal, for instance, or a sip of water for a thirsty animal. Conversely, a negative reinforcer might be an electric shock.

PS: Watkins proposed the Q-learning algorithm.

Edit: (Added more history)

According to Sutton and Barto (2018): "The term “reinforcement” in the context of animal learning came into use well after Thorndike's expression of the Law of Effect, first appearing in this context (to the best of our knowledge) in the 1927 English translation of Pavlov's monograph on conditioned reflexes. Pavlov described reinforcement as the strengthening of a pattern of behavior due to an animal receiving a stimulus—a reinforcer—in an appropriate temporal relationship with another stimulus or with a response."

Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
Thorndike, E. L. Animal Intelligence. Hafner, Darien, CT, 1911.
Watkins, Christopher John Cornish Hellaby. "Learning from delayed rewards." (1989).

In reinforcement learning, behavior is reinforced through trial and error. Outcomes that are incorrect (or less than optimal) do not need to be manually corrected. Instead, the focus is on exploration, and feedback (reinforcement) is obtained from these same experiences.
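As a rough illustration of learning from exploration without manual correction, the sketch below uses tabular Q-learning (the algorithm attributed to Watkins earlier in this thread) on a made-up five-state corridor. The environment, the names `step`, `choose` and `train`, and all parameter values are hypothetical choices for this example, not part of the original answer.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Hypothetical toy environment: move along the line, reward 1 at the goal."""
    if action == RIGHT:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def choose(q_row, epsilon, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one row of action values per state
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = choose(q[state], epsilon, rng)
            nxt, reward = step(state, action)
            # Q-learning update: the only feedback is the reward actually
            # experienced plus the current estimate for the next state.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
# Greedy action in each non-goal state after training.
policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print("greedy policy:", policy)
```

The agent is never told that moving right is correct. Early episodes wander at random, and the reward found at the goal gradually propagates back through the value estimates until moving right is preferred in every state.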