简体   繁体   中英

Reinforcement Learning or Supervised Learning?

如果在强化学习 (RL) 算法在现实世界中工作之前需要在模拟环境中进行大量迭代,为什么我们不使用相同的模拟环境来生成标记数据,然后使用监督学习方法而不是 RL?

The reason is because the two fields has a fundamental difference:

One tries to replicate previous results and the other tries to be better than previous results.

There are 4 fields in machine learning:

  • Supervised learning
  • Unsupervised Learning
  • Semi-supervised Learning
  • Reinforcement learning

Let's talking about the two fields you asked for, and let's intuitively explore them with a real life example of archery.

Supervised Learning

For supervised learning, we would observe a master archer in action for maybe a week and record how far they pulled the bow string back, angle of shot, etc. And then we go home and build a model. In the most ideal scenario , our model becomes equally as good as the master archer. It cannot get better because the loss function in supervised learning is usually MSE or Cross entropy, so we simply try to replicate the feature label mapping. After building the model, we deploy it. And let's just say we're extra fancy and make it learn online. So we continually take data from the master archer and continue to learn to be exactly the same as the master archer.

The biggest takeaway:

We're trying to replicate the master archer simply because we think he is the best. Therefore we can never beat him.

Reinforcement Learning

In reinforcement learning, we simply build a model and let it try many different things. And we give it a reward / penalty depending on how far the arrow was from the bullseye. We are not trying to replicate any behaviour, instead, we try to find our own optimal behaviour. Because of this, we are not given any bias towards what we think the optimal shooting strategy is.

Because RL does not have any prior knowledge, it may be difficult for RL to converge on difficult problems. Therefore, there is a method called apprenticeship learning / imitation learning, where we basically give the RL some trajectories of master archers just so it can have a starting point and begin to converge. But after that, RL will explore by taking random actions sometimes to try to find other optimal solutions. This is something that supervised learning cannot do. Because if you explore using supervised learning, you are basically saying by taking this action in this state is optimal. Then you try to make your model replicate it. But this scenario is wrong in supervised learning, and should instead be seen as an outlier in the data.

Key differences of Supervised learning vs RL:

  • Supervised Learning replicates what's already done
  • Reinforcement learning can explore the state space, and do random actions. This then allows RL to be potentially better than the current best.

Why we don't use the same simulated environment to generate the labeled data and then use supervised learning methods instead of RL

We do this for Deep RL because it has an experience replay buffer. But this is not possible for supervised learning because the concept of reward is lacking.

Example: Walking in a maze.

Reinforcement Learning

Taking a right in square 3: Reward = 5

Taking a left in square 3: Reward = 0

Taking a up in square 3: Reward = -5

Supervised Learning

Taking a right in square 3

Taking a left in square 3

Taking a up in square 3

When you try to make a decision in square 3, RL will know to go right. Supervised learning will be confused, because in one example, your data said to take a right in square 3, 2nd example says to take left, 3rd example says to go up. So it will never converge.

In short, supervised learning is passive learning, that is, all the data is collected before you start training your model.

However, reinforcement learning is active learning. In RL, usually, you don't have much data at first and you collect new data as you are training your model. Your RL algorithm and model decide what specific data samples you can collect while training.

In supervised learning we have target labelled data which is assumed to be correct.

In RL that's not the case we have nothing but rewards. Agents needs to figure itself which action to take by playing with the environment while observing the rewards it gets.

Supervised Learning is about the generalization of the knowledge given by the supervisor (training data) to use in an uncharted area (test data). It is based on instructive feedback where the agent is provided with correct actions (labels) to take given a situation (features).

Reinforcement Learning is about learning through interaction by trial-and-error. There is no instructive feedback but only evaluative feedback that evaluates the action taken by an agent by informing how good the action taken was instead of saying the correct action to take.

Reinforcement learning is an area of Machine Learning. It is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path it should take in a specific situation. Reinforcement learning differs from supervised learning in a way that in supervised learning the training data has the answer key with it so the model is trained with the correct answer itself whereas in reinforcement learning, there is no answer but the reinforcement agent decides what to do to perform the given task. In the absence of a training data set, it is bound to learn from its experience.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM