

An Introduction to the Brief Introduction

Some of you may be confused by the title, since most blog series and articles begin with a ‘real’ introduction. Believe me, I tried to gather everything I think we should know before learning reinforcement learning algorithms and turn it into a really good introduction. In the end I gave up, because there are so many aspects to talk about. Then an idea struck me: why not write a brief introduction that covers only the most basic concepts, and present a proper survey or overall summary at the end of the series? So in this article, I will only talk about:
1. What Is Reinforcement Learning
2. Supervised Learning, Unsupervised Learning and Reinforcement Learning
3. Some Basic Concepts

What Is Reinforcement Learning

This is always the first question about this subject, and you can find more details in the Wikipedia article ‘Reinforcement Learning’ [1]. Here, however, I want to give a more readable interpretation. Reinforcement learning is a kind of machine learning whose purpose is to solve problems by using computer programs to imitate the way human beings and other intelligent creatures learn. This long sentence contains three important points:
1. The purpose is to solve problems that we come across and that have not been solved by already known methods.
2. Most reinforcement learning ideas come from psychology and neuroscience.
3. We simulate all the conditions and run our algorithms on one or more modern computers.

This may be the best introduction I can give in English. Still, some readers might be confused, because the descriptions above could apply just as well to ordinary machine learning methods such as linear regression or neural networks. So let’s look at the distinctions between reinforcement learning and other machine learning algorithms.

By the way, if you wonder why it is called ‘reinforcement learning’, you can trace the word ‘reinforcement’ back to psychology research (Schultz, 2015 [2]), or simply read the Wikipedia article ‘Reinforcement’ [3].

Supervised Learning, Unsupervised Learning and Reinforcement Learning

Supervised Learning

Supervised learning is probably the best-known family of methods whenever we hear about machine learning or artificial intelligence. Machine learning is a broader concept. It includes supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning and more.
Linear classification algorithms are among the most accessible machine learning methods to begin with. Here I only mention the simplest case. Say we have sampled six 2-D points from a dataset, and each of them belongs to one of two classes; we call this class information the ‘label’. Our mission is to find a line (or, in higher dimensions, a hyperplane) that splits the whole dataset into the two classes. Even if the points are mixed together, we still have strategies to accomplish this. What we need and what we depend on are the six already known points and their classes (labels), for instance:

| Point Name | Data | Class (Label) |
|---|---|---|
| $A$ | $(1,2)$ | A |
| $A_1$ | $(2,1)$ | A |
| $A_2$ | $(2,2)$ | A |
| $B$ | $(6,4)$ | B |
| $B_1$ | $(4,4)$ | B |
| $B_2$ | $(1,2)$ | B |

This is just a naive linear classification problem. Even so, the defining feature of supervised learning methods is captured by the sentence ‘What we need and what we depend on are the six points and their classes (labels)’. The labels are the most valuable information for finding a solution. We can easily find some solutions to this simple problem, like this:


Based on the information we already know, all three of these lines count as solutions for now. Which one is better, and which one should be chosen as the final classifier, is not covered in this blog series. If you want to know more about this issue, you can read the book ‘Pattern Recognition and Machine Learning’ (Bishop, 2006 [4]).
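Here is a minimal sketch of one very simple linear classifier, a nearest-centroid rule, to show the role of the labels. I chose this rule for illustration; it is not one of the three lines in the figure. It averages the points of each class and uses the perpendicular bisector of the two class means as the separating line. It depends on nothing but the six labelled points from the table. The function names are hypothetical.

```python
# Nearest-centroid classifier: a simple linear classifier chosen for illustration.
# It learns only from the six labelled points in the table.

def centroid(points):
    """Return the mean (x, y) of a list of 2-D points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def fit_nearest_centroid(data):
    """Fit a linear decision function f(x, y) = w[0]*x + w[1]*y + b.

    f > 0 means "closer to the class-B mean", f <= 0 means class A.
    The boundary f = 0 is the perpendicular bisector of the two class means.
    """
    mu_a = centroid([p for p, label in data if label == "A"])
    mu_b = centroid([p for p, label in data if label == "B"])
    w = (mu_b[0] - mu_a[0], mu_b[1] - mu_a[1])
    b = ((mu_a[0] ** 2 + mu_a[1] ** 2) - (mu_b[0] ** 2 + mu_b[1] ** 2)) / 2
    return w, b


def predict(w, b, point):
    score = w[0] * point[0] + w[1] * point[1] + b
    return "B" if score > 0 else "A"


# The six labelled points from the table: ((x, y), label)
data = [
    ((1, 2), "A"), ((2, 1), "A"), ((2, 2), "A"),
    ((6, 4), "B"), ((4, 4), "B"), ((1, 2), "B"),
]

w, b = fit_nearest_centroid(data)
predictions = [predict(w, b, p) for p, _ in data]
print("w =", w, "b =", b)
print(predictions)  # ['A', 'A', 'A', 'B', 'B', 'A']
```

In the table, $B_2$ has exactly the same coordinates as $A$. So no straight line can classify all six points correctly, and this rule labels both of them as A.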

Reinforcement learning, by contrast, has no such label information telling the algorithm what to do or what not to do step by step, and this is one of its essential features. It does, however, have a clear goal. For instance, suppose a robot is learning to clean a room. There is no data and no teacher telling it which direction to go or how far to move on its first step. The only useful information known before the mission starts is the goal itself: ‘clean up this room’. What happens next depends largely on the policies that are in the robot’s ‘brain’ at the start. In this respect, reinforcement learning learns in a way that is closer to how we humans learn.

Unsupervised Learning

Unsupervised learning works without any label information, and its aim differs from both supervised learning and reinforcement learning: its mission is to find the hidden structure behind a mess of data. Let’s look at the naive example above again, this time with the label information removed:

| Point | Data |
|---|---|
| $D$ | $(1,2)$ |
| $D_1$ | $(2,1)$ |
| $D_2$ | $(2,2)$ |
| $D_3$ | $(6,4)$ |
| $D_4$ | $(4,4)$ |
| $D_5$ | $(1,2)$ |

Plotted in a 2-D plane, the data are distributed like this:


The mission of unsupervised learning is to design an algorithm or strategy that finds some structure behind the data. Quite different algorithms can accomplish this, but a good one gives us more useful information.


This is a simple solution (figure 3), but it is not the only one, nor necessarily the best.
Like reinforcement learning, unsupervised learning has no label information. In other words, neither of them has a teacher telling the learner what to do next. On the other hand, reinforcement learning must have a clear goal, while unsupervised learning does not need one.
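One classic way to look for such structure is k-means clustering. This sketch uses my own choice of algorithm, which is not necessarily what produced figure 3. It assumes k = 2 clusters and, for reproducibility, starts the centroids at the first two points; both are assumptions. It sees only the coordinates from the table, never any labels.

```python
# Minimal k-means (Lloyd's algorithm) on the six unlabelled points.

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def kmeans(points, k, max_iters=100):
    # Deterministic initialisation (an assumption for reproducibility): first k points.
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (ties -> lowest index).
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster became empty
                centroids[c] = mean(members)
    return assignment, centroids


points = [(1, 2), (2, 1), (2, 2), (6, 4), (4, 4), (1, 2)]  # D, D1, ..., D5
labels, centres = kmeans(points, k=2)
print(labels)   # [0, 0, 0, 1, 1, 0]
print(centres)  # [(1.5, 1.75), (5.0, 4.0)]
```

$D_5$ has the same coordinates as $D$, so k-means necessarily puts the two in the same cluster. In the supervised table, the corresponding points $A$ and $B_2$ carried different labels. That is exactly the kind of information clustering cannot recover on its own.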

Some Concepts of Reinforcement Learning

We have now discussed the features that distinguish reinforcement learning from supervised and unsupervised learning and make it a distinct branch of machine learning. Let’s go into the details of reinforcement learning and learn some basic concepts.


Agent

An agent, of course, is not a super spy like Ethan Hunt (from the ‘Mission: Impossible’ film series) or anyone else with super abilities. It can be an animal, a robot and so on. What they all have is a clear goal: a newborn deer just wants to stand on its feet, a robot just wants to clean the room, and so on. Their challenge is not to learn from labelled data or to find structure behind data. Instead, they must reach their goals by interacting with their environment. For the deer, the environment might be the earth’s gravity, the wind, or even slippery ground. There are definitely no teachers, or anything else, telling them what they should and should not do. They have to decide on their next actions by themselves and make sure these actions help them attain their goals.

An agent must have some abilities. First, it should be able to sense its environment; what it senses is known as the state of the environment. This is very important for any agent: if you were already standing on your feet, you would not try to stand up again. Second, the agent’s actions change the environment. For example, every action of the robot leaves the room different from before. Finally, the agent decides what to do all by itself, based on its policy and on the state of the environment.
This is only a brief description of an agent; more details can be found in Sutton and Barto (2011) [5].
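The agent's three abilities described above are sensing the state, acting on the environment and deciding with a policy. Together they can be sketched as a simple interaction loop. This is a minimal sketch under my own assumptions. `Corridor`, `Agent` and `run_episode` are hypothetical names. The "environment" is a five-cell corridor with the goal at the right end, and a reward of 1 for reaching the goal is a choice made purely for illustration. The `reset`/`step` shape loosely resembles common RL toolkits but is not any particular library's API.

```python
class Corridor:
    """Hypothetical environment: positions 0..length-1, goal at the right-most cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the initial state the agent can sense

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: 1 only on reaching the goal
        return self.position, reward, done


class Agent:
    """Hypothetical agent: senses the state and chooses an action with its policy."""

    def __init__(self, policy):
        self.policy = policy

    def act(self, state):
        return self.policy(state)


def run_episode(env, agent, max_steps=100):
    state = env.reset()
    trajectory, total_reward = [state], 0.0
    for _ in range(max_steps):
        action = agent.act(state)               # decide, based on the sensed state
        state, reward, done = env.step(action)  # the action changes the environment
        trajectory.append(state)                # sense the new state
        total_reward += reward
        if done:                                # goal reached: nothing more to do
            break
    return trajectory, total_reward


env = Corridor(length=5)
agent = Agent(policy=lambda state: +1)  # a fixed policy: always move right
trajectory, total_reward = run_episode(env, agent)
print(trajectory)    # [0, 1, 2, 3, 4]
print(total_reward)  # 1.0
```

The agent never sees the rules inside `Corridor`. It only observes states and rewards that come back from `step`, which mirrors the idea that nobody tells the agent what to do.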


Environment

Everything in the problem apart from the agent belongs to the environment. A precise definition of the environment in reinforcement learning is neither easy nor necessary. What we should remember is that the agent is, and always will be, living, sensing and acting in its environment.

Reward and Value

These two concepts are so similar that beginners often find it hard to tell them apart. Here is a way of looking at them that makes the difference easy to see.
Rewards are the actual signals we receive, or will receive, from the environment. For instance, suppose we are playing a multi-armed bandit (a row of slot machines). What we get from an action is the reward. Whether we win or lose is decided by the machines or, more precisely, by the environment. Once a machine has paid out, that reward is a fixed fact that nothing can change afterwards. The reward signal, or reward for short, is a real signal produced by the interaction between an agent and its environment.
Value, on the other hand, is calculated by a value function that is available before the action is taken. It works like an oracle that tells you what might happen after each action. In other words, the value is an estimate of how good an action is, in terms of the reward it is expected to bring, made before the action is actually taken.
In our reinforcement learning algorithms, the goal of an agent is always translated into maximizing its total reward. The reward depends only on the action and the environment, whereas the value can depend on many things and may even be stochastic. Even so, value functions remain very helpful information for the agent when it makes decisions.
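For the bandit example, the difference between reward and value can be written down compactly. This notation is my addition, in the style of Sutton and Barto [5]. $A_t$ is the action taken at step $t$, $R_t$ is the reward received, $q_*(a)$ is the true value of action $a$, and $Q_t(a)$ is the agent's estimate of it. The estimate shown here is the simple sample average, which is only one possible choice.

$$q_*(a) \doteq \mathbb{E}\left[R_t \mid A_t = a\right]$$

$$Q_t(a) \doteq \frac{\sum_{i=1}^{t-1} R_i \, \mathbf{1}[A_i = a]}{\sum_{i=1}^{t-1} \mathbf{1}[A_i = a]}$$

$$Q_{n+1} = Q_n + \frac{1}{n}\left(R_n - Q_n\right)$$

The last line computes the same sample average for one particular action, updated incrementally after its $n$-th reward $R_n$. The environment produces each reward $R_t$. The true value $q_*(a)$ is a property of the environment that the agent cannot see directly. $Q_t(a)$ is the agent's own estimate, which it revises as rewards arrive.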
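The reward/value distinction shows up clearly in a tiny epsilon-greedy agent playing a three-armed bandit. This is a sketch under stated assumptions. The win probabilities 0.2, 0.5 and 0.8, the exploration rate epsilon = 0.1 and the function name `run_bandit` are all hypothetical. Epsilon-greedy is just one simple way of choosing actions; the original text does not prescribe it. The rewards come from the environment and are never revised. The agent only revises its value estimates `values`, using an incremental sample average.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit (hypothetical setup).

    true_probs are win probabilities hidden inside the environment;
    the agent never reads them, it only sees the rewards they produce.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    values = [0.0] * n_arms  # Q(a): the agent's value estimates
    counts = [0] * n_arms    # how many times each arm was pulled
    total_reward = 0.0
    for _ in range(steps):
        # Choose an action: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        # The environment (the machine) produces the reward: 1 for a win, 0 for a loss.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        total_reward += reward
        # Revise the value estimate with the incremental sample average.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts, total_reward


values, counts, total = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("pulls per arm:", counts)
print("total reward:", total)
```

With enough steps, the arm with the highest win probability should end up pulled most often, and its estimate should settle near 0.8. The exact numbers depend on the random seed.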


Policy

A policy defines the way the learning agent behaves at a given time. More formally, it is a mapping from the states the agent perceives to the actions it takes. Put vividly, a policy is the brain, or decision-making system, of an agent, where the agent itself is usually regarded as an algorithm.
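Here is one way to represent a policy in code. It is a table that gives each state a probability for each action, written $\pi(a \mid s)$, together with a deterministic policy derived from that table. The states 0 to 3, the actions "left"/"right" and the 0.1/0.9 probabilities are hypothetical values for illustration.

```python
import random

# A stochastic policy pi(a | s) stored as a table: state -> {action: probability}.
# States 0..3, actions "left"/"right" and the probabilities are hypothetical.
policy_table = {state: {"left": 0.1, "right": 0.9} for state in range(4)}


def sample_action(policy, state, rng):
    """Draw an action for `state` according to the policy's probabilities."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def greedy_action(policy, state):
    """A deterministic policy derived from the table: the most probable action."""
    return max(policy[state], key=policy[state].get)


rng = random.Random(42)
samples = [sample_action(policy_table, 0, rng) for _ in range(1000)]
print("greedy action in state 0:", greedy_action(policy_table, 0))  # right
print("fraction of 'right' samples:", samples.count("right") / len(samples))
```

Both functions take a state and return an action, so either one could play the role of the agent's "brain". The stochastic version keeps some randomness in behaviour, while the deterministic one always makes the same choice in the same state.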

Action and State

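Action and state are both elementary concepts that I have already introduced in the Agent and Environment sections. The state is what the agent senses about its environment, and an action is what the agent does that changes it.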


There should also be some other sections, such as limitations, scope and suggestions. I do not think they are useful for a beginner, so I plan to discuss all of these at the end of the series, in a survey.
This is my first article about reinforcement learning. The concepts introduced here will be useful in our later study of algorithms. The distinctions between reinforcement learning and other machine learning methods give us a big map that shows where we are.


[2] Schultz, W. (2015). Neuronal Reward and Decision Signals: From Theories to Data. Physiological Reviews, 95(3), 853–951.
[4] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
[5] Sutton, R. S., & Barto, A. G. (2011). Reinforcement Learning: An Introduction (2nd ed.).
