I am Tony, a Chinese guy. I'm trying to be an Artificial Intelligence researcher. My blogs can help me summarize what I have learned, and may bring some lights to others. I'm very happy to see your comments. Good luck!

A Brief Introduction to Reinforcement Learning

An Introduction to the Brief Introduction

Some of you may be confused by the title, since most blog series or articles begin with a ‘real’ introduction. Believe me, I tried to gather all the background that I think should be known before we study reinforcement learning algorithms and turn it into a really good introduction, but I finally gave up because there are a tremendous number of aspects to cover. Then an idea came to me: why not write a brief one covering only the most basic concepts, and present a good survey or overall summary at the end of the series? So in this article, I will just talk about:
1. What Is Reinforcement Learning
2. Supervised Learning, Unsupervised Learning and Reinforcement Learning
3. Some Basic Concepts

What Is Reinforcement Learning

This is always the first question about this subject. More details can be found in the Wikipedia article ‘Reinforcement Learning’ [1]. Here, however, I want to present a more readable interpretation: reinforcement learning is a kind of machine learning whose purpose is to solve problems by imitating, in computer programs, the learning process of human beings or other intelligent creatures. This long sentence contains three important points:
1. The purpose is to solve problems that we come across and that have not been solved by known methods
2. Most of the ideas in reinforcement learning come from psychology and neuroscience
3. We simulate all the conditions and run our algorithms on one or more modern computers

This might be the best introduction I can give in plain English, yet some readers may still be confused, because everything described above sounds just like ordinary machine learning, such as linear regression, and perhaps not even as powerful as neural networks. So let’s look at the distinctions between reinforcement learning and other machine learning algorithms.

By the way, if you wonder why it is called ‘reinforcement learning’, the term ‘reinforcement’ comes from psychology research (Schultz, W. (July 2015) [2]); you can also just read the Wikipedia article ‘Reinforcement’ [3].

Supervised Learning, Unsupervised Learning and Reinforcement Learning

Supervised Learning

Supervised learning is probably the most popular family of methods we hear about in machine learning or artificial intelligence. Machine learning is the broader concept: it contains supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning, etc.
Linear classification algorithms are among the most accessible machine learning methods to begin with. Here I only consider the easiest case: say we have sampled six 2-D points from a dataset, and each of them belongs to one of two classes. We call this class information the ‘label’. Our mission is then to find a line (or, in higher dimensions, a hyperplane) that separates the whole dataset into the two classes. Even though the points may be mixed together, we still have strategies to accomplish our mission. What we need and what we depend on are the 6 known points and their classes (labels), for instance:

| Point Name | Data | Class (Label) |
|---|---|---|
| $A$ | $(1,2)$ | A |
| $A_1$ | $(2,1)$ | A |
| $A_2$ | $(2,2)$ | A |
| $B$ | $(6,4)$ | B |
| $B_1$ | $(4,4)$ | B |
| $B_2$ | $(1,2)$ | B |

This is just a naive linear classification problem, but the defining characteristic of supervised learning is captured by the sentence ‘What we need and what we depend on are the 6 points and their classes (labels)’. The label is probably the most valuable information for finding a solution. We can easily find some solutions to this simple problem, like this:


Given the information we have so far, these three lines are all solutions. Which one is better, and which should be chosen as the final classifier, is beyond the scope of this blog series. If you want to know more about this issue, you can read the book ‘Pattern Recognition and Machine Learning’ (Bishop, C. M. (2006) [4]).
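
To make the idea of ‘a line as a classifier’ concrete, here is a minimal Python sketch that scores a few candidate lines against the six labeled points from the table. The three lines are my own hypothetical choices, not the ones from the original figure, and the convention that the negative side of a line means class A is an assumption. Note that, with the data exactly as listed, $B_2$ has the same coordinates as $A$, so no line can classify all six points correctly.

```python
# The six labeled points from the table above.
points = [(1, 2), (2, 1), (2, 2), (6, 4), (4, 4), (1, 2)]
labels = ["A", "A", "A", "B", "B", "B"]


def classify(point, w, b):
    """Return 'A' if the point lies on the negative side of the line w.x + b = 0."""
    x, y = point
    score = w[0] * x + w[1] * y + b
    return "A" if score < 0 else "B"


def accuracy(w, b):
    """Fraction of the labeled points that the line classifies correctly."""
    correct = sum(classify(p, w, b) == lab for p, lab in zip(points, labels))
    return correct / len(points)


# Hypothetical candidate lines, written as (w, b) for w.x + b = 0.
candidates = {
    "x = 3": ((1, 0), -3),
    "x + y = 6": ((1, 1), -6),
    "y = 3": ((0, 1), -3),
}

for name, (w, b) in candidates.items():
    print(name, accuracy(w, b))
```

The labels are what make this scoring possible at all: without them, there would be nothing to compare each line's prediction against.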

In contrast, reinforcement learning has no such label information to teach the algorithm, step by step, what to do and what not to do; this is one of its essential features. Reinforcement learning does, however, have a clear goal. For instance, a robot is learning to clean a room, and there is no information and no teacher to tell it which direction to go or how long the first step should be. The only useful information known before the mission starts is the goal: ‘clean up this room’. What happens next largely depends on the policies that are initially in the robot’s ‘brain’. In this respect, the way reinforcement learning learns is closer to the way we humans learn.

Unsupervised Learning

Unsupervised learning works without any label information, and it differs from both supervised learning and reinforcement learning. The mission of unsupervised learning is to find the hidden structure behind a mess of data. Let’s look at the naive example above again, this time with the label information removed:

| Point Name | Data |
|---|---|
| $D$ | $(1,2)$ |
| $D_1$ | $(2,1)$ |
| $D_2$ | $(2,2)$ |
| $D_3$ | $(6,4)$ |
| $D_4$ | $(4,4)$ |
| $D_5$ | $(1,2)$ |

Then the data are distributed in the 2-D plane like this:


The mission of unsupervised learning is to design an algorithm or a strategy that finds some structure behind the data. Although this mission can be accomplished by totally different algorithms, a good one gives us more useful information.


This is a simple solution, shown in [^figure 3], but not the only one, nor necessarily the best one.
Unsupervised learning has no label information, and neither does reinforcement learning; in other words, neither of them has a teacher telling it what to do next. On the other hand, reinforcement learning must have a clear goal, which is not necessary for unsupervised learning.
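
As one possible way of ‘finding structure’ in the six unlabeled points, the sketch below runs a tiny k-means clustering in plain Python. K-means is my choice for illustration and is not necessarily the method behind the original figure; the number of clusters (two) and the starting centroids are assumptions.

```python
# The six unlabeled points from the table above.
points = [(1, 2), (2, 1), (2, 2), (6, 4), (4, 4), (1, 2)]


def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(cluster):
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def k_means(points, centroids, max_iters=10):
    """Alternate between assigning points and moving centroids until stable."""
    assignment = None
    for _ in range(max_iters):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return assignment, centroids


# Assumed starting centroids: the first and the fourth point.
assignment, centroids = k_means(points, [points[0], points[3]])
print(assignment)
print(centroids)
```

No label is used anywhere: the grouping comes only from the distances between the points, which is exactly what separates this setting from the supervised one.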

Some Concepts of Reinforcement Learning

We have discussed some features that distinguish reinforcement learning from supervised and unsupervised learning. These features make reinforcement learning a distinct part of machine learning. We now go into the details of reinforcement learning to learn some basic concepts.


Agent

An agent, of course, is not a super spy like Ethan Hunt (from the ‘Mission: Impossible’ film series) or some other character with superpowers. It can be an animal, a robot, etc. However, agents all have a clear goal: a newborn deer just wants to stand on its feet, a robot just wants to clean the room, and so on. Their challenge is not to learn from labeled data or to find structure behind the data, but to reach their goals by interacting with their environment. For the deer, the environment might include the earth’s gravity, the wind, or even slippery ground. And there is definitely no teacher telling them what they should and should not do. They have to decide their next actions by themselves and make sure these actions help them attain their goals.

An agent must have some abilities. First, an agent should be able to sense its environment; what it senses is known as the state of the environment. This is very important to any agent: if you were already standing on your feet, you would not try to stand up again. Second, the actions of the agent change the environment; for example, every action of the robot makes the room different from before. Finally, the agent decides what to do all by itself, based on its policy and on the state of the environment.
This is a brief description of an agent; more details can be found in Sutton, R. S., & Barto, A. G. (2011) [5].
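
The three abilities (sense, act, decide) can be sketched as a simple loop. The `Room` class, its actions and the hand-written `policy` below are all hypothetical, made up for the cleaning-robot example. No learning happens here; the sketch only shows how an agent and its environment take turns.

```python
class Room:
    """A hypothetical environment: a corridor of cells, each dirty (True) or clean (False)."""

    def __init__(self, dirty):
        self.dirty = list(dirty)
        self.position = 0  # the robot starts in the leftmost cell

    def state(self):
        """What the agent can sense: its position and whether the current cell is dirty."""
        return self.position, self.dirty[self.position]

    def step(self, action):
        """Apply the agent's action; the environment changes as a result."""
        if action == "clean":
            self.dirty[self.position] = False
        elif action == "right":
            self.position = min(self.position + 1, len(self.dirty) - 1)

    def is_clean(self):
        return not any(self.dirty)


def policy(state):
    """A hand-written policy: clean a dirty cell, otherwise move right."""
    _, cell_is_dirty = state
    return "clean" if cell_is_dirty else "right"


room = Room([True, False, True])
history = []
while not room.is_clean():
    state = room.state()      # 1. sense the state of the environment
    action = policy(state)    # 2. decide what to do
    room.step(action)         # 3. act, which changes the environment
    history.append((state, action))

print(history)
```

A reinforcement learning algorithm would replace the fixed `policy` with one that improves from experience, but the sense, decide, act cycle stays the same.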


Environment

Everything in the problem is a component of the environment; even the agent itself is part of it. A precise definition of the environment in reinforcement learning is neither easy nor necessary. What we should remember is that the agent is always living, sensing and acting in its environment.

Reward and Value

These two concepts are so similar that beginners can hardly tell them apart. Here is a point of view from which we can distinguish them easily.
The reward is the real signal that we have received or will receive from the environment. For instance, when we are playing a multi-armed bandit, what we get from an action is the reward, and whether we win or lose is decided by the machines or, more precisely, by the environment. Once a machine has produced a reward, it is a fixed fact that nothing can change. The reward signal, or reward for short, is a real signal produced by the interaction between the agent and its environment.
In contrast, a value is calculated by a value function, which is designed before the actions are taken. It works like an oracle that tells you what might happen after each action. In other words, the value is an estimate of an action’s outcome, made before the action is actually taken.
In reinforcement learning algorithms, the goal of an agent is always converted into maximizing the agent’s total reward. The reward (signal) depends only on the action and the environment, but the value can depend on anything; sometimes it can even be stochastic. Nevertheless, value functions are still very reliable information for helping the agent make decisions.
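
The difference can be seen in a small bandit simulation. In this sketch the reward comes from the environment (the `Bandit` class), while the value is a number the agent keeps for itself. The win probabilities, the seed and the use of a running average of past rewards as the value estimate are all assumptions made for illustration; other value functions are possible.

```python
import random


class Bandit:
    """A hypothetical two-armed bandit: the environment that produces rewards."""

    def __init__(self, win_probabilities, seed=0):
        self.win_probabilities = win_probabilities
        self.rng = random.Random(seed)

    def pull(self, arm):
        """Return the reward for one pull: 1 for a win, 0 for a loss."""
        return 1 if self.rng.random() < self.win_probabilities[arm] else 0


def update_value(old_value, count, reward):
    """Incremental sample average: move the estimate toward the new reward."""
    return old_value + (reward - old_value) / count


bandit = Bandit([0.2, 0.8])
values = [0.0, 0.0]  # the agent's estimates, one per arm
counts = [0, 0]      # how often each arm has been pulled
total_reward = 0

for t in range(1000):
    arm = t % 2                # pull the arms in turn, just to gather data
    reward = bandit.pull(arm)  # the real signal from the environment
    counts[arm] += 1
    values[arm] = update_value(values[arm], counts[arm], reward)
    total_reward += reward

print(values, total_reward)
```

The agent never sees the win probabilities inside `Bandit`. It only sees rewards, and `values` is its own guess about how good each arm is.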


Policy

A policy determines the way the learning agent behaves at a given time. A vivid description is that a policy is the brain, or logical system, of an agent. The agent here, meanwhile, is always regarded as an algorithm.
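
In code, a policy is just a rule that turns what the agent knows into an action. As a hedged example, the sketch below shows an epsilon-greedy rule over a list of value estimates. The estimates are hypothetical numbers, and epsilon-greedy is only one of many possible policies, chosen here because it is short.

```python
import random


def epsilon_greedy(values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


rng = random.Random(0)
values = [0.1, 0.7, 0.3]  # hypothetical value estimates for three actions

greedy_choice = epsilon_greedy(values, 0.0, rng)  # epsilon = 0: never explores
choices = [epsilon_greedy(values, 0.1, rng) for _ in range(100)]
print(greedy_choice, choices.count(1))
```

With `epsilon = 0.0` the policy always picks the action with the highest estimate; a small positive `epsilon` lets the agent occasionally try the other actions as well.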

Action and State

Action and state are both elementary concepts that I have already mentioned in the sections Agent and Environment.


There could be some other sections, such as limitations, scope and some suggestions, but I do not think they are useful for a beginner, so I plan to discuss all of these at the end of the series, as a survey.
This is my first article about reinforcement learning. The concepts will be useful in our future study of algorithms, while the distinctions between reinforcement learning and other machine learning methods give us a big map that shows where we are.


  1. Wikipedia. ‘Reinforcement Learning’.
  2. Schultz, W. (2015). Neuronal Reward and Decision Signals: From Theories to Data. Physiological Reviews, 95(3), 853–951.
  3. Wikipedia. ‘Reinforcement’.
  4. Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
  5. Sutton, R. S., & Barto, A. G. (2011). Reinforcement Learning: An Introduction (2nd ed.).

Tony Tan
