
Introduction to Reinforcement Learning: In 2 (or a Bit More) Minutes

(image credits to Big Data Made Simple's article)

Hey guys! It's been quite some time since I wrote a post myself, and just to get back in the groove, I am going to give you a primer on a hot topic in Artificial Intelligence: Reinforcement Learning.

Machine Learning, as we know, can be classified into three broad categories:

Supervised Learning (involves <x, y>):
A supervised learning problem involves a set of attributes x, that define an example from the problem set, and the value of a feature y, that shows dependence on x. The challenge posed to the algorithm is to find/approximate a function f that maps the set x to y. This function should, basically, predict a value y' based upon the input x, such that y' is equal to, or as close as possible to, the true value y corresponding to the example whose x was used as input.
For example, predicting the species of a flower based on its petal and sepal dimensions (classification), or predicting the price of a house based on metrics such as carpet area and elevation (regression).
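
To make that concrete, here is a minimal sketch of the supervised setup, written with scikit-learn's LinearRegression; the library choice and the toy house-price numbers are my own, purely for illustration:

from sklearn.linear_model import LinearRegression

# x: attributes per house (carpet area in sq. ft., elevation in m) -- made-up numbers
X = [[650, 10], [800, 12], [1200, 5], [1500, 8]]
# y: the true prices corresponding to each x
y = [70_000, 95_000, 130_000, 165_000]

model = LinearRegression().fit(X, y)   # approximate the function f: x -> y
y_pred = model.predict([[1000, 9]])    # predict y' for an unseen x
print(y_pred)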

Unsupervised Learning (involves <x>):
In unsupervised learning, we have a set of data points <x>, with no labels (no y's) given to us explicitly. The algorithm has to find structure or patterns in the data points by itself. This can include clustering, the detection of outliers, or dimensionality reduction to aid visualization. There is no single right answer here; the only goal is better insight into the data available, based on the prior assumption that similar data points will tend to lie closer to one another in the n-dimensional hyperspace (n being the number of attributes per example of <x>).
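
A minimal sketch of the unsupervised setup, again assuming scikit-learn; the data points and the choice of two clusters are made up for illustration:

from sklearn.cluster import KMeans

# Unlabeled data points <x>; note there is no y here
X = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]

# Ask the algorithm to find structure on its own: two groups of nearby points
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.labels_)  # the cluster assignment discovered for each point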

Reinforcement Learning (involves <x, z>, and finding y's):
Here, we want to find a function mapping from x's to y's, given an objective on the z's (e.g. to maximize or minimize the total z value).
The x's are referred to as states or observations, the z's as rewards, and the y's as actions, or decisions made by the algorithm (called the agent here). The actions taken by the agent are interactions with the surrounding environment, the space under observation.


As seen in the diagram, the agent receives states and rewards on the basis of the actions it took in prior states. That is, reinforcement learning is a sequence of decision-making steps. Moreover, the subsequent state and reward depend directly on the action taken in the current state.
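
In code, that loop looks roughly like the sketch below. The ToyEnv environment and the random policy are invented here just to show the shape of the interaction; real experiments typically use an environment library such as Gymnasium:

import random

class ToyEnv:
    """A made-up environment: the state is a counter; reaching +5 wins."""
    def reset(self):
        self.state = 0
        return self.state                          # initial state x

    def step(self, action):                        # action y chosen by the agent
        self.state += 1 if action == 1 else -1
        reward = 1.0 if self.state == 5 else 0.0   # reward z
        done = self.state in (5, -5)               # episode ends here
        return self.state, reward, done            # next x and z depend on y

env = ToyEnv()
state, done = env.reset(), False
while not done:
    action = random.choice([0, 1])                 # a placeholder 'agent'
    state, reward, done = env.step(action)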

Applying the RL terminology to a game of chess, we can relate the states to configurations of the board at any given time, the actions to the set of moves available to a player, and the reward to the outcome of a win or loss. The reward here, the outcome of the game, is delayed right until the end of the game. So for an RL agent to be good at chess, it should be able to 'foresee' (not really), or rather anticipate, delayed rewards, and discriminate between good and bad moves NOW, given any configuration of the board. This is called the Credit Assignment Problem (CAP) in reinforcement learning terms.
In simpler words, the doubt that

'If I lose (or win) the game in the end, which particular move led to my downfall (or victory)? It's not necessarily the penultimate move, is it?'

...is the basic idea of the CAP.
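
One common way algorithms handle this, shown in the sketch below, is to propagate the delayed reward backwards as a discounted return, G_t = r_t + gamma * G_{t+1}; the reward sequence and the value of gamma here are made up to mimic a won game of chess:

gamma = 0.99                  # discount factor (an assumed value)
rewards = [0, 0, 0, 0, 1]     # zero reward every move, +1 only for the final win

returns, G = [], 0.0
for r in reversed(rewards):   # walk backwards from the end of the game
    G = r + gamma * G
    returns.append(G)
returns.reverse()
print(returns)                # earlier moves get credit, discounted by distance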

Since reinforcement learning shows this sort of decoupling, allowing the agent to figure out the best y values for itself, it seems to be the category that leverages 'thinking' the most, with its inherent demand for sequential decision-making and delayed rewards, and hence is an exciting prospect. It seems so relatable to our daily hustle, doesn't it?

In some cases, thanks to the freedom to explore and the absence of hard 'labels' on data or 'definite actions' for situations, RL agents end up figuring out strategies that may not even have occurred to human experts! Such is the beauty of RL!

Salient examples of RL showing promising results, sometimes even challenging human performance in that field, include:

Acrobatic Helicopter Maneuvers

Robotic Arm Manipulation

DOTA 2

Although these astonishing results may raise doubts in people's minds about how close we are to the AI singularity, or a 'Skynet' where Machine takes over Man, there isn't any cause for concern. We are still a pretty long way from creating Artificial General Intelligence, and these problem-specific RL algorithms, which lack the ability to generalize to more diverse scenarios (beyond a point) or to take intuitive decisions, are but callow in front of the towering capabilities of human-level intellect.
In the next, soon-to-come article, we will talk about the basic modelling of an RL problem, which makes analysis of this domain more tractable. That's it for today then, guys!
Here are some links to get you started:

-Apoorva Gokhale