SDS 510: Deep Reinforcement Learning

Podcast Guest: Jon Krohn

September 30, 2021

Welcome back to the FiveMinuteFriday episode of the SuperDataScience Podcast!

This week I continue my quick hits on different forms of machine learning – with deep reinforcement learning.

 

Continuing our themes from previous episodes, I want to introduce reinforcement learning, which I find especially interesting. It enables machines to defeat humans at strategic board games and allows robotic hands to solve Rubik’s cubes. Unlike supervised or unsupervised learning, where you can have a fixed data set, in reinforcement learning the data the algorithm sees is constantly changing based on the actions the algorithm itself takes. The algorithm, or agent, takes actions in an environment, which returns state information and a reward back to the agent.
As an example, the agent could be an algorithm playing Atari video games. The agent makes choices and takes actions within the game, and the environment reacts to those actions (pressing left on the controls moves the Pong paddle left). The environment then returns state information and a reward which, in this case, would be the movement of pixels on the screen and the game score. Through time and experimentation, the agent learns which actions lead to an accumulation of points; its goal is to maximize the reward. Reinforcement learning is this continuous loop of action, state information, and reward.
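The action-state-reward loop can be sketched in a few lines of Python. The `ToyPong` environment below is a made-up stand-in whose `step` method loosely mirrors the common Gym-style interface; it is not a real Atari setup:

```python
import random

# Hypothetical toy environment standing in for an Atari game: the "state" is
# the paddle's position and the "reward" is a point for meeting the ball.
class ToyPong:
    def __init__(self):
        self.paddle = 0        # paddle position (the state)
        self.ball = 3          # the ball arrives at this position
        self.steps = 0

    def step(self, action):
        """Apply an action ('left' or 'right'); return (state, reward, done)."""
        self.paddle += -1 if action == "left" else 1
        self.steps += 1
        reward = 1 if self.paddle == self.ball else 0   # score for meeting the ball
        done = self.steps >= 10                          # an arbitrary stopping point
        return self.paddle, reward, done

env = ToyPong()
total_reward = 0
done = False
while not done:                                   # the continuous loop
    action = random.choice(["left", "right"])     # an untrained agent acts randomly
    state, reward, done = env.step(action)        # environment returns state + reward
    total_reward += reward
```

An actual training loop would replace the random choice with a policy that is updated from the rewards it observes.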
Deep reinforcement learning is a version of this that incorporates an artificial neural network to figure out which actions should be taken in the environment. The powerful thing about this combination is that the neural network is very adept at processing complex sensory input, while reinforcement learning is adept at selecting appropriate actions in complex scenarios. Together, they allow for algorithms that can process a huge amount of input while making good decisions.

ITEMS MENTIONED IN THIS PODCAST:

SDS 438 and SDS 440: FiveMinuteFriday episodes on applications of deep reinforcement learning
SDS 503: Pieter Abbeel on deep reinforcement learning and robotics
Deep Learning Illustrated by Jon Krohn (Chapter 13 covers deep reinforcement learning)

Podcast Transcript

(00:05):
This is FiveMinuteFriday on Deep Reinforcement Learning. 

(00:19):
On last week’s FiveMinuteFriday, I introduced the two largest categories of machine learning problems, supervised and unsupervised learning. Today, I’m introducing a third category, reinforcement learning, which, while not as common as supervised or unsupervised learning, is especially interesting and underpins many of today’s most exciting artificial intelligence breakthroughs. From enabling machines to decisively defeat humans at creative and computationally vast board games and video games, through to enabling robotic hands to perform complex tasks, like solving a Rubik’s Cube.
(01:00):
All right, right off the bat, one of the biggest differences between reinforcement learning and those other machine learning categories is that with supervised or unsupervised learning, you can have a fixed data set. Whereas, with reinforcement learning, the data that the algorithm is getting access to is constantly changing based on particular actions that the algorithm itself takes. More specifically, in reinforcement learning, the algorithm is called an agent, and that agent takes actions in what we call an environment. In turn, the environment returns back to the agent two types of information. 
(01:45):
It returns something called state information, which provides an update about the current state of play. I’ll get into an example with some details in a second, so this makes more sense. The other type of information that the environment returns back to the algorithm, back to the agent, is called a reward. This reward is some number that we have programmed the algorithm to try to maximize. As an example, the agent could be an algorithm that is playing Atari video games, so classic video games like Pong or Space Invaders, for example. In that case, the agent is kind of like controlling a video game controller; it wouldn’t actually control a real controller. It would be able to programmatically, through software, control the video game, but it would be the same as if you were a person pressing buttons on an Atari joystick.
(02:47):
All right, so the algorithm takes actions. It chooses to, in a software-like way, move the joystick up or down, or left or right, and so that changes what’s happening in the environment. The environment reacts to those actions, so if you’re playing Pong, you control the paddle that you’re hitting a ball around with, and when you press left on the joystick, the Pong paddle moves left. If the action that the agent takes is pressing left on the joystick, then the environment will respond back to the agent by providing new state information, showing that the Pong paddle has moved to the left. Remember that the environment returns back two pieces of information to the agent, so not just that state information on how the pixels on the screen have changed, but also that reward information. With Atari video games, for example, reward can be your point score in the video game. The agent’s objective then is to maximize its point score.
(03:54):
So what the agent does is it then repeats this loop. The agent takes one action, like moving left, and the environment returns that, hey, the Pong paddle has moved left and you didn’t get any reward for that action. But through time, through experimentation, the agent learns the kinds of actions that lead to accumulating points, like getting the Pong ball past the opponent’s paddle. Over time, the agent learns that, okay, if I can take an action that leads to the ball going past my opponent’s Pong paddle, I will get reward. My goal is to maximize my reward, so I’ll take more of those kinds of actions. Reinforcement learning proceeds through this continuous loop where the agent takes an action, and then the environment returns back a new state, and some report on how the agent’s doing on reward. We just continue looping through that over and over again, agent, environment, agent, environment, until we reach some particular stopping point, like winning the game, or dying in the game, or something like that. All right, so then what’s deep reinforcement learning?
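The trial-and-error learning just described can be sketched as a minimal value-learning loop in Python. The two-action setup, the hidden reward values, and the learning-rate constants here are all illustrative assumptions, not the algorithm from any specific system:

```python
import random

# A two-action environment where "right" reliably scores and "left" does not.
# The agent keeps a running value estimate per action and, over many episodes,
# comes to prefer the action that accumulates reward.
rewards = {"left": 0.0, "right": 1.0}   # hidden from the agent
q = {"left": 0.0, "right": 0.0}         # the agent's learned value estimates
alpha, epsilon = 0.1, 0.2               # learning rate, exploration rate

random.seed(0)
for episode in range(500):
    # Mostly exploit the best-known action, sometimes explore a random one.
    if random.random() < epsilon:
        action = random.choice(["left", "right"])
    else:
        action = max(q, key=q.get)
    reward = rewards[action]                     # environment returns the reward
    q[action] += alpha * (reward - q[action])    # nudge estimate toward what was observed

best = max(q, key=q.get)   # after experimentation, the agent prefers "right"
```

Without the occasional exploration step, the agent could get stuck repeating its initial guess and never discover the rewarding action.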
(05:05):
Now we know what reinforcement learning is, well, deep reinforcement learning is simply a reinforcement learning algorithm that incorporates an artificial neural network in order to figure out what actions it should be taking in the environment. We can call these neural network layers a deep learning model if there are lots of layers. But interestingly, we do call it a deep reinforcement learning algorithm even if there’s just a shallow neural network as the model we’re using to figure out what actions to take. The really powerful thing about deep reinforcement learning is that the neural network is really adept at processing complex sensory input, like the pixels on a screen, for example, while reinforcement learning is really adept at selecting the appropriate action from a large number of complex possibilities. Together, combining deep learning, like deep neural networks, with reinforcement learning, we can end up with algorithms that are great at processing a lot of complex sensory input and making great decisions, despite a large number of possibilities.
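A minimal sketch of that idea, assuming a tiny two-layer network with made-up layer sizes (this is not any specific published architecture): the network maps raw sensory input, a flattened grid of pixel values, to one score per action, and the agent picks the highest-scoring action.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sizes: an Atari-like 84x84 frame in, four joystick moves out.
n_pixels, n_hidden, n_actions = 84 * 84, 64, 4
W1 = rng.normal(0, 0.01, (n_pixels, n_hidden))   # untrained weights, for shape only
W2 = rng.normal(0, 0.01, (n_hidden, n_actions))

def action_values(pixels):
    """Forward pass: pixels -> hidden features -> one score per action."""
    hidden = np.maximum(0, pixels @ W1)   # ReLU layer processes the sensory input
    return hidden @ W2                    # one value for each possible action

frame = rng.random(n_pixels)              # stand-in for the current screen state
values = action_values(frame)
action = int(np.argmax(values))           # act greedily on the network's scores
```

In a real deep reinforcement learning algorithm, the weights would be trained so that these scores predict the reward each action ultimately leads to.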
(06:24):
If you’d like to learn more about specific applications of deep reinforcement learning, you can check out episodes number 438 and 440 of this podcast, which are both short FiveMinuteFridays like this one. For a more in-depth episode, you can check out our interview with world-leading deep reinforcement learning researcher and robotics entrepreneur, Pieter Abbeel, in episode number 503. If you’d like to get into the nitty-gritty mathematical and computational underpinnings of deep reinforcement learning algorithms, you can check out Chapter 13 of my book, Deep Learning Illustrated. All right, that’s it for today’s episode, keep on rocking it out there, folks, and catch you on another round of SuperDataScience very soon.