In July, Dataiku sponsored and attended VBTransform, a three-day virtual conference hosted by VentureBeat that focuses on best practices and tangible ways that business leaders can gain and maintain a competitive edge in an increasingly AI-driven world.
In a fireside chat, Dataiku’s VP of Field Engineering Jed Dougherty spoke with Ion Stoica, Co-founder of Rise Labs, to break down how supervised learning differs from reinforcement learning and how the latter can be used to improve the accuracy and predictability of models. We’ve highlighted the main points below.
Reinforcement Learning vs. Supervised Learning
To begin, the conversation highlighted how reinforcement learning is a paradigm shift from the classic supervised learning process we know, where we collect a ton of data, train a model on that data, deploy the model, monitor its performance, and so on.
To Ion, the widely used supervised learning technique fundamentally tries to associate an input and output and learn the mapping between the two, while reinforcement learning is about interacting with the world (a specific environment) and trying to maximize some reward. There’s an agent that observes the state of the environment and makes decisions and changes based on the state of the environment. The goal of the agent is to maximize the aforementioned reward and, in the process, learn policies to guide its actions in order to, ultimately, make a particular decision.
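To make the agent, environment, and reward loop concrete, here is a minimal sketch. Everything in it is an assumption for illustration and was not part of the conversation: the `Corridor` environment is a hypothetical toy, and tabular Q-learning is just one common algorithm that fits the description above.

```python
import random


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 moves left, action 1 moves right (walls clamp the position)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, payoff at the goal
        return self.state, reward, done


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: observe the state, act, receive a reward, update."""
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Mostly follow the current best guess, sometimes explore
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Move the estimate toward reward + discounted future value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# The learned policy: the preferred action in each non-terminal cell
policy = [0 if left > right else 1 for left, right in q_table[:-1]]
```

Note that nothing in the code tells the agent that "right" is correct. The policy falls out of the rewards it collects while interacting with the environment.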
There are many examples of reinforcement learning, such as game-playing AI like Google’s AlphaGo, where an action is placing a piece on the board, the environment is the layout of the board with all of its pieces, and the goal is to win the game. Unlike supervised learning, reinforcement learning has no labels: you take certain actions and see what the outcome is. If you win the game, you reinforce the moves you made in the game. If you lose, you negatively reinforce the moves of that game, meaning that the next time you play, you are less likely to make those moves and more likely to repeat the ones that led to a victory.
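The win/lose reinforcement described above can be sketched in a few lines. This is a deliberately simplified illustration, not how AlphaGo works (it relies on neural networks and tree search). The `preferences` table, the function names, and the position and move labels are all hypothetical.

```python
import math
import random
from collections import defaultdict

# preferences[(position, move)] -> score; a higher score means "play this more often"
preferences = defaultdict(float)


def move_probabilities(position, legal_moves):
    """Softmax over preferences: better-scored moves get a larger share."""
    weights = [math.exp(preferences[(position, m)]) for m in legal_moves]
    total = sum(weights)
    return [w / total for w in weights]


def choose_move(position, legal_moves, rng):
    """Sample a move according to the current probabilities."""
    probs = move_probabilities(position, legal_moves)
    return rng.choices(legal_moves, weights=probs, k=1)[0]


def reinforce(history, won, step=0.5):
    """Nudge every move played in one game up after a win, down after a loss."""
    delta = step if won else -step
    for position, move in history:
        preferences[(position, move)] += delta


# Two made-up games: one won after opening with "a", one lost after opening with "c"
reinforce([("start", "a"), ("mid", "b")], won=True)
reinforce([("start", "c"), ("mid", "b")], won=False)

probs = move_probabilities("start", ["a", "c"])
next_move = choose_move("start", ["a", "c"], random.Random(0))
```

After these two games, move "a" is more likely than move "c" at the start position, while move "b" (played in both a win and a loss) is back where it began. No labels were needed, only outcomes.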
Data Collection and Prep With Reinforcement Learning
The conversation then shifted to whether you need to do data collection and prep with reinforcement learning and, if not, how you begin tackling a reinforcement learning problem. According to Ion, it greatly depends on the problem you are trying to solve. In the case of playing a game, you don’t need to collect data up front, as the data collection happens during the playing of the game itself.
The goal of the reinforcement learning agent is to build this policy: a rule that takes observations of the environment and picks the decision that maximizes the reward. We can use existing data to train the policy up front and then refine it while playing the game or solving the problem, but doing so is not necessary. In the case of AlphaGo, the team used a database of existing games to learn an initial policy. Its successor, AlphaZero, became a better program through pure iterative learning, playing against itself. It’s not that you don’t need data in reinforcement learning, but to acquire it you need to interact with the environment and observe the reward. We interact with the environment, react, and then take actions to achieve a goal.
Does Reinforcement Learning Work With Incomplete Information?
Although many of the games where reinforcement learning can be used (Go, chess, and so on) have complete information, it can also work for games with incomplete information. Actually, more often than not, this will be the case. If you are using reinforcement learning to guide a robot from place A to place B, for example, the robot only has observations of the environment through radar, images, sound, and so on, and it navigates based on this information because that is all it has to capture the state of the environment. The same can be said for self-driving cars, which are learning as they go.
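The gap between the true state and what the agent can observe can be sketched as follows. This is a hypothetical one-dimensional robot with a noisy sensor, and the class, function, and parameter names are assumptions. There is no learning in this snippet. It only shows that every decision is made from the observation, never from the hidden state.

```python
import random


class HiddenRobot:
    """Hypothetical 1-D robot whose true position the agent never reads directly."""

    def __init__(self, start, noise, rng):
        self._position = start  # true state, hidden from the agent
        self._noise = noise
        self._rng = rng

    def observe(self):
        # Sensor reading = true position plus bounded random noise
        return self._position + self._rng.uniform(-self._noise, self._noise)

    def move(self, delta):
        self._position += delta


def navigate(robot, goal, steps=50, gain=0.5):
    """Steer toward the goal using only noisy sensor readings."""
    for _ in range(steps):
        reading = robot.observe()
        # Close a fraction of the apparent distance to the goal
        robot.move(gain * (goal - reading))
    return robot.observe()


robot = HiddenRobot(start=0.0, noise=0.5, rng=random.Random(0))
final_reading = navigate(robot, goal=10.0)
```

Because the noise is bounded, the robot ends up close to the goal even though it never sees its exact position. A reinforcement learning agent in the same situation would learn its policy from these readings rather than from the true state.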
Reinforcement Learning Terminology
Jed observed that a lot of the terminology used in reinforcement learning (agent, simulation, etc.) evokes mathematical and psychological concepts. To shed some light on the history of reinforcement learning, Ion explained that reinforcement learning first came together in the 1980s and that there are a few threads that tie the technique back to these concepts:
- Learning by trial and error, which is seen in the study of animal behavior and psychology. One experiment describes how a cat learned to open the box it was in and get out by moving the latches around. It got a reward (such as a meal) when it got out, demonstrating how animals can use this approach to learn new behaviors: the next time the cat is in the box, it will likely open it faster.
- Operational research and engineering used for optimizing processes. For example, landing a spacecraft on the moon required controlling the mass, velocity, and height of the craft to land without crashing.
- Dynamic programming from computer science to compute a solution by subdividing the problem into several smaller problems.
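The dynamic programming thread can be illustrated with value iteration, which computes the value of each state from the values of the states it leads to. This is a minimal sketch on a hypothetical five-cell corridor. The constants, rewards, and function names are assumptions chosen for illustration.

```python
GOAL = 4           # terminal cell at the right end of the corridor
GAMMA = 0.9        # discount applied to future rewards
ACTIONS = (-1, 1)  # step left or step right


def step(state, action):
    """Deterministic model of the corridor: where we land and what we earn."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward


def value_iteration(sweeps=20):
    """Solve each state's subproblem using the current values of its neighbors."""
    values = [0.0] * (GOAL + 1)  # the terminal cell keeps value 0
    for _ in range(sweeps):
        for state in range(GOAL):
            outcomes = [step(state, action) for action in ACTIONS]
            values[state] = max(
                reward + GAMMA * values[next_state]
                for next_state, reward in outcomes
            )
    return values


values = value_iteration()
```

Each cell's value depends only on its neighbors' values, so the full problem (how good is it to be at the start?) breaks down into smaller ones (how good is it to be one step closer?). Here the values rise toward the goal: roughly 0.458, 0.62, 0.8, and 1.0 for cells 0 through 3.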
Why Has Supervised Learning Been the Predominant Use Case vs. Reinforcement Learning?
When asked about AI, a person on the street might describe something much closer to reinforcement learning than supervised learning, as reinforcement learning iteratively learns about the environment and the best way to perform actions over time. There are two main reasons supervised learning has been the predominant use case in the industry versus reinforcement learning.
First, with supervised learning, you can solve a lot of very practical problems. Second, it’s easier to apply, especially when you already have data. Reinforcement learning is about a sequence of decisions that leads to a desired result, and it takes a lot of interaction and iteration with the environment to get to a good policy. This process takes time and is more expensive. According to Ion, you can run a game over and over, but you can’t run the world over and over. Continuous refinement is a key part of reinforcement learning.