Deep reinforcement learning sucks

17 February 2018 (Paris, France) – To the uninitiated, artificial intelligence is an umbrella term that primarily refers to the ability of a computer to think and learn on its own. Deep learning is a specific approach to artificial intelligence inspired by the structure and patterns of the human brain: the computer learns through multiple layers of connected artificial neurons, loosely the way humans do. Deep learning is what lets a computer recognize images, text, sound, and objects (it is how your iPhone unlocks based on your face).

Although I had read quite a bit about AI, it was not until I had the chance to attend a series of courses at ETH Zurich that I began to “unbundle” all of its elements. Over the last four years, across 22 e-books and 120 hours of course content, that study changed the way I look at data. I was not seeking a career in artificial intelligence; I simply needed to understand this world-altering technology.

That included learning about reinforcement learning (RL). At its simplest, RL is a technique for building an intelligent agent that is allowed to play or run by itself, correcting its actions and outputs every time it makes a mistake. The computing power and training time required depend entirely on the type of problem the model is meant to solve.
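To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, the textbook RL algorithm, on a made-up five-cell corridor (this toy environment and its numbers are my own illustration, not taken from any course or from the post discussed below). The agent starts on the left, only gets a reward when it reaches the right end, and learns which action to prefer in each cell purely from that feedback.

```python
import random

# Toy environment: a corridor of 5 cells. The agent starts in cell 0 and
# receives a reward of +1 only when it reaches cell 4. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning: estimate Q[state][action] purely from trial and error.
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore occasionally, otherwise act greedily on current estimates.
        if random.random() < epsilon:
            action = random.randint(0, 1)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Correct the estimate whenever the outcome differs from the expectation.
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print(Q)  # after training, "right" should score higher than "left" in every cell
```

Even in a toy this small, the Q-values only settle after many repeated episodes of trial and error; scale the state space up to a board game or a robot arm and the amount of experience required explodes, which is exactly the problem picked apart below.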

It hit the public sphere in 2016 when Google’s DeepMind showed that a machine could, indeed, be taught to think strategically and beat the world’s best Go players. That win was a landmark in artificial intelligence and machine learning because it demonstrated, in practice, what reinforcement learning can do.

So I was intrigued this week when a Google engineer published a long, quite detailed blog post explaining the current frustrations with deep reinforcement learning and why it doesn’t live up to the hype. Yes, it would normally land in your TL;DR pile, but I love these things and save them for the weekend, when I carve out a few hours to digest this kind of read.

As I said, reinforcement learning makes good headlines. Teaching agents to play games like Go well enough to beat human experts like Ke Jie fuels the man-versus-machine narrative. But a closer look at deep reinforcement learning, a method of machine learning used to train computers to complete a specific task, shows the practice is riddled with problems. All the impressive RL results that reach human or superhuman level require a massive amount of training and experience to get the machine to do one specific thing. For example, it took DeepMind’s AlphaZero program over 68 million games of self-play to master chess and Go – no human could ever play that many games in a lifetime.

The Google researcher, Alex Irpan, who uses deep reinforcement learning for robotics, calls this problem "sample inefficiency":

“There’s an obvious counterpoint here: What if we just ignore sample efficiency? There are several settings where it’s easy to generate experience. Games are a big example. But, for any setting where this isn’t true, RL faces an uphill battle, and unfortunately, most real-world settings fall under this category”.

It’s difficult to coax an agent into learning a specific behavior, and in many cases hard-coded rules are simply better. Sometimes, when it’s just trying to maximize its reward, the model learns to game the system, finding tricks to get around a problem rather than solving it.

The post lists a few anecdotes where this popped up in research. Here is a good one:

“A researcher gives a talk about using RL to train a simulated robot hand to pick up a hammer and hammer in a nail. Initially, the reward was defined by how far the nail was pushed into the hole. Instead of picking up the hammer, the robot used its own limbs to punch the nail in. So, they added a reward term to encourage picking up the hammer, and retrained the policy. They got the policy to pick up the hammer … but then it threw the hammer at the nail instead of actually using it.”
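The anecdote is easy to restate in code. The sketch below is purely hypothetical, with made-up behaviours and scores rather than the researchers’ actual setup, but it shows why the failure happens: the learner only ever sees a scalar reward, so any behaviour that scores as well as the intended one counts as an equally valid “solution”.

```python
# Hypothetical reward functions for the hammer-and-nail anecdote (toy numbers,
# not the real experiment). The agent only ever sees the scalar score.

def naive_reward(outcome):
    # First attempt: reward is just how far the nail was pushed in.
    return outcome["nail_depth"]

def patched_reward(outcome):
    # Second attempt: add a term that encourages picking up the hammer.
    return outcome["nail_depth"] + (0.5 if outcome["picked_up_hammer"] else 0.0)

behaviours = {
    "hammer the nail properly":    {"nail_depth": 1.0, "picked_up_hammer": True},
    "punch the nail in by hand":   {"nail_depth": 1.0, "picked_up_hammer": False},
    "throw the hammer at the nail": {"nail_depth": 1.0, "picked_up_hammer": True},
}

for reward in (naive_reward, patched_reward):
    print(reward.__name__)
    for name, outcome in behaviours.items():
        print(f"  {name}: {reward(outcome):.1f}")
```

Under the naive reward, punching the nail in scores exactly as well as using the hammer; under the patched reward, throwing the hammer still ties with the intended behaviour. Unless the reward itself rules out every shortcut, the agent is free to settle on whichever high-scoring behaviour it stumbles on first.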

The random nature of RL makes it difficult to reproduce results, another major problem for research.

Irpan is, however, still optimistic about RL and thinks it can improve in the future:

“Deep RL is a bit messy right now, but I still believe in where it could be. That being said, the next time someone asks me whether reinforcement learning can solve their problem, I’m still going to tell them that no, it can’t. But I’ll also tell them to ask me again in a few years. By then, maybe it can.”
