Beyond deep learning: Part 1


The successes that deep learning systems have achieved over the last decade across all kinds of domains are unquestionable. Self-driving cars, skin cancer diagnostics, movie and song recommendations, language translation, automatic video surveillance, and digital assistants are just a few examples of an ongoing revolution that is already affecting, or will soon disrupt, our everyday life [1].

>> Have you noticed how dramatically Google Translate has improved over the last few years? It is amazing! Just a few years ago translations from Italian to English were hilarious, but now they make a lot of sense

>> That movie recommended by Netflix last night was wonderful…

>> Fantastic, no more waking up to a deluge of spam!

These are just a few of the most enthusiastic comments, showing how people are directly experiencing the improved performance of machine learning systems powered by deep neural networks.

This boost in the performance of deep learning algorithms started in the early 2000s and was due to several factors:

  • the unprecedented amount of training data produced by social media sites
  • the advent of more powerful computers (both CPUs and GPUs) that made it possible to train neural networks with many layers in a shorter amount of time
  • the availability of new optimization algorithms such as stochastic gradient descent and its variants
  • the invention of techniques to improve generalization, such as dropout
  • the advent of new random weight initialization methods to prevent the exploding or vanishing gradient problem [2]
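To make the dropout idea concrete, here is a minimal sketch of "inverted" dropout in plain NumPy (an illustrative toy, not any particular framework's implementation): at training time each unit is zeroed with probability p_drop, and the survivors are rescaled so the expected activation is unchanged.

```python
import numpy as np

def dropout(activations, p_drop=0.5, rng=None):
    """Inverted dropout: randomly zero units at train time and
    rescale the survivors so the expected activation is unchanged."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones(10_000)                 # a layer of activations
h_train = dropout(h, p_drop=0.5)    # roughly half are zeroed, the rest doubled
print(h_train.mean())               # close to 1.0, matching the original mean
```

At test time dropout is simply switched off; because of the rescaling, no extra correction is needed.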

But all that glitters is not gold…

AI researchers and practitioners have pointed out four main drawbacks of deep learning systems [3]. Let's summarize them below.

  • Algorithms trained on thousands of images until they have the classification down cold can be easily confused by small stickers digitally pasted in one corner of an image, or by objects outside their normal context, such as an elephant sitting on a sofa. This is a type of vulnerability to so-called adversarial attacks
  • While humans generally learn new concepts from just one or two examples, deep neural networks need thousands of examples to reach satisfactory performance. This is true not only for supervised learning tasks such as image or speech recognition, but also for reinforcement learning (RL)

    For instance, both the Atari and Go RL agents, which have respectively surpassed human-level performance in classic Atari 2600 arcade games and beaten the world champion at the complex board game Go, had to play thousands of rounds before mastering the game — a task most human players can accomplish in a few hours, if not minutes
  • Once a deep learning system has been trained, it is not always clear how it makes decisions. This is the opacity issue, which leads to a lack of interpretability. A bank denying loans to clients should explain the reasons for such a decision, as required by law in some countries. When an AI system suggests a treatment for a patient, the doctor will definitely want to know the reasons behind that particular recommendation. In these cases, even if deep learning algorithms are the most accurate ones, they might be passed over in favor of more interpretable ones, despite the latter's lower accuracy
  • Little progress has been made in unsupervised learning, where one does not have thousands of carefully labeled training examples at all. Unsupervised models must figure out the labels themselves before proceeding with more traditional (supervised) classification methods.
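The adversarial vulnerability from the first point can be illustrated with a toy example (a deliberately simplified sketch, not a real image attack): for a linear classifier the gradient of the score with respect to the input is known in closed form, so an FGSM-style step — nudging each input feature by a tiny amount in the direction that hurts the score most — can flip the predicted label.

```python
import numpy as np

# Toy linear classifier: score = w . x; positive score -> class 1.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.2, 0.4])      # correctly classified as class 1
print(np.sign(w @ x))               # 1.0

# FGSM-style attack: move each feature by at most eps in the direction
# that lowers the class-1 score (the gradient of the score w.r.t. x is w).
eps = 0.4
x_adv = x - eps * np.sign(w)
print(np.sign(w @ x_adv))           # -1.0: a tiny perturbation flips the label
```

Real attacks on deep networks work the same way, except the gradient is obtained by backpropagation through the trained model, and the perturbation is small enough to be invisible to a human.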


What is missing then?

Regardless of the enthusiasm for deep learning and artificial intelligence, there is an ever-growing feeling of skepticism when it comes to more practical solutions and use cases. What is current deep learning missing?
AI researchers are exploring a number of new ideas to revive deep learning as a research field and dispel the aforementioned skepticism. This would, in turn, make algorithms truly intelligent, especially in the complex use cases of everyday life (e.g., autonomous vehicles, surveillance, finance, military, healthcare, etc.).
There are three essential directions that AI researchers are considering for the near future.

Meta-learning, or learning to learn, is an ability that allows humans to master things quickly and apply that knowledge to handle similar new tasks. More precisely, meta-learning theory suggests that we learn on two timescales: in the short term we focus on learning about specific examples, while over longer timescales we learn the abstract rules required to complete a task. It is this combination that is thought to help us learn efficiently. In [4] the authors implemented a form of meta-learning in a reinforcement learning agent by means of a recurrent neural network (RNN).

In a reinforcement learning setting, an agent learns to act in an environment (e.g., a video game) by trial and error, guided by the reward it receives after each action. The researchers discovered that the agent, represented by their RNN, was still able to solve new tasks even when the weights of the neural network were frozen, i.e. not adjusted during the learning process. This shows that the RNN used the reward signal to memorize past actions and extract general rules, allowing it to successfully complete novel tasks without any further training!

Training multiple networks to work in tandem instead of just one big network. An architecture recently proposed by Google in [5] has been designed with this concept in mind and goes under the name of Generative Query Network (GQN). The GQN framework allows machines to learn to perceive their surroundings by training only on data they obtain themselves as they move around a scene. This approach requires no human labeling of the contents, nor does it need specific forms of reward and punishment as in a reinforcement learning setting.
The GQN model is composed of two parts: a representation network and a generation network.
The representation network takes the agent's observations as its input and produces a representation (usually in the form of a vector) that describes the underlying scene. The generation network then makes predictions about objects and features that are not currently visible to the AI. For example, if only three legs of a table are visible in the current frame, the model will fill in the fourth leg with a compatible size, shape, and color. Such predictions, in turn, help the system learn concepts faster than standard deep-learning methods.
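The two-part structure can be sketched in a few lines of NumPy. This is only a shape-level caricature of GQN — random linear maps stand in for the trained networks — but it shows the key design choice: per-viewpoint representations are summed into a single scene vector (so the scene code does not depend on the order of observations), and the generator queries that vector with a new viewpoint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random linear maps as stand-ins for the two trained networks.
W_repr = rng.standard_normal((16, 8 + 2))   # image dim 8 + viewpoint dim 2
W_gen  = rng.standard_normal((8, 16 + 2))   # scene dim 16 + query viewpoint

def represent(images, viewpoints):
    """Encode each (image, viewpoint) pair, then sum into one scene vector."""
    pairs = np.concatenate([images, viewpoints], axis=1)
    return np.tanh(pairs @ W_repr.T).sum(axis=0)

def generate(scene, query_viewpoint):
    """Predict the observation from a viewpoint never actually seen."""
    return np.tanh(np.concatenate([scene, query_viewpoint]) @ W_gen.T)

images = rng.standard_normal((3, 8))        # observations from 3 viewpoints
viewpoints = rng.standard_normal((3, 2))
scene = represent(images, viewpoints)
prediction = generate(scene, np.array([0.5, -0.5]))   # query an unseen view
print(scene.shape, prediction.shape)        # (16,) (8,)
```

Because the aggregation is a sum, shuffling the order of the observations leaves the scene vector unchanged — a small built-in invariance that foreshadows the inductive-bias theme below.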

Built-in inductive bias is the idea of equipping artificial neural networks with prior knowledge about the structure of a problem, instead of letting them learn everything from scratch for every new task. In this realm, a new approach known as graph networks [6] is attracting much attention in the research community, and will be discussed in detail in the second part of this post.

In the second part, we will focus our attention on the graph network approach, which encompasses deep learning systems that have an innate bias towards representing things as objects and relations.

Stay tuned!


[1] M. M. Najafabadi, et al., “Deep learning applications and challenges in big data analytics”, Journal of Big Data 2015 2:1

[2] Yann LeCun, et al., “Deep learning”, Nature volume 521, pages 436-444 (28 May 2015)

[3] M. Mitchell Waldrop, “News Feature: What are the limits of deep learning?”, Proceedings of the National Academy of Sciences Jan 2019, 116 (4) 1074-1077; DOI: 10.1073/pnas.1821594116

[4] Wang JX, et al., “Prefrontal cortex as a meta-reinforcement learning system”, Nat Neurosci 21:860–868 (2018).

[5] Eslami SMA, et al., “Neural scene representation and rendering”, Science 360:1204-1210 (2018)

[6] Battaglia PW, et al., “Relational inductive biases, deep learning, and graph networks”, ArXiv:1806.01261 [cs.LG] (2018)
