Beyond deep learning: Part 1


The successes of deep learning

Over the last decade, deep learning systems have achieved smashing successes in all kinds of domains. Self-driving cars, skin cancer diagnostics, movie and song recommendations, language translation, automatic video surveillance, and digital assistants are just a few examples of an ongoing revolution that affects, or will soon change, our everyday lives [1].

>> Have you noticed how dramatically Google Translate has improved over the last few years? It is amazing! Just a few years ago the translation from Italian to English was hilarious, but now it makes a lot of sense

>> That movie recommended by Netflix last night was wonderful…

>> Fantastic, no more waking up to a deluge of spam!

These are just a few enthusiastic comments, showing how people are experiencing the increased performance of machine learning systems powered by deep neural networks.

This boost in the performance of deep learning algorithms started in the early 2000s and was due to several factors:

  • The unprecedented amount of training data produced by social media sites.
  • The advent of GPU computing, which allows complex neural networks to be trained in a short time.
  • The availability of new optimization algorithms (stochastic gradient descent).
  • The invention of techniques to improve generalization, such as dropout.
  • The introduction of new random weight initialization methods, which prevent the “exploding gradient” and “vanishing gradient” problems [2].
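Two of the ingredients above can be sketched in a few lines of NumPy. This is an illustrative toy, not any framework's actual implementation: an inverted-dropout mask (units are zeroed with probability p and the survivors rescaled, so no change is needed at test time) and a plain stochastic-gradient-descent update.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p, rescale the rest."""
    if not training:
        return activations
    mask = (rng.random(activations.shape) >= p) / (1.0 - p)
    return activations * mask

def sgd_step(weights, grad, lr=0.01):
    """One stochastic-gradient-descent update on a mini-batch gradient."""
    return weights - lr * grad

h = np.ones((4, 3))          # a batch of activations
h_drop = dropout(h, p=0.5)   # surviving units are rescaled by 1 / (1 - p) = 2

w = np.array([1.0, 2.0])
w_new = sgd_step(w, grad=np.array([0.5, -0.5]), lr=0.1)
```

At inference time, `dropout(..., training=False)` simply passes activations through unchanged, which is the point of the inverted-scaling trick.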

Four challenges for deep learning systems

But all that glitters is not gold. AI researchers and practitioners have pointed out four main drawbacks of deep learning systems [3].

  • Algorithms can get easily confused. For example, an image recognition algorithm trained on thousands of images can easily be deceived by small stickers digitally pasted in one corner of an image, or by objects outside their normal context, such as an elephant sitting on a sofa. This is a type of vulnerability to so-called adversarial attacks.
  • While humans can learn new concepts from just a few examples, deep neural networks need thousands of examples to reach satisfactory performance. This is true not only for supervised learning tasks such as image or speech recognition, but also for reinforcement learning (RL).
    For instance, both the Atari and AlphaGo agents (which have beaten top human players in classic Atari 2600 arcade games and the complex board game Go, respectively) had to play thousands of rounds before mastering the game. Most human players can accomplish the same in a few hours.
  • After training, it is not always clear how a deep learning system makes its decisions. This opacity leads to a lack of interpretability. A bank denying a loan to a client should explain the reasons for that decision, as required by law in some countries. When an AI system suggests a treatment for a patient, the doctor would definitely want to know the reasons behind that particular suggestion. In such cases, very accurate but non-transparent algorithms may be abandoned in favor of more interpretable ones, despite their lower accuracy.
  • Progress in unsupervised learning is still unsatisfactory. Unsupervised models have no pre-set examples to follow: they have to figure out the labels by themselves before proceeding with more traditional (supervised) classification methods.
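The adversarial-attack vulnerability in the first point can be illustrated on a toy linear classifier. This is a hypothetical, stripped-down version of the "fast gradient sign" style of attack: a small, bounded change to every input feature, in the direction that most hurts the score, is enough to flip the prediction.

```python
import numpy as np

# Toy linear "classifier": score = w . x; positive score means class 1.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.2, 0.4])   # correctly classified: score is 0.9 > 0

score = w @ x

# FGSM-style adversarial step: for a linear score the gradient w.r.t. the
# input is just w, so we nudge every feature by at most eps against the class.
eps = 0.5
x_adv = x - eps * np.sign(w)

adv_score = w @ x_adv            # the small perturbation flips the sign
```

Each feature moved by only 0.5, yet the classification changes; deep image classifiers exhibit the same fragility with perturbations far too small for a human to notice.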

Doubts and future research trends

Despite the enthusiasm for deep learning and artificial intelligence, there is an ever-growing feeling of skepticism about it. Some observers think that a technical limit has been reached and that deep learning cannot be extended to further practical uses. AI researchers are exploring a number of new ideas to keep deep learning viable as a research field and to dissipate this skepticism. This would, in turn, make algorithms more intelligent and suitable for complex use cases (e.g., autonomous vehicles, surveillance, finance, military, healthcare).
There are three essential directions that AI researchers are considering for the near future.


Learning to learn

Meta-learning, or learning to learn, is the ability that allows humans to master new things quickly and then apply that knowledge to similar new tasks. Learning theory says we learn on two timescales: in the short term we focus on learning about specific examples, while over longer timescales we learn the abstract rules required to complete a task. This combination allows us to learn efficiently. In [4] the authors implemented a form of meta-learning in a reinforcement learning agent using a recurrent neural network (RNN).

In a reinforcement learning setting, an agent learns to act in an environment (e.g., a video game) by trial and error, guided by the reward it receives after each action. Contrary to what researchers expected, even after freezing the weights of the neural network (i.e., not adjusting them during learning), the RNN agent could still solve new tasks. The RNN used the reward signal to memorize past actions and extract general rules, allowing it to solve novel tasks without further training!
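The architectural trick that makes this possible can be sketched in NumPy. In this toy (names and sizes are illustrative, not taken from [4]), the previous action and reward are fed to the RNN as part of its input, so the hidden state, rather than the frozen weights, can accumulate task statistics across trials.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal recurrent step for a meta-RL-style agent: the previous action and
# reward are part of the *input*, so even with frozen weights the hidden
# state can adapt to a new task within an episode.
n_hidden, n_obs, n_actions = 8, 4, 2
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))            # frozen
W_x = rng.normal(scale=0.1, size=(n_hidden, n_obs + n_actions + 1))  # frozen

def rnn_step(h, obs, prev_action, prev_reward):
    one_hot = np.zeros(n_actions)
    one_hot[prev_action] = 1.0
    x = np.concatenate([obs, one_hot, [prev_reward]])  # reward enters as input
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(n_hidden)
for t in range(5):                      # the weights never change in this loop
    h = rnn_step(h, obs=rng.normal(size=n_obs),
                 prev_action=t % n_actions, prev_reward=float(t % 2))
```

All the "learning" during the episode happens in `h`; a trained policy would read actions off this hidden state, which is the sense in which the frozen network can still adapt.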

Splitting the task across many networks working together

In [5], Google recently proposed an architecture with this concept in mind, called the Generative Query Network (GQN). In the GQN framework, moving machines learn to perceive their surroundings. The model is trained as the machine moves around the scene, using only data gathered by the machine itself. This approach requires no human labeling of the scene contents, nor rewards and punishments (as in reinforcement learning).
The GQN model consists of two parts: a representation network and a generation network.
The representation network takes the agent’s observations as its input and produces a representation (usually a vector) describing the scene. The generation network makes predictions about objects and features not currently visible to the AI. For example, if only three legs of a table are visible in the current frame, the model will include a fourth leg with a likely size, shape, and color. Such predictions, in turn, help the system learn concepts faster than standard deep learning methods.
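The split between the two networks can be sketched as follows. This is a hypothetical stand-in, not the actual GQN architecture from [5]: each (image, camera-pose) observation is embedded separately, and the per-view embeddings are summed into one order-invariant scene code that a generation network would condition on to predict unseen viewpoints.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "representation network": embed one observation (image + camera pose).
def represent(image_vec, pose_vec, W):
    return np.tanh(W @ np.concatenate([image_vec, pose_vec]))

d_img, d_pose, d_repr = 16, 4, 8
W = rng.normal(scale=0.1, size=(d_repr, d_img + d_pose))

# Three observations of the same scene from different viewpoints.
observations = [(rng.normal(size=d_img), rng.normal(size=d_pose))
                for _ in range(3)]

# Per-view representations are summed into a single scene code, so the
# result does not depend on the order in which views were collected.
scene_code = sum(represent(img, pose, W) for img, pose in observations)
```

A generation network would take `scene_code` plus a query camera pose and render the predicted view; the summation is what lets the model fuse however many observations it has seen so far.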

Built-in inductive bias

The idea here is to provide artificial neural networks with a kind of built-in inductive bias, instead of letting them learn everything from scratch for every new problem. In this realm, a new approach known as the graph network [6] is attracting much attention in the research community, and it will be discussed in detail in the second part of this post.
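To give a first flavor of the idea ahead of part 2, here is a single message-passing step on a toy graph. The details are purely illustrative, not the formalism of [6]: each node updates its feature vector from the sum of its neighbors' features, so the graph structure itself is the built-in bias.

```python
import numpy as np

# Adjacency matrix of a 3-node graph: node 0 is linked to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

H = np.eye(3)                 # initial node features (one-hot per node)

messages = A @ H              # each node sums its neighbors' features
H_new = np.tanh(H + messages) # simple node-update function
```

Stacking such steps lets information propagate along edges, which is how graph networks encode "things as objects and relations" rather than as flat feature vectors.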

More to come..

In the second part, we will focus our attention on the graph network approach, which encompasses deep learning systems that have an innate bias towards representing things as objects and relations.

Stay tuned!


[1] M. M. Najafabadi, et al., “Deep learning applications and challenges in big data analytics”, Journal of Big Data 2015 2:1

[2] Yann LeCun, et al., “Deep learning”, Nature volume 521, pages 436-444 (28 May 2015)

[3] M. Mitchell Waldrop, “News Feature: What are the limits of deep learning?”, Proceedings of the National Academy of Sciences Jan 2019, 116 (4) 1074-1077; DOI: 10.1073/pnas.1821594116

[4] Wang JX, et al., “Prefrontal cortex as a meta-reinforcement learning system”, Nat Neurosci 21:860–868 (2018).

[5] Eslami SMA, et al., “Neural scene representation and rendering”, Science 360:1204-1210 (2018)

[6] Battaglia PW, et al., “Relational inductive biases, deep learning, and graph networks”, ArXiv:1806.01261 [cs.LG] (2018)
