3 benefits of deep feature extraction and transfer learning


It is hardly news that feature extraction [1] is a fundamental step in any machine learning pipeline. Whenever a data scientist works with a dataset made of a large number of variables, chances are good that some of them are redundant or noisy. Such variables carry little or no signal that could be useful to a predictor, and the noise they add tends to hurt the accuracy of a predictive model.

In this post we explain the top-3 benefits of deep feature extraction.

Feature extraction

In such scenarios, feature extraction allows data scientists to construct a new representation of the original data, which makes potentially interesting patterns easier to detect. Principal component analysis (PCA) is one possible solution to the problem of noisy variables described above. Applying PCA to the original data amounts to creating an embedding with specific characteristics: in the new space (usually of lower dimension than the original one), a few features called principal components capture most of the variance present in the original data. That’s why PCA is mainly used as a dimensionality reduction technique.

PCA finds a lower-dimensional embedding that captures most of the variance in the original data.
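To make the idea concrete, here is a minimal sketch of PCA written with NumPy. The synthetic dataset (10 observed variables driven by 2 hidden signals plus a little noise), the function name pca and all variable names are assumptions made purely for illustration; they do not come from any specific project.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # PCA works on centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance carried by each direction, as a fraction of the total variance
    variance = S ** 2 / (X.shape[0] - 1)
    ratio = variance[:n_components] / variance.sum()
    return X_centered @ components.T, components, ratio

# Synthetic data (assumption): 2 hidden signals mixed into 10 noisy variables
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 10))

embedding, components, ratio = pca(X, n_components=2)
print("embedding shape:", embedding.shape)
print("variance captured by 2 components:", ratio.sum())
```

Because the 10 variables here are really driven by only 2 signals, two principal components should retain nearly all of the variance. On real data the right number of components is a judgment call, usually guided by the explained variance ratio.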

Until about a decade ago, the performance of a machine learning model relied heavily on the ability of experts to craft hand-engineered features. This process required a thorough understanding of the domain the data originated from, and a substantial amount of time.

Deep learning as a game changer

Everything started to change with the advances in deep learning, which mainly concerned computer vision and natural language processing tasks. As already explained in a previous post, since the early 2000s deep learning algorithms have achieved remarkable results in several domains, from image recognition to language translation.
These successes were mainly due to three essential phenomena that occurred almost at the same time:

Researchers were suddenly able to train deep neural networks by feeding them millions of images or text documents, and to achieve state-of-the-art results. Since then, an impressive number of pre-trained models have been produced and made available to the community.

Deep learning feature extractors

One interesting property of pre-trained models is their ability to function as feature extractors. A deep neural network trained to recognize people from a large set of images learns a hierarchy of features across its layers. Moving from the first layers to the deeper ones, these features become more and more complex and abstract. In computer vision this progression goes from pixels and blobs to eyes, noses and faces, up to clothes and entire scenes, with specific neurons activating for each of these concepts.

What’s amazing is that another classifier – say of animals, flowers, or vehicles – can reuse the same representation, especially in the first layers. After all, blobs of pixels, segments and other low-level features are usually conserved across domains.

Internal representation of how the GoogLeNet deep learning system builds its understanding of images, from edges and textures in the first layers to patterns, parts and objects in deeper layers (source: https://distill.pub/2017/feature-visualization)

The benefits of transfer learning

Because features are conserved across domains, researchers have noticed that switching to a different domain does not require retraining the whole model from scratch. Transfer learning [2] is precisely this ability to reuse what a neural network has learned in one domain for another. As mentioned before, a considerable number of layers – especially the ones close to the input – of a people classifier can work perfectly well for a cats-and-dogs classifier.

It is not surprising to reach high accuracy with, for example, a linear model on top of a pre-trained network used as a feature extractor. As has been shown many times already, simpler models can beat fancy ones.
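The pattern of "frozen extractor plus linear model" can be sketched in a few lines of NumPy. Note the big assumption: to keep the example self-contained, a fixed random ReLU layer stands in for the pre-trained network. In a real project extract_features would be a forward pass through an actual pre-trained model with its classification head removed. The toy task (points inside or outside a circle) and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained network (assumption for illustration):
# a fixed ReLU layer whose weights are never updated.
W_frozen = rng.normal(size=(2, 256))
b_frozen = rng.uniform(-2.0, 2.0, size=256)

def extract_features(X):
    """Forward pass through the frozen extractor; nothing is trained here."""
    return np.maximum(X @ W_frozen + b_frozen, 0.0)

def add_bias(F):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([F, np.ones((F.shape[0], 1))])

def fit_linear(F, y, ridge=1e-2):
    """Closed-form ridge regression on +/-1 labels, used as a linear classifier."""
    A = add_bias(F)
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ y)

def accuracy(F, y, w):
    return float(np.mean(np.sign(add_bias(F) @ w) == y))

def make_data(n):
    """Toy task: label is +1 inside a circle, -1 outside (not linearly separable)."""
    X = rng.uniform(-2.0, 2.0, size=(n, 2))
    y = np.where((X ** 2).sum(axis=1) < 2.55, 1.0, -1.0)
    return X, y

X_train, y_train = make_data(1000)
X_test, y_test = make_data(500)

# Same simple linear model, trained once on raw inputs and once on extracted features
w_raw = fit_linear(X_train, y_train)
w_deep = fit_linear(extract_features(X_train), y_train)

raw_acc = accuracy(X_test, y_test, w_raw)
deep_acc = accuracy(extract_features(X_test), y_test, w_deep)
print("linear model on raw inputs:        ", raw_acc)
print("linear model on extracted features:", deep_acc)
```

Only the small linear model is fitted, and that takes a single linear solve. The extractor is used purely as a preprocessing step, which is why this setup tends to be fast to train and easy to plug into an existing pipeline.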

Here are some of the benefits of considering pre-trained models for your next project:

  • fast training. Training a simple model can take minutes instead of hours or days. This makes it possible to test hypotheses quickly and increases productivity.
  • good performance. Thanks to transfer learning, one can build accurate models on relatively small datasets.
  • adaptability to existing pipelines. A company that has spent time and resources building infrastructure around a specific modelling framework can improve model performance without changing its pipelines entirely. It would only need a preprocessing step performed with the pre-trained model.

The higher the standards of classification researchers demand, the more data their models need in order to be successfully trained and deployed in production environments. However, it should be clear that not having a large amount of data does not prevent an organisation from building powerful AI tools.

In fact, even young startups can rely on transfer learning and adapt existing models trained on the massive datasets of big corporations.
Let innovation begin!

In this post we explained the top-3 benefits of deep feature extraction. Of course, there are many more. Come join our Discord channel and discuss with us.


[1] S. Ding, et al., “A survey on feature extraction for pattern recognition”, Artificial Intelligence Review, vol. 37 (3), pages 169-180, 2012.

[2] K. Weiss, et al., “A Survey on Transfer Learning”, Journal of Big Data, vol. 3, pages 1-40, 2016.

[3] https://www.basilica.ai/blog/the-unreasonable-effectiveness-of-deep-feature-extraction
