Neural networks that don’t need to be trained



One of the problems hindering the adoption of machine learning algorithms on a large scale is the need to “train” them on large datasets. This article briefly explores “weight agnostic neural networks” (WANNs), which need no weight training and can, on some tasks, perform comparably to standard neural networks.

From childhood, humans develop new skills by interacting with the environment they live in. Speaking new words, tasting new foods and having new experiences are all occasions for learning. When this happens, new brain connections, called synapses, form. Synapses keep changing as new experiences flow into the life of a person or organism. What synapses do is store the insights gained during interaction with the environment.

Standard neural networks

The way the human brain learns new abilities inspired Frank Rosenblatt’s invention of the artificial neuron, also known as the perceptron, in 1958 (details in [1]).
As shown in the figure below, Rosenblatt’s perceptron takes a weighted sum of inputs $x_i$ and applies a transformation $f$ called the activation function. $f$ outputs 1 if the weighted sum is greater than a certain threshold $\theta$, and 0 otherwise. The weights $w_i$ mimic the behaviour of the synapses in a real brain, just without involving any chemistry. At first, they are set to random values. Later they are gradually modified until the output produced by the network matches the desired output (the target). The name of this fundamental process is training. With proper training, a network can learn a task that it was incapable of doing with the initial random values.

The perceptron
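
To make the description above concrete, here is a minimal sketch of a perceptron and of the classic perceptron learning rule, applied to a toy AND gate. The function names, the AND-gate data, the learning rate and the choice to start from zeros (rather than random values, to keep the run reproducible) are all assumptions made for illustration; they are not taken from Rosenblatt’s paper.

```python
from typing import List, Sequence, Tuple


def perceptron(x: Sequence[float], w: Sequence[float], theta: float) -> int:
    """Output 1 if the weighted sum of the inputs exceeds theta, else 0."""
    weighted_sum = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1 if weighted_sum > theta else 0


def train(samples, epochs: int = 100, lr: float = 1.0) -> Tuple[List[float], float]:
    """Perceptron learning rule: adjust weights and threshold after each mistake."""
    w = [0.0] * len(samples[0][0])  # assumption: zeros instead of random values
    theta = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - perceptron(x, w, theta)
            if error != 0:
                mistakes += 1
                # Move each weight towards the target; move the threshold the other way.
                w = [w_i + lr * error * x_i for w_i, x_i in zip(w, x)]
                theta -= lr * error
        if mistakes == 0:  # every sample is classified correctly: stop training
            break
    return w, theta


# Toy dataset (hypothetical): the logical AND of two binary inputs.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train(AND_GATE)
predictions = [perceptron(x, weights, threshold) for x, _ in AND_GATE]
```

Before training, the all-zero perceptron outputs 0 for every input; after a handful of passes over the data its predictions match the AND targets.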

Learning suitable weights is still very important in modern deep learning algorithms. These algorithms use many layers of artificial neurons connected to each other. As I explained in a previous post, a neural network for a deep learning algorithm cannot work well until you train it. Training must be done on big datasets using gradient descent techniques.

What if hours and hours of training neural networks on powerful GPUs were not necessary? Could one just use some random values and have a model that is capable of providing meaningful predictions?
As crazy as it sounds, some researchers have shown this is possible. After all, it’s time for God to play dice…

Weight agnostic neural networks (WANN)

In their paper “Weight agnostic neural networks” (see [2]), Adam Gaier and David Ha showed that some neural network architectures can reach reasonable accuracy without learning any weights.
The core idea is a radical change of perspective: training a neural network no longer means finding the optimal weights via gradient descent, but only finding the optimal architecture for the network. The performance of an ideal network is not very sensitive to the values of its weights. Training the network, in this case, means altering only the network topology.

Searching for the best network

The search for an optimal architecture starts by generating an initial set of neural networks with few connections and no hidden layers. Each network uses a single random shared weight: all of its connections take the same value. The networks are then ranked according to their performance. The process continues with subsequent search iterations. Each iteration generates a new set of networks by modifying the best performing architectures thus far. Possible changes include randomly inserting new nodes, adding new connections, changing the activation function, or a combination of the three. The process continues until network performance no longer improves. As iterations accumulate, only the best architectures survive, just as in genetic algorithms (see [3] for an introduction to genetic algorithms).
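
The loop described above can be sketched in a few dozen lines. This is a simplified illustration, not the authors’ implementation: the `Network` and `Node` classes, the XOR toy task, the list of activations, the set of shared weight values and the fixed number of generations (instead of stopping when performance plateaus) are all assumptions. In particular, “insert node” here simply adds a new hidden node that feeds the output.

```python
import copy
import math
import random
from dataclasses import dataclass, field

# Assumed subset of activation functions a node may use.
ACTIVATIONS = {
    "linear": lambda s: s,
    "tanh": math.tanh,
    "sin": math.sin,
    "abs": abs,
    "step": lambda s: 1.0 if s > 0 else 0.0,
}
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # assumed test values
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


@dataclass
class Node:
    activation: str
    inputs: list  # indices of earlier nodes feeding this node


@dataclass
class Network:
    n_inputs: int
    output: Node
    hidden: list = field(default_factory=list)


def forward(net, x, shared_weight):
    """Evaluate the network with the same weight on every connection."""
    values = list(x)  # indices 0..n_inputs-1 hold the inputs, hidden nodes follow
    for node in net.hidden + [net.output]:
        total = sum(shared_weight * values[i] for i in node.inputs)
        values.append(ACTIVATIONS[node.activation](total))
    return values[-1]


def fitness(net):
    """Negative mean squared error on XOR, averaged over all shared weights."""
    errors = [(forward(net, x, w) - y) ** 2 for w in SHARED_WEIGHTS for x, y in XOR]
    return -sum(errors) / len(errors)


def mutate(net, rng):
    """Apply one random topology change to a copy of the network."""
    child = copy.deepcopy(net)
    nodes = child.hidden + [child.output]
    choice = rng.choice(["insert_node", "add_connection", "change_activation"])
    if choice == "insert_node":
        new_index = child.n_inputs + len(child.hidden)
        source = rng.randrange(new_index)  # any input or existing hidden node
        child.hidden.append(Node(rng.choice(list(ACTIVATIONS)), [source]))
        child.output.inputs.append(new_index)
    elif choice == "add_connection":
        k = rng.randrange(len(nodes))
        # Only earlier nodes may feed node k, which keeps the graph acyclic.
        free = [i for i in range(child.n_inputs + k) if i not in nodes[k].inputs]
        if free:
            nodes[k].inputs.append(rng.choice(free))
    else:
        rng.choice(nodes).activation = rng.choice(list(ACTIVATIONS))
    return child


def search(generations=30, population_size=20, survivors=5, seed=0):
    rng = random.Random(seed)
    # Minimal starting networks: one output node wired to a single input.
    population = [Network(2, Node("linear", [rng.randrange(2)]))
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        best = population[:survivors]  # the best architectures survive unchanged
        children = [mutate(rng.choice(best), rng)
                    for _ in range(population_size - survivors)]
        population = best + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


best_net, history = search()
```

Because the top architectures are carried over unchanged, the best score recorded in `history` can never decrease from one generation to the next.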

The best network architecture is the one that performs consistently well with a wide range of random weights. This, in turn, means that such weights are not really critical in determining the accuracy of the model. Obviously, when two different network architectures have similar performance, the simpler model is preferable. Unlike conventional deep learning models that perform well only after extensively tuning their weights, weight agnostic neural networks (WANNs) tend to perform well using just one random weight for all connections. Moreover, they perform even better and reach excellent accuracy if the weights are trained as in a standard setting.
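
As a rough illustration of this ranking rule, the sketch below sorts candidate architectures by their mean score across several shared weight values and breaks near-ties in favour of the network with fewer connections. The dictionary keys, the rounding used to decide what counts as “similar” and all the numbers are made up for the example; they are not results from the paper, whose actual ranking scheme is more elaborate.

```python
from statistics import mean


def rank_architectures(candidates):
    """Sort candidates best-first.

    Each candidate is a dict with hypothetical keys:
      "name"        - a label for the architecture
      "scores"      - performance for each shared weight value tried
      "connections" - number of connections (a proxy for complexity)
    Mean scores are rounded to two decimals so that near-identical
    performance counts as a tie, which the simpler network then wins.
    """
    return sorted(
        candidates,
        key=lambda c: (-round(mean(c["scores"]), 2), c["connections"]),
    )


# Made-up numbers for illustration only.
candidates = [
    {"name": "A", "scores": [0.80, 0.82, 0.81], "connections": 40},
    {"name": "B", "scores": [0.95, 0.30, 0.40], "connections": 25},
    {"name": "C", "scores": [0.81, 0.81, 0.81], "connections": 18},
]
ranking = [c["name"] for c in rank_architectures(candidates)]
```

Architecture B reaches the highest single score but collapses for other weight values, so it ranks last; A and C perform alike on average, and C comes first because it is simpler.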

Results and conclusions

Researchers have tested this approach on the MNIST database of handwritten digits (the de facto benchmark for comparing machine learning methods on multi-class classification tasks) and shown that WANNs can classify MNIST digits as well as a single-layer neural network with thousands of weights trained by gradient descent. What makes the entire experiment important is the fact that with WANNs you don’t need to train any weights to get a working model.

In the last twenty years, investigators have proposed many powerful deep learning architectures.  To name a few:

  •   Long short-term memory networks (LSTMs) that deliver breakthrough performance in sequence modeling tasks (machine translation, speech recognition, time-series forecasting).
  •   Convolutional Neural Networks (CNNs) and Residual Networks that are among the most advanced algorithms for computer vision problems (object detection and image classification).

All these networks have something in common: they require a substantial training effort. In contrast, WANNs do not require such massive training to reach acceptable performance.

In summary, this new approach could make it easier to discover new architectures for more challenging problems in several business and scientific domains, just as LSTMs and CNNs have done so far.


[1] F. Rosenblatt, “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”, Cornell Aeronautical Laboratory, Psychological Review, vol. 65, no. 6, pp. 386–408, 1958.

[2] A. Gaier and D. Ha, “Weight Agnostic Neural Networks”, 2019.

[3] M. Mitchell, “An Introduction to Genetic Algorithms”, Cambridge, MA: MIT Press, 1996.
