Artificial neural networks (ANNs) are conceptually simple: each layer of a classical ANN combines its inputs and weights in a single matrix product, followed by an elementwise nonlinearity. However, as the number of learned parameters increases, these networks become very difficult to train effectively. Most of the tricks that enable modern deep learning came from careful examination of parameter learning rules.
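To make "a matrix product followed by an elementwise nonlinearity" concrete, here is a minimal sketch in plain Python. The weights, the input and the choice of tanh as the nonlinearity are illustrative assumptions and do not come from any particular network.

```python
import math

def layer(W, x, f=math.tanh):
    # One classical ANN layer: a matrix product W @ x (one row of W per
    # output unit) followed by an elementwise nonlinearity f.
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [f(z_i) for z_i in z]

# Illustrative weights and input (assumptions, not from a real network).
W1 = [[0.5, -1.0], [2.0, 0.25]]
W2 = [[1.0, -1.0]]
x = [1.0, 2.0]

h = layer(W1, x)  # first layer
y = layer(W2, h)  # stacking layers is just function composition
print(h, y)
```

A deep network is this same operation repeated, which is why the forward pass is simple and the hard part is learning the weights.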
In this article I’ll explain why the current learning rules that underpin deep learning are not biologically plausible – what that means, and why you should care about it.
What does biologically plausible mean?
We know a lot about neuron learning and the architecture of human brains, but there’s still much more that we don’t know. New neuronal learning processes are still being discovered. We can’t build an accurate, functioning simulation of even a piece of a human brain (projects aiming to do this have spent billions of dollars in the attempt). In contrast, the fundamentals of artificial networks trained by deep error propagation have been relatively constant since the 1980s. They work, and they work well. But we can be sure that they don’t work the same way as biological brains: the way they are trained is fundamentally different.
Since we don’t know exactly how “wet” (i.e. biological) brains learn, we can’t build working ANNs that are biologically accurate. Often we can’t even say whether a specific feature is an accurate equivalent of biology. The best we can say in these circumstances is that wet brains might do it this way, i.e. it’s biologically plausible.
Why should you care about biological plausibility?
I can think of three good reasons to care. First, we know from experience that different algorithms perform better or worse on different problems, so it’s definitely a good thing to have a wider variety of ANNs in your toolbox. Second, the evidence suggests that wet brains outperform current ANNs in many ways, for example by learning from fewer experiences. Third, wet brains are vastly more efficient in terms of memory, compute and energy consumption. If we can make ANNs more like wet brains, we can expect to gain some of these advantages.
What is credit assignment?
Credit assignment is essential for incrementally learning parameter values. It is the process of working out how much each parameter was responsible for errors, rewards, or other measurable consequences.
In modern ANNs, credit assignment is almost always performed by deep backpropagation (BP). We calculate a loss at the output of the network, and then propagate the gradient of this loss back to the weights in every layer that contributed to it. Responsibility for the loss is translated layer by layer (hence “deep”) using the chain rule from calculus. Figure 1 shows some examples, from a simple feed-forward network to recurrent and convolutional architectures. Note that in sequence models deep BP is also used to associate current errors with historical network activity, allowing errors to travel backwards in time to the moment they occurred. These time-travelling synapses are very much not biologically plausible.
Figure 1: Credit assignment by deep backpropagation in various neural network architectures. Magenta lines indicate backprop paths from losses to learned parameters in shallower layers or earlier timesteps. (a) Fully-connected feed-forward network. (b) Recurrent neural network (RNN) with Back-Prop Through Time (BPTT). (c) Feed-Forward autoregressive network using dilated convolutions (e.g. WaveNet).
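As a rough illustration of the chain rule at work, the sketch below runs deep BP by hand through a tiny two-layer network. The architecture (tanh hidden layer, linear output, squared-error loss), the weights and the function names are all assumptions chosen for illustration. Two details matter for what follows: the backward step sends the output error back through the transpose of the forward weights W2, and the first-layer gradient needs the stored input x.

```python
import math

def forward(W1, W2, x):
    # Layer 1: matrix product then tanh; layer 2: linear readout.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sum(w * hj for w, hj in zip(row, h)) for row in W2]
    return h, y

def loss(y, t):
    # Squared error, computed only at the output.
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def backprop(W1, W2, x, t):
    h, y = forward(W1, W2, x)  # activations must be kept for the backward pass
    # dL/dy at the output layer.
    dy = [yk - tk for yk, tk in zip(y, t)]
    # Output weights: error times presynaptic activity.
    gW2 = [[dyk * hj for hj in h] for dyk in dy]
    # Chain rule: send the error back through W2 transposed.
    # This is the step that needs precise reciprocal connections.
    dh = [sum(W2[k][j] * dy[k] for k in range(len(dy))) for j in range(len(h))]
    # Through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
    dz = [dhj * (1.0 - hj ** 2) for dhj, hj in zip(dh, h)]
    # First-layer weights: needs the stored input x.
    gW1 = [[dzj * xi for xi in x] for dzj in dz]
    return gW1, gW2

# Illustrative weights, input and target (assumptions).
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W2 = [[0.3, -0.1, 0.2]]
x = [1.0, -2.0]
t = [0.5]
gW1, gW2 = backprop(W1, W2, x, t)
print(gW1, gW2)
```

Comparing these gradients against a finite-difference estimate of the loss is a quick way to check hand-written BP like this.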
Wet brains don’t do deep BP
Many very smart people have looked for a neural computation that implements BP and have not found one. Biological neurons lack the precise reciprocal connections that BP-like credit assignment requires, in which each neuron’s output is later matched by a backward signal carrying the derivative of the loss with respect to that output. Similarly, biological neurons lack the per-synapse memory of past inputs that retrospective weight modification would need.
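The memory requirement is clearest in Back-Prop Through Time. In this minimal sketch of a scalar recurrent network, h_t = tanh(w * h_(t-1) + u * x_t), with the loss taken only at the final step (the model, the values and the function name bptt_scalar_rnn are all assumptions for illustration), the gradient for w can only be computed by keeping every past hidden state and walking back through them in reverse order.

```python
import math

def bptt_scalar_rnn(w, u, xs, target):
    # Forward pass: h_t = tanh(w * h_(t-1) + u * x_t), with h_0 = 0.
    # Every hidden state is stored; BPTT cannot work without this history.
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    loss = 0.5 * (hs[-1] - target) ** 2

    # Backward pass: walk the stored history in reverse ("time travel").
    gw, gu = 0.0, 0.0
    dh = hs[-1] - target              # dL/dh_T
    for t in range(len(xs), 0, -1):
        dz = dh * (1.0 - hs[t] ** 2)  # back through tanh at step t
        gw += dz * hs[t - 1]          # needs the hidden state from step t-1
        gu += dz * xs[t - 1]          # needs the input from step t
        dh = dz * w                   # error flows back to h_(t-1)
    return loss, gw, gu

xs = [1.0, -0.5, 0.25]
loss, gw, gu = bptt_scalar_rnn(w=0.5, u=1.0, xs=xs, target=0.3)
print(loss, gw, gu)
```

The longer the sequence, the longer the history each weight update must reach back into.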
Several researchers have searched for biologically plausible alternatives to deep BP. Some of my recent favourites are:
- Luo, Hongyin, Jie Fu, and James Glass. “Adaptive Bidirectional Backpropagation: Towards Biologically Plausible Error Signal Transmission in Neural Networks.” (2017)
- Balduzzi, David, Hastagiri Vanchinathan, and Joachim Buhmann. “Kickback cuts backprop’s red-tape: biologically plausible credit assignment in neural networks.” (2015)
So far, these attempts have ultimately been unsatisfactory: they either constrain the types of functions that can be learned or perform significantly worse than deep BP.
Yet, what we know of the biology says there must be another way.
Biological plausibility criteria
In our recent research we have adopted the following criteria for biological plausibility.
- Only local (or global) credit assignment. No back-propagation of errors between cell layers.
- Only immediate credit assignment. No synaptic memory beyond the current and/or next step.
- No time travel: no use of past or future inputs or hidden states.
- No labelled data required for training (although it can be exploited if present).
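To give a feel for what a rule meeting all four criteria can look like, here is Oja’s rule, a classic Hebbian learning rule, used purely as an illustration of the criteria: a single Oja neuron learns something very simple (the first principal component of its input). Each weight update uses only that synapse’s input, the neuron’s current output and the weight itself, so it is local, immediate and needs no labels. The toy data distribution, learning rate and function name are assumptions.

```python
import math
import random

def oja_update(w, x, lr):
    # Postsynaptic activity of a single linear neuron.
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Oja's rule: Hebbian term y*x minus a decay y^2*w that keeps |w| near 1.
    # Each weight uses only its own input x_i, the output y and itself.
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]

rng = random.Random(0)
w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
for _ in range(5000):
    # Toy data (assumption): large variance along (1, 1), small along (1, -1).
    a, b = rng.gauss(0, 3.0), rng.gauss(0, 0.5)
    x = [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
    w = oja_update(w, x, lr=0.001)
print(w)  # close to +/-(0.707, 0.707), the dominant direction of the data
```

No error is propagated anywhere and nothing is remembered between steps, which is exactly what the criteria ask for; the open problem is getting rules like this to learn far richer functions.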
We do not claim these criteria are sufficient for complete biological plausibility, and we can’t say the result is an accurate analogue of biological neuron learning. However, we aim to avoid the most implausible features of conventional machine learning.
Ok, let’s allow a little bit of BP
The computational capabilities of single-layer networks are very limited, especially in comparison to two-layer networks. In theory, a two-layer network with a large enough hidden layer is a universal function approximator: it can approximate any continuous function to arbitrary accuracy! So perhaps we should be more open-minded about a little bit of BP.
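XOR is the textbook example of the gap between one and two layers. The sketch below uses hand-set weights and threshold units (illustrative assumptions): a two-layer network computes XOR, while a brute-force search over a small grid of weights finds no single threshold unit that does. The grid search only illustrates the point; the underlying reason is that XOR is not linearly separable.

```python
def step(z):
    # Threshold nonlinearity: the unit fires (1) only if its input is positive.
    return 1 if z > 0 else 0

def two_layer_xor(x1, x2):
    # Hidden layer: one unit detects OR, the other detects AND.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: "OR but not AND", which is XOR.
    return step(h_or - h_and - 0.5)

table = {(a, b): two_layer_xor(a, b) for a in (0, 1) for b in (0, 1)}
print(table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Brute-force search for a single threshold unit step(w1*a + w2*b + c)
# that reproduces XOR, over a small grid of weights and biases.
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
single_unit_solutions = [
    (w1, w2, c)
    for w1 in grid for w2 in grid for c in grid
    if all(step(w1 * a + w2 * b + c) == (a ^ b) for a in (0, 1) for b in (0, 1))
]
print(len(single_unit_solutions))  # 0
```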
Biological neurons are known to perform “dendritic computation”, involving integration and nonlinearities within dendritic subtrees (Guerguiev et al., 2016 and Tzilivaki et al., 2019). This is roughly computationally equivalent to 2 or 3 conventional artificial neural network layers. For this reason we allow ourselves to use error backpropagation across two ANN layers, on the assumption that this could approximate dendritic computation within a single biological cell layer, with training signals propagated inside cells rather than between them.
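One way to picture this constraint is a stack of two-layer blocks, each standing in for one cell layer. The sketch below backpropagates only within each block, and each block passes its hidden output forward as plain numbers, so no error signal crosses between blocks. The local objective (reconstructing the block’s own input), the layer sizes, the greedy block-by-block training schedule and the class name LocalBlock are all hypothetical choices made for this illustration.

```python
import math
import random

class LocalBlock:
    """Hypothetical two-layer unit standing in for one biological cell layer."""

    def __init__(self, n_in, n_hidden, rng):
        # W: input -> hidden weights; V: hidden -> reconstruction weights.
        self.W = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.V = [[rng.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]
        r = [sum(v * hj for v, hj in zip(row, h)) for row in self.V]
        return h, r

    def loss(self, x):
        # Local, unsupervised objective (assumption): reconstruct the input.
        _, r = self.forward(x)
        return 0.5 * sum((ri - xi) ** 2 for ri, xi in zip(r, x))

    def train_step(self, x, lr):
        h, r = self.forward(x)
        e = [ri - xi for ri, xi in zip(r, x)]  # dL/dr: local error, no labels
        # Backprop through V and tanh, but no further than this block's input.
        dz = [sum(self.V[i][j] * e[i] for i in range(len(e))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for i, row in enumerate(self.V):
            for j in range(len(row)):
                row[j] -= lr * e[i] * h[j]
        for j, row in enumerate(self.W):
            for i in range(len(row)):
                row[i] -= lr * dz[j] * x[i]

def mean_loss(block, inputs):
    return sum(block.loss(x) for x in inputs) / len(inputs)

def train(block, inputs, epochs=200, lr=0.05):
    for _ in range(epochs):
        for x in inputs:
            block.train_step(x, lr)

rng = random.Random(1)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(20)]

block1 = LocalBlock(4, 3, rng)
before1 = mean_loss(block1, data)
train(block1, data)
after1 = mean_loss(block1, data)

# Block 2 only sees block 1's outputs as plain numbers: no error is sent back.
codes = [block1.forward(x)[0] for x in data]
block2 = LocalBlock(3, 2, rng)
before2 = mean_loss(block2, codes)
train(block2, codes)
after2 = mean_loss(block2, codes)
print(before1, after1, before2, after2)
```

Within a block this is ordinary two-layer BP; the constraint only bites at the boundary, where the sketch simply never computes a gradient with respect to the block’s input.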
Biologically plausible credit assignment has profound implications. An alternative way to train networks that is local in time and space, potentially much more efficient and potentially more powerful, would be a great asset to machine learning research. The big question is whether we can figure out learning rules that work well under these constraints. Our recent papers suggest that it is indeed possible, although we still have some way to go to reproduce the many successes of deep learning without deep backpropagation.