We’ve just uploaded a spin-off research paper to arXiv titled “Sparse Unsupervised Capsules Generalize Better”. So what’s it all about?
You may have heard of Capsules networks already. If not, have a read of one of these blog articles (here, here, here, or here (EM routing)), watch this video, or consult one of the two recent key papers:
- Dynamic Routing between Capsules (Sabour et al. 2017)
- Matrix Capsules with EM-Routing (Hinton et al. 2018)
Briefly, Capsules output a vector of parameters that describe the state of the entity represented by the capsule – for example, the pose of an object, or some information about the shape of a specific object instance. After training, Capsules have been shown to discover “equivariances”: ways in which the parameters can describe continuous changes in an entity. In the case of MNIST digits, these include stroke width and digit skew. In our work, which is unsupervised, equivariances also include digit transformations (e.g. from a 3 to a 5, as shown in the figure above).
In addition, Capsules networks have a mechanism called Dynamic Routing that builds a “parse-tree” from the Capsules. The parse tree is a subset of Capsules across many layers that collectively agree on what is being observed in the input. As a result, after routing, the Capsules network describes the configuration of a set of entities it has found in the input, using a subset of all the available Capsules.
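To make the routing idea concrete, here is a minimal NumPy sketch of routing-by-agreement in the style of Sabour et al. (2017). It is an illustration only, not the code from our paper: the function names, the array layout `(num_in, num_out, dim)` and the default of 3 iterations are assumptions chosen for this example, and the learned weight matrices that produce the predictions `u_hat` are left out.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iterations=3):
    """Route predictions from lower capsules to higher capsules.

    u_hat: array of shape (num_in, num_out, dim), where u_hat[i, j] is the
           prediction that lower capsule i makes for higher capsule j.
    Returns the higher-capsule outputs v (num_out, dim) and the routing
    weights c (num_in, num_out) that were used to compute them.
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iterations):
        # Each lower capsule splits its output across the higher capsules.
        c = softmax(b, axis=1)
        # Weighted sum of predictions for each higher capsule.
        s = np.sum(c[:, :, None] * u_hat, axis=0)
        v = squash(s)
        # Raise the logits where a prediction agrees with the output.
        b = b + np.sum(u_hat * v[None, :, :], axis=-1)
    return v, c
```

If all lower capsules make the same prediction for one higher capsule and conflicting predictions for another, the routing weights shift towards the first one over the iterations. The set of strongly weighted connections is what forms the “parse-tree”.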
Attention & Selective Memory
We are interested in capsules for several reasons. First, there may be important representational gains from the Capsules approach. We are also impressed that the dynamic routing mechanism integrates feedback in a stable manner, and that routing can be used as a selection mechanism to drive the memory towards particular perceptions. We can already see an example of this in the Multi-MNIST classification task from Sabour et al. (see figure). Finally, routing also provides an attention mechanism, because routing weights can be targeted at particular areas of the hierarchy. So with a Capsules network, we get stable feedback integration, selective memory and an attention mechanism straight out of the box!
Given our focus, we made an unsupervised Capsules network derived from the Dynamic Routing between Capsules (Sabour et al. 2017) implementation. As our paper explains, simply making the network unsupervised didn’t work. We had to add a form of sparse training as well.
Sabour et al. trained their network on MNIST images and then tested classification accuracy on affNIST images (affine-transformed MNIST), achieving 79% accuracy. They also report 66% accuracy for a conventional convolutional network with a similar number of parameters. This suggests that the Capsules representation was able to generalize from MNIST to affNIST.
In our work, we trained our sparse unsupervised capsules network on MNIST and then used an SVM to classify affNIST digit labels given the activity of the deepest unsupervised capsules layer. We managed to improve the affNIST score to 90%! Hence, we conclude that sparse unsupervised capsules do generalize better than supervised capsules, at least in their current form.
We compared our score to all the affNIST results we could find, and noticed that our result is similar to or better than those of most conventional networks, even when they are trained on affNIST directly. This looks promising for capsules in general.
Our result also has another implication. Supervised training of latent capsules layers enforces sparseness in shallower layers too. But this effect will only work to a limited depth. Our investigation of the properties of dense unsupervised capsules suggests that without sparse training, you can’t have deep capsules networks. Sparse training might be a key enabler of deep capsules networks.
You can see some of the equivariances produced by our network in the headline image at the top of this page. Since the network is unsupervised, the equivariances produced include digit morphing.