Convolutional Autoencoder (CAE) - Python for Integrated Circuits - An Online Book (http://www.globalsino.com/ICs/)
=================================================================================
Autoencoders have been part of deep learning research for a long time and are especially popular for data compression tasks. Autoencoders and convolutional autoencoders are both types of neural networks used for unsupervised learning, but they differ in their architecture and use cases.
Regarding data compression via autoencoders, consider the following example. Assume a program is created to send some data from one PC to another. The data is a collection of data points, each with two dimensions, as shown in Figure 4215a. Since the network bandwidth is limited, every bit of data to be sent should be optimized. Instead of sending all the data points, we can send only the first dimension of every data point to the other PC, and the receiving PC then computes the value of the second dimension from the first dimension using the known linear relationship. This method requires some computation, and the compression is lossy, but it reduces the network traffic by roughly 50%. In practice, autoencoders can be implemented with TensorFlow.
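A minimal sketch of the two-dimensional example above, written in plain Python/NumPy; the linear relationship y = 2x + 1 and the noise level are illustrative assumptions, not values from the text. The sender transmits only the first dimension, and the receiving PC reconstructs the second dimension from a previously agreed linear fit.

import numpy as np

# Hypothetical 2D data points lying approximately on the line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=1000)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=1000)   # small noise makes the scheme lossy

# Sending side: agree on the linear relationship, then transmit only x (~50% of the traffic)
slope, intercept = np.polyfit(x, y, deg=1)

# Receiving side: recover the second dimension from the first
y_reconstructed = slope * x + intercept

print("mean squared reconstruction error:", np.mean((y - y_reconstructed) ** 2))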
The problem above is two-dimensional (2D). If the data is high-dimensional, the set of data points can be given by {a(1), a(2), ..., a(m)}, where each data point has many dimensions. A method is therefore needed to map these points to another set of data points {z(1), z(2), ..., z(m)}, where the z’s have lower dimensionality than the a’s and the a’s can be faithfully reconstructed from the z’s. Recall that sending the data from one PC to another includes the steps below: the sending PC encodes each a(i) into a lower-dimensional z(i), transmits z(i) over the network, and the receiving PC decodes z(i) into a reconstruction ã(i). With a linear encoder and decoder, the following equations can be obtained,
         z(i) = W1a(i) + b1
         ã(i) = W2z(i) + b2
The error can be given by,
         Error = ||ã(i) - a(i)||²
When no other constraints are imposed on the loss function, the auto-encoder weights tend to learn the identity function. Some form of regularization must then be imposed on the model so that it can uncover the underlying structure in the data. Such forms of regularization include adding noise to the input units [1], see Equation 4215e, requiring the hidden unit activations to be sparse [5], or requiring them to have small derivatives [6]. These models are known as de-noising, sparse, and contractive auto-encoders, respectively.
         Error = Decoder(Encoder(X + noise)) - X ------------------------------------------------------- [4215e]
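As a rough illustration of the de-noising idea in Equation 4215e, the sketch below builds a small de-noising auto-encoder in TensorFlow/Keras; the 784-dimensional input, the layer sizes, and the noise level are assumptions made for the example, not values from the text.

import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))                                 # e.g. a flattened 28x28 image X
noisy = tf.keras.layers.GaussianNoise(stddev=0.2)(inputs)             # X + noise (applied during training)
encoded = tf.keras.layers.Dense(64, activation='relu')(noisy)         # Encoder(X + noise)
decoded = tf.keras.layers.Dense(784, activation='sigmoid')(encoded)   # Decoder(Encoder(X + noise))

denoising_ae = tf.keras.Model(inputs, decoded)
# The loss compares the reconstruction with the clean input X, as in Equation 4215e
denoising_ae.compile(optimizer='adam', loss='mse')
# denoising_ae.fit(X_train, X_train, epochs=10, batch_size=128)       # X_train: hypothetical training data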
The auto-encoder model learns a function that minimizes the squared error between the input a(i) ∈ R^n and its reconstruction ã(i). In practice, if a(i) is a two-dimensional vector, it may be possible to visualize the data and find W1, b1 and W2, b2 analytically. In most cases, however, it is difficult to find those matrices by visualization alone; therefore, gradient descent is needed. Since the goal is to have ã(i) approximately equal to a(i), the objective function is the sum of squared differences between the reconstructions and the inputs,
         J(W1, b1, W2, b2) = Σi ||ã(i) - a(i)||²

Autoencoders are unsupervised neural network models that summarize the general properties of the original data in fewer parameters while learning how to reconstruct the data after compression [3]. The particular architecture above is also known as a linear autoencoder, shown in the network architecture in Figure 4215b. In the case in the figure, we are trying to map data from 8 dimensions to 4 dimensions using a neural network with a single hidden layer z. The activation function of the hidden layer is linear, hence the name linear autoencoder, which works when the data lie on a linear surface. The encoder network transforms the original data into a lower-dimensional representation; that is, it approximates a function that maps the data from its full input space into a lower-dimensional coordinate system that exploits the structure of the data. The decoder network attempts to recreate the original input from the output of the encoder; in other words, it tries to reverse the encoding process. The vector z is called the embedding vector. If the data lie on a nonlinear surface, it makes more sense to use a nonlinear autoencoder, and if the data are highly nonlinear, more hidden layers can be added to the network to obtain a deep autoencoder. A minimal implementation of the linear autoencoder of Figure 4215b is sketched below.
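A minimal sketch of the linear autoencoder of Figure 4215b (8 dimensions to 4 and back), assuming a TensorFlow/Keras implementation; gradient descent on the mean squared error plays the role of the sum-of-squared-differences objective above.

import tensorflow as tf

inputs = tf.keras.Input(shape=(8,))
# Encoder: z = W1*a + b1, a linear 4-dimensional embedding vector
z = tf.keras.layers.Dense(4, activation='linear', name='embedding')(inputs)
# Decoder: a_tilde = W2*z + b2, mapping the embedding back to 8 dimensions
outputs = tf.keras.layers.Dense(8, activation='linear')(z)

linear_ae = tf.keras.Model(inputs, outputs)
linear_ae.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss='mse')
# linear_ae.fit(A, A, epochs=100, batch_size=32)   # A: hypothetical (m, 8) array of data points a(i)

# The trained encoder alone produces the lower-dimensional representations z(i)
encoder = tf.keras.Model(inputs, z)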
Auto-encoders are popular models for performing unsupervised feature extraction on highly nonlinear data. In other words, unlike in supervised learning, the data above only have a’s but do not have labels (b’s). Unsupervised learning and data compression through autoencoders require modifications of the loss function. The simplest implementation of an auto-encoder is a simple feed-forward neural network in which the learned latent representations are given by the hidden vector z(i) = σ(W1a(i) + b1).

While the autoencoders mentioned above have shown impressive results, they do not directly address the structure of images. Convolutional neural networks (CNNs) [7-8], see page4237, show a way to reduce the number of connections by having each hidden unit be responsible only for a small local neighborhood of visible units. Such schemes allow for dense feature extraction followed by pooling layers which, when stacked, could allow the network to learn over larger and larger receptive fields. Convolutional auto-encoders (CAEs) combine aspects of both autoencoders and convolutional neural networks, which makes it possible to extract highly localized, patch-based information in an unsupervised fashion. The CAE is an unsupervised learning model (page4322) for extracting hierarchical features from natural images. CAEs can be stacked in such a way that each CAE takes the latent representation of the previous CAE as input to obtain higher-level representations [4]. A CAE is similar to a traditional auto-encoder except that it uses convolutional (and optionally pooling) layers for the hidden layers in the network, and the kth feature map output by a convolutional layer is given by
         h^k = σ(a * W^k + b^k)
where * denotes the 2D convolution, W^k and b^k are the filter and bias of the kth feature map, and σ is the activation function. Similar to CNNs, max-pooling can optionally be applied to the feature maps output by a convolutional layer; the activation values are then the maxima over small patches spanning a given feature map. For highly non-linear data, CAEs can be stacked (CAES) to obtain a deep structure for modelling the data (similar to [1]). In experiments on MNIST, this CAE model is capable of learning robust feature representations for image data. The deep topology of a CAES enables each layer of the network to model increasingly abstract latent representations of the input based on the latent output of the previous layer. CAES therefore offer a powerful model for learning robust hierarchical latent representations of highly structured inputs such as natural images. Given the similarity in structure between a CAES and popular CNN classification models based on the architecture of AlexNet [2], the learned weights of a CAES can also be used to initialize the weights of the latter group of networks; the intuition is that this ensures the weights of the CNN are initially set to sensible values for training with back-propagation. The tricky part of CAEs is the decoder side of the model: during encoding, the image sizes shrink through subsampling with either average pooling or max-pooling, resulting in information loss that is hard to recover while decoding. In practice, a number of different developments of CAEs exist, and autoencoders have many interesting applications. A convolutional autoencoder for MNIST images is sketched below.
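A hedged sketch of a convolutional autoencoder for 28x28 MNIST images, again assuming a TensorFlow/Keras implementation; the filter counts and kernel sizes are illustrative choices rather than values given in the text. The encoder shrinks the image with convolution and max-pooling, and the decoder tries to undo the subsampling with upsampling and convolution.

import tensorflow as tf

inputs = tf.keras.Input(shape=(28, 28, 1))

# Encoder: convolution + max-pooling layers shrink the image (the source of information loss)
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(inputs)
x = tf.keras.layers.MaxPooling2D((2, 2))(x)                  # 28x28 -> 14x14
x = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = tf.keras.layers.MaxPooling2D((2, 2))(x)            # 14x14 -> 7x7 latent feature maps

# Decoder: upsampling + convolution layers attempt to reverse the subsampling
x = tf.keras.layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = tf.keras.layers.UpSampling2D((2, 2))(x)                  # 7x7 -> 14x14
x = tf.keras.layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = tf.keras.layers.UpSampling2D((2, 2))(x)                  # 14x14 -> 28x28
decoded = tf.keras.layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

cae = tf.keras.Model(inputs, decoded)
cae.compile(optimizer='adam', loss='mse')
# cae.fit(x_train, x_train, epochs=10, batch_size=128)        # x_train: MNIST images scaled to [0, 1]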
============================================
[1] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.