Part (1/2): What and How of Autoencoders

An autoencoder is a neural network that consists of an input layer, one or more hidden layers and an output layer, pretty much like any other neural network you may have heard of or worked with.

But an autoencoder has a few specific characteristics that make it unique:

  1. The input and output layers of an autoencoder always have the same dimension. The hidden layer can have any dimension (preferably smaller than that of the input and output layers), but the input and output dimensions must always match! Why is that? Read on to find out!
  2. Another distinctive feature of an autoencoder is that, unlike many popular neural networks, it is unsupervised. This means that an autoencoder does not need label information, which makes it particularly useful for tasks that involve learning features or extracting representations.

Architecture of an Autoencoder

A Vanilla Autoencoder. Image by author

The figure shows the simplest possible autoencoder architecture. Such autoencoders are called vanilla autoencoders, and they are well suited to understanding the principle behind the network.

As mentioned in the characteristics above, the network contains an input layer, a hidden layer and an output layer, and the input and output dimensions are the same, i.e. 9 in this case. The reason these dimensions must be strictly identical becomes clear once we understand how an autoencoder works.


Encoder in an Autoencoder. Image by author

The input portion of an autoencoder (from the input layer to the hidden layer) behaves like an encoder: the input is multiplied by a weight matrix, a bias is added, and a non-linear activation function is applied. The resulting encoded values are stored in the hidden layer.

In the form of an equation, the encoder can be written as:

$$y = f(Wx + b_1)$$

where:

x : input data

y : encoded values

b1 : input-to-hidden bias (the [5x1] hidden layer bias)

W : input-to-hidden layer weights

f() : non-linear activation function, sigmoid in this case
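The encoder equation can be sketched in plain Python as below. This is a minimal sketch, not the author's implementation: the helper names `sigmoid`, `matvec` and `encode` are hypothetical, vectors are plain lists, and `W` is stored as a list of rows (one row per hidden unit), using a tiny 3-input, 2-hidden-unit example so the numbers are easy to follow.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, the activation f used in this article."""
    return 1.0 / (1.0 + math.exp(-a))


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def encode(x, W, b1):
    """Encoder: y = f(W x + b1), with W of shape [hidden x input]."""
    return [sigmoid(a + b) for a, b in zip(matvec(W, x), b1)]


# Tiny example: 3 inputs encoded into 2 hidden units.
W = [[0.0, 0.0, 0.0],
     [1.0, -1.0, 0.5]]
b1 = [0.0, 0.0]
x = [1.0, 1.0, 2.0]
y = encode(x, W, b1)
print(y)  # first unit: sigmoid(0) = 0.5
```

For the 9-5-9 network in the figure, `W` would simply have 5 rows of 9 weights each.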


Decoder in an Autoencoder. Image by author

The output portion of the autoencoder (from the hidden layer to the output layer) acts as a decoder: it decodes the information stored in encoded form in the hidden layer. The autoencoder reconstructs an output that is as close as possible to the input (ideally identical to it). Because the output is compared element by element with the input, the output and input dimensions must match.

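In the form of equations, the decoder and the cost function can be written as:

$$z = f(W^{T} y + b_2)$$

$$E(W, b) = \frac{1}{n} \sum_{i=1}^{n} (x_i - z_i)^2$$

where n is the number of input (and output) units, 9 here, and: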

y : encoded value stored in the hidden layer

z : output of autoencoder

b2 : hidden-to-output bias (the [9x1] output layer bias)

W^T : transpose of the input-to-hidden layer weights

f() : non-linear activation function

E(W,b) : mean square error cost function
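The decoder and the cost can be sketched the same way. This minimal sketch assumes tied weights as described above (the decoder reuses the transpose of the encoder's `W`), a sigmoid at the output, and the 1/n form of the mean square error; `transpose`, `decode` and `mse` are hypothetical helper names.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def decode(y, W, b2):
    """Decoder with tied weights: z = f(W^T y + b2)."""
    return [sigmoid(a + b) for a, b in zip(matvec(transpose(W), y), b2)]


def mse(x, z):
    """Mean square error between the input x and its reconstruction z."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / len(x)


# W is [2 x 3] (2 hidden units, 3 inputs), so W^T is [3 x 2].
W = [[0.0, 0.0, 0.0],
     [1.0, -1.0, 0.5]]
b2 = [0.0, 0.0, 0.0]
y = [0.5, 0.5]                      # encoded values from the hidden layer
z = decode(y, W, b2)                # reconstruction, same length as the input
print(len(z), mse([1.0, 0.0, 1.0], z))
```

Note that the reconstruction `z` has the input's length (3) even though the hidden layer has only 2 units, which is exactly why the input and output dimensions must match for `mse` to make sense.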

The cost function E(W,b) measures how much the reconstruction at the output differs from the input data, and the aim of the autoencoder is to minimize this reconstruction error. At first glance, an autoencoder might seem to perform a trivial one-to-one operation that simply copies the input to the output. However, it is during this process that the autoencoder can learn interesting features or representations of the input data, which are stored in the hidden layer. The choice of an appropriate non-linear activation function plays an important role here. Also, since the hidden layer is typically smaller than the input layer, the network cannot simply copy its input but has to compress it, which also makes the autoencoder a tool for dimensionality reduction.

The backpropagation algorithm is used to minimize the cost function. The choice of cost function, whether mean square error or binary cross-entropy, depends on the type of data and the application. The same applies to the choice of the non-linear activation function.
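To make the choice between the two costs concrete, here is a small sketch of both. It assumes the common convention that binary cross-entropy is used when the inputs lie in [0, 1] and the output layer is a sigmoid; the names `mse` and `binary_cross_entropy`, the clipping constant `eps`, and the toy vectors are illustrative assumptions.

```python
import math


def mse(x, z):
    """Mean square error, a natural fit for real-valued inputs."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / len(x)


def binary_cross_entropy(x, z, eps=1e-12):
    """Mean binary cross-entropy for inputs in [0, 1] and sigmoid outputs.

    eps clips z away from 0 and 1 so that log() never receives 0.
    """
    total = 0.0
    for xi, zi in zip(x, z):
        zi = min(max(zi, eps), 1.0 - eps)
        total -= xi * math.log(zi) + (1.0 - xi) * math.log(1.0 - zi)
    return total / len(x)


x = [1.0, 0.0, 1.0, 0.0]          # binary input, e.g. a thresholded image
z_good = [0.9, 0.1, 0.8, 0.2]     # close reconstruction
z_bad = [0.4, 0.6, 0.5, 0.5]      # poor reconstruction
for name, loss in [("MSE", mse), ("BCE", binary_cross_entropy)]:
    print(name, loss(x, z_good), loss(x, z_bad))
```

Both costs rank the two reconstructions the same way; they differ in how strongly they penalize confident mistakes, which is one reason the right choice depends on the data.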

Implementation Details

In this section, I share some snippets from my implementation of an autoencoder. The most important implementation details, for an autoencoder or for any neural network for that matter, concern forward and backward propagation.

Dimensions of the vectors:

It is important to note that all the data we deal with is a vector or a matrix. For the autoencoder implemented here, the dimensions are as follows:

input data = [9x1]

output data = [9x1]

hidden layer = [5x1]

hidden layer bias = [5x1]

output layer bias = [9x1]

input-to-hidden layer weights = [5x9]

hidden-to-output layer weights = [9x5]

As the first step of the implementation, we initialize the variables: the weight matrices and the bias vectors are set to random initial values.
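A minimal sketch of this initialization step, using the shapes listed above. The uniform range [-0.1, 0.1], the fixed seed and the hypothetical names `random_matrix` and `init_params` are assumptions for illustration, not details of the original implementation.

```python
import random


def random_matrix(rows, cols, scale, rng):
    """rows x cols matrix (list of rows) with entries uniform in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(n_input=9, n_hidden=5, scale=0.1, seed=0):
    """Randomly initialize weights and biases with the shapes listed above."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return {
        # input-to-hidden weights, [n_hidden x n_input] = [5x9] by default
        "W1": random_matrix(n_hidden, n_input, scale, rng),
        # hidden layer bias, [n_hidden x 1]
        "b1": [rng.uniform(-scale, scale) for _ in range(n_hidden)],
        # hidden-to-output weights, [n_input x n_hidden] = [9x5] by default
        "W2": random_matrix(n_input, n_hidden, scale, rng),
        # output layer bias, [n_input x 1]
        "b2": [rng.uniform(-scale, scale) for _ in range(n_input)],
    }


params = init_params()
for name, value in params.items():
    if isinstance(value[0], list):
        print(name, f"[{len(value)}x{len(value[0])}]")
    else:
        print(name, f"[{len(value)}x1]")
```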

To implement the feedforward pass of the autoencoder, i.e. the calculation of the encoded values, we multiply the input vector by the randomly initialized weight matrix, add the bias vector (which has also been randomly initialized), and apply the activation function f. The output layer is then computed in the same way from the hidden layer values.
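The feedforward pass just described can be sketched as follows. It assumes, as in the dimension list above, a separate [9x5] hidden-to-output matrix `W2` rather than the tied transpose from the decoder equation, and a sigmoid at both layers. The names `layer` and `forward` are hypothetical, and the seeded random values only stand in for initialized parameters and an input.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(M, v, b):
    """One fully connected layer: f(M v + b), with M stored as a list of rows."""
    return [sigmoid(sum(m * vj for m, vj in zip(row, v)) + bi)
            for row, bi in zip(M, b)]


def forward(x, W1, b1, W2, b2):
    """Feedforward pass: encode x into the hidden layer, then decode it."""
    y = layer(W1, x, b1)   # hidden layer (encoded values), length 5
    z = layer(W2, y, b2)   # output layer (reconstruction), length 9
    return y, z


rng = random.Random(42)
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(9)] for _ in range(5)]
b1 = [rng.uniform(-0.1, 0.1) for _ in range(5)]
W2 = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(9)]
b2 = [rng.uniform(-0.1, 0.1) for _ in range(9)]
x = [rng.random() for _ in range(9)]

y, z = forward(x, W1, b1, W2, b2)
print(len(y), len(z))
```

With tied weights, `W2` would simply be the transpose of `W1` instead of a separately initialized matrix.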

For the output to be a faithful reconstruction of the input, the error function must be minimized, and the backpropagation algorithm is used for this purpose. The gist of backpropagation is to compute how much each weight and each bias contributes to the error, so that each of them can be adjusted individually. This is done using gradient descent, which can be represented as:

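Gradient descent for the weights:

$$W \leftarrow W - \eta \frac{\partial E(W, b)}{\partial W}$$

Gradient descent for the biases:

$$b \leftarrow b - \eta \frac{\partial E(W, b)}{\partial b}$$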

where η (eta) is the learning rate, one of the hyperparameters that must be determined experimentally.
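For the sigmoid activation and the mean square error used in this article, the partial derivatives in these update rules come from backpropagation. The sketch below performs one plain, per-sample gradient descent step. It assumes a separate [9x5] hidden-to-output matrix `W2`, a sigmoid at both layers and E = (1/n) * sum((x_i - z_i)^2); the names `sgd_step`, `layer` and `mse` are hypothetical.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(M, v, b):
    """Fully connected sigmoid layer: f(M v + b), M stored as a list of rows."""
    return [sigmoid(sum(m * vj for m, vj in zip(row, v)) + bi)
            for row, bi in zip(M, b)]


def mse(x, z):
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / len(x)


def sgd_step(x, p, eta):
    """One backpropagation + gradient descent step on a single input x.

    p holds W1 [hidden x input], b1, W2 [input x hidden] and b2 as lists.
    Updates p in place and returns the error E before the update.
    """
    # Forward pass.
    y = layer(p["W1"], x, p["b1"])
    z = layer(p["W2"], y, p["b2"])
    n = len(x)
    # Output deltas dE/da for E = mean((x - z)^2); sigmoid'(a) = z * (1 - z).
    d2 = [(2.0 / n) * (zi - xi) * zi * (1.0 - zi) for xi, zi in zip(x, z)]
    # Hidden deltas: output deltas sent back through W2, times y * (1 - y).
    d1 = [sum(p["W2"][i][j] * d2[i] for i in range(n)) * yj * (1.0 - yj)
          for j, yj in enumerate(y)]
    # Gradient descent: W <- W - eta * dE/dW and b <- b - eta * dE/db.
    for i, di in enumerate(d2):
        for j, yj in enumerate(y):
            p["W2"][i][j] -= eta * di * yj
        p["b2"][i] -= eta * di
    for j, dj in enumerate(d1):
        for k, xk in enumerate(x):
            p["W1"][j][k] -= eta * dj * xk
        p["b1"][j] -= eta * dj
    return mse(x, z)


rng = random.Random(0)
p = {"W1": [[rng.uniform(-0.5, 0.5) for _ in range(9)] for _ in range(5)],
     "b1": [0.0] * 5,
     "W2": [[rng.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(9)],
     "b2": [0.0] * 9}
x = [rng.random() for _ in range(9)]
before = sgd_step(x, p, eta=0.1)
after = mse(x, layer(p["W2"], layer(p["W1"], x, p["b1"]), p["b2"]))
print(before, after)
```

Both deltas are computed before any weight is changed, so the hidden-layer gradients use the same `W2` that produced the output.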

No discussion of a neural network is complete without mentioning its hyperparameters. Here, the hyperparameters are the dimension of the hidden layer, the learning rate, and the momentum (if used). The number of epochs over which the model is trained is also important. These values are determined experimentally, by choosing settings for which the error keeps decreasing.
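The hyperparameters named above (hidden-layer size, learning rate, momentum and number of epochs) appear as arguments in the following sketch of a full training loop. It assumes per-sample updates, sigmoid layers, the 1/n mean square error, uniform initialization in [-0.5, 0.5] and classical momentum, where a velocity term accumulates past gradients. The default values, the toy `data` and all function names are illustrative guesses, not values from the original implementation.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(M, v, b):
    """Fully connected sigmoid layer: f(M v + b), M stored as a list of rows."""
    return [sigmoid(sum(m * vj for m, vj in zip(row, v)) + bi)
            for row, bi in zip(M, b)]


def gradients(x, p):
    """Backpropagation for two sigmoid layers and E = mean((x - z)^2)."""
    y = layer(p["W1"], x, p["b1"])
    z = layer(p["W2"], y, p["b2"])
    n = len(x)
    d2 = [(2.0 / n) * (zi - xi) * zi * (1.0 - zi) for xi, zi in zip(x, z)]
    d1 = [sum(p["W2"][i][j] * d2[i] for i in range(n)) * yj * (1.0 - yj)
          for j, yj in enumerate(y)]
    grads = {"W1": [[dj * xk for xk in x] for dj in d1], "b1": d1,
             "W2": [[di * yj for yj in y] for di in d2], "b2": d2}
    error = sum((xi - zi) ** 2 for xi, zi in zip(x, z)) / n
    return error, grads


def zeros_like(v):
    """Zero-filled copy of a vector (list) or a matrix (list of rows)."""
    if isinstance(v[0], list):
        return [[0.0] * len(row) for row in v]
    return [0.0] * len(v)


def momentum_update(param, vel, grad, eta, mu):
    """In place: vel <- mu * vel - eta * grad, then param <- param + vel."""
    if isinstance(param[0], list):
        for prow, vrow, grow in zip(param, vel, grad):
            momentum_update(prow, vrow, grow, eta, mu)
        return
    for i in range(len(param)):
        vel[i] = mu * vel[i] - eta * grad[i]
        param[i] += vel[i]


def train(data, n_hidden=5, eta=0.2, mu=0.9, epochs=300, seed=0):
    """Per-sample gradient descent with momentum.

    Returns the trained parameters and the mean error of each epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    p = {"W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
         "b1": [0.0] * n_hidden,
         "W2": [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)],
         "b2": [0.0] * n_in}
    vel = {k: zeros_like(v) for k, v in p.items()}  # one velocity per parameter
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            err, g = gradients(x, p)
            total += err
            for k in p:
                momentum_update(p[k], vel[k], g[k], eta, mu)
        history.append(total / len(data))
    return p, history


# Four toy 9-dimensional binary patterns (made-up data for illustration).
data = [[1, 0, 0, 1, 0, 0, 1, 0, 0],
        [0, 1, 0, 0, 1, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 0, 0, 0, 0, 0, 0]]
params, history = train(data)
print(history[0], history[-1])
```

Comparing `history` across runs with different `eta`, `mu`, `n_hidden` or `epochs` is one simple way to carry out the experimental tuning described above.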

Autoencoders are among the simplest and most popular unsupervised models. There are many variants of them, but the underlying principle and essence of the network always remain the same.

In Part 2 of this article, I will cover the application in which I used this autoencoder and discuss the results achieved.

Until then, feel free to reach out to me if something wasn’t clear enough or needs improvement. And, happy coding!!

Student for life. An aspiring researcher in CV and ML. Here to share all I’ve done or learnt.