Part 2/2: Using Autoencoders for Feature Extraction from PolSAR Images
In Part 1 of this series, I explained the whats and hows of autoencoders. While there are ample examples of using autoencoders for denoising or dimensionality reduction, in this article I would like to demonstrate how they can also be used in applications like remote sensing to extract features from satellite images.
In this example, I used the vanilla autoencoder defined in the first part of this series to extract features from a fully polarimetric SAR (PolSAR) image taken over Oberpfaffenhofen, Wessling, Germany. The Oberpfaffenhofen PolSAR image is 6640 x 1390 pixels in size, has a resolution of 1.5 m per pixel, and was captured by the E-SAR sensor (DLR, L-band). The ground truth of the dataset is annotated with five classes, namely City (red), Field (yellow), Forest (dark green), Grassland (light green) and Streets (blue).
The false color image and the ground truth images look like the ones shown here:
I will skip the PolSAR jargon, as it is beyond the scope of this article. The coherency matrix was used as the nine-dimensional input feature vector. The hidden layer encoded the input data in its 5 neurons and, in the process, ended up learning some important features of the input vector. The encoded data was then decoded and reconstructed at the output, as is expected from an autoencoder. The result was classified using the k-NN algorithm.
I trained the model using stochastic gradient descent (SGD) for 100 epochs with a learning rate of 0.1 and a momentum of 0.9, and used mean squared error (MSE) as the cost function.
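To make the classification step concrete, here is a minimal k-NN sketch in NumPy. It is not the code used for the experiment: the function name `knn_predict`, the choice of `k=5`, the Euclidean distance and the toy data are all assumptions made for illustration. It simply labels each feature vector by a majority vote among its nearest labeled neighbours.

```python
import numpy as np

def knn_predict(train_x, train_y, query_x, k=5):
    """Label each query vector by majority vote among its k nearest training vectors."""
    # Squared Euclidean distances, shape (n_query, n_train)
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training samples for each query
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]  # shape (n_query, k)
    n_classes = int(train_y.max()) + 1
    # Count the votes per class and pick the most frequent label
    counts = np.stack([(votes == c).sum(axis=1) for c in range(n_classes)], axis=1)
    return counts.argmax(axis=1)

# Toy data standing in for per-pixel feature vectors (5 classes, 5 features)
rng = np.random.default_rng(0)
centers = 5.0 * np.eye(5)
train_y = np.repeat(np.arange(5), 40)
train_x = centers[train_y] + rng.normal(scale=0.5, size=(200, 5))
query_y = np.repeat(np.arange(5), 10)
query_x = centers[query_y] + rng.normal(scale=0.5, size=(50, 5))
pred = knn_predict(train_x, train_y, query_x, k=5)
```

The brute-force distance matrix is fine for a toy example, but for an image with millions of pixels a tree-based or batched neighbour search would be the more practical choice.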
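The training setup described above can be sketched in plain NumPy as follows. Only the 9-5-9 layout, the 100 epochs, the learning rate of 0.1, the momentum of 0.9 and the MSE loss come from the article. The sigmoid hidden activation, the linear output layer, the mini-batch size of 32, the weight initialization and the synthetic input data are assumptions of this sketch, and the names `train_autoencoder` and `encode` are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, p):
    """Encode x into the hidden layer, then decode it back to the input size."""
    h = sigmoid(x @ p["W1"] + p["b1"])
    return h, h @ p["W2"] + p["b2"]

def train_autoencoder(x, n_hidden=5, epochs=100, lr=0.1, momentum=0.9,
                      batch_size=32, seed=0):
    """Train a one-hidden-layer autoencoder with momentum SGD on an MSE loss."""
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    p = {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_in)), "b2": np.zeros(n_in),
    }
    velocity = {name: np.zeros_like(value) for name, value in p.items()}
    # Reconstruction error before any update, then once after every epoch
    history = [float(np.mean((forward(x, p)[1] - x) ** 2))]
    for _ in range(epochs):
        order = rng.permutation(len(x))
        for start in range(0, len(x), batch_size):
            xb = x[order[start:start + batch_size]]
            h, x_hat = forward(xb, p)
            # Gradients of loss = mean((x_hat - xb) ** 2)
            d_out = 2.0 * (x_hat - xb) / xb.size
            d_hid = (d_out @ p["W2"].T) * h * (1.0 - h)
            grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0),
                     "W1": xb.T @ d_hid, "b1": d_hid.sum(axis=0)}
            # Classical momentum update
            for name in p:
                velocity[name] = momentum * velocity[name] - lr * grads[name]
                p[name] += velocity[name]
        history.append(float(np.mean((forward(x, p)[1] - x) ** 2)))
    return p, history

def encode(x, p):
    """Return the hidden-layer activations, i.e. the extracted features."""
    return forward(x, p)[0]

# Synthetic 9-dimensional inputs in (0, 1); a stand-in for the real per-pixel vectors
rng = np.random.default_rng(1)
x = sigmoid(rng.normal(size=(512, 3)) @ rng.normal(size=(3, 9)))
params, history = train_autoencoder(x)
codes = encode(x, params)  # one 5-dimensional feature vector per sample
```

Here `history` holds the reconstruction error before training and after each epoch, which is a quick way to check that the loss is actually going down.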
Each neuron in the hidden layer encodes and learns some features from the input which can be plotted as a feature map.
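As a rough illustration of how such feature maps can be produced, the sketch below reshapes per-pixel hidden activations back into one image per hidden neuron. It assumes the pixels were flattened in row-major order and that `codes` holds one row of hidden activations per pixel. The function name `to_feature_maps` and the random stand-in data are hypothetical.

```python
import numpy as np

def to_feature_maps(codes, height, width):
    """Turn per-pixel codes of shape (height * width, n_hidden) into n_hidden images."""
    n_hidden = codes.shape[1]
    # Assumes row-major flattening: pixel (r, c) is stored in row r * width + c
    maps = codes.reshape(height, width, n_hidden).transpose(2, 0, 1)
    # Rescale each map to [0, 1] independently so it can be shown as a grayscale image
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / np.maximum(hi - lo, 1e-12)

# Random stand-in for the hidden activations of a tiny 6 x 4 image with 5 hidden neurons
codes = np.random.default_rng(0).random((6 * 4, 5))
maps = to_feature_maps(codes, height=6, width=4)  # shape (5, 6, 4)
```

Each `maps[i]` can then be passed to any image plotting routine to view what the i-th hidden neuron responds to.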
After reconstruction and classification at the output layer, the result obtained looked something like this:
The vanilla autoencoder implementation achieved an overall accuracy (OA) of 70%. This might not seem very impressive, but the model used was very simple, and the result can certainly be improved with a more complex network architecture. Some parallel work has shown that multi-layer autoencoder networks perform better than their vanilla counterparts. It is also not enough to comment only on the OA of the whole image; the accuracy of each individual class should be examined as well. As is evident from the classification result, field (yellow) and forest (dark green) are classified most accurately, while grassland (light green), streets (blue) and city (red) show varying classification errors. Besides the nature of the captured PolSAR data, a possible reason is that forest and field have the highest number of training samples, which also helps reduce the error. In fact, the class-wise accuracy results are comparable even when different conventional techniques are applied for feature extraction. On reducing the dimensionality of the data using t-SNE, the result shows that fields, forest and grassland are classified better than city and streets.
One might also ask: why use an autoencoder when you can use a CNN and achieve better results? Well, a CNN would make sense if the dataset were much bigger than the one used for training here. For such a small dataset, a CNN seemed like overkill and might have led to overfitting. The autoencoder provided a better solution, and it is almost certain that with a slightly more complex network the model would extract features more accurately than it currently does (this remains a future enhancement!).
Autoencoders are not as widely used in real-world applications, and when they are used, it is mostly for data denoising, dimensionality reduction and in the form of variational autoencoders. However, I think they are very simple and can be used efficiently for feature extraction, and the fact that they are unsupervised makes them an attractive choice for applications that do not have high-quality labels available, such as remote sensing.
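To illustrate the difference between overall and class-wise accuracy, the sketch below computes both from a confusion matrix. The labels are a made-up toy example, not results from the Oberpfaffenhofen experiment, and the function name `accuracy_report` is hypothetical. Per-class accuracy is taken here as the fraction of each true class that was labeled correctly.

```python
import numpy as np

def accuracy_report(y_true, y_pred, n_classes):
    """Return overall accuracy, per-class accuracy and the confusion matrix."""
    # Confusion matrix: rows are true classes, columns are predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    overall = np.trace(cm) / cm.sum()
    # Fraction of each true class that was labeled correctly
    support = cm.sum(axis=1)
    per_class = np.diag(cm) / np.maximum(support, 1)
    return overall, per_class, cm

# Toy labels for three classes (not real results)
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2, 2, 2, 2])
overall, per_class, cm = accuracy_report(y_true, y_pred, n_classes=3)
# overall is 0.8, per_class is [0.75, 0.5, 1.0]
```

In this toy case the overall accuracy of 0.8 hides the fact that the smallest class is only classified correctly half of the time, which is why per-class numbers matter when class sizes are unbalanced.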