Agilytics aspires to become a pioneer in Deep Convolutional Neural Networks (DCNNs), which have achieved remarkable success in various Computer Vision applications. Semantic segmentation is part of this trend.
Regular image classification DCNNs share a similar structure: they take an image as input and output a single value representing the category of that image.
In contrast to image classification, semantic segmentation requires a decision for every pixel in an image: the model must classify each pixel as one of a set of pre-determined classes. In other words, semantic segmentation means understanding an image at the pixel level.
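To make this concrete, here is a minimal sketch of what per-pixel prediction looks like, assuming PyTorch and using made-up tensor sizes:

```python
import torch

# Hypothetical sizes for illustration: one 4x4 image and 3 classes.
num_classes, height, width = 3, 4, 4

# A segmentation model outputs one score per class *per pixel*:
# shape (batch, num_classes, height, width).
logits = torch.randn(1, num_classes, height, width)

# Each pixel is assigned its highest-scoring class, so the label map
# keeps the same spatial dimensions as the input image.
label_map = logits.argmax(dim=1)
print(label_map.shape)  # torch.Size([1, 4, 4])
```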
Semantic segmentation doesn’t differentiate between object instances; we simply try to assign a class label to each individual pixel of a digital image. Regular classification DCNNs such as AlexNet and VGG aren’t directly suitable for this kind of dense prediction, because they output a dense (non-spatial) vector containing a probability for each class label, discarding the spatial layout of the image along the way. To adapt them, we feed this dense output to a series of up-sampling layers. These layers work on reconstructing the spatial detail lost in the first part of the network; the goal is to increase the spatial resolution until the output has the same height and width as the input.
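As a quick illustration of the problem (a sketch assuming torchvision is available), passing an image through an off-the-shelf AlexNet yields exactly this kind of non-spatial vector:

```python
import torch
from torchvision.models import alexnet

model = alexnet(weights=None)    # untrained; we only care about shapes
x = torch.randn(1, 3, 224, 224)  # one RGB image, 224x224 pixels

with torch.no_grad():
    out = model(x)

# One score per ImageNet class, with no spatial dimensions left
# to tell us *where* anything is in the image.
print(out.shape)  # torch.Size([1, 1000])
```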
The up-sampling layers are built from strided transpose convolutions. These layers go from deep, narrow feature maps to wider, shallower ones: we use transpose convolutions to grow the spatial dimensions of the feature maps back to the desired size.
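The sketch below (arbitrary layer sizes, again assuming PyTorch) shows how a single strided transpose convolution trades depth for spatial resolution:

```python
import torch
import torch.nn as nn

# kernel_size=2 with stride=2 doubles the height and width, while the
# channel count drops from 256 to 64: deep & narrow -> shallow & wide.
up = nn.ConvTranspose2d(in_channels=256, out_channels=64,
                        kernel_size=2, stride=2)

feat = torch.randn(1, 256, 7, 7)  # a deep, narrow feature map
out = up(feat)
print(out.shape)  # torch.Size([1, 64, 14, 14])
```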
These two components of a segmentation network are called the encoder and the decoder. The first “encodes” the input into a compressed representation; the second (the decoder) works on reconstructing this signal into the desired output. FCNs, SegNet and UNet are some of the most popular network implementations based on encoder-decoder architectures.
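Putting the two halves together, here is a toy encoder-decoder in PyTorch. This is a simplified sketch with hypothetical layer sizes to show the idea, not an implementation of FCN, SegNet or UNet:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """A minimal encoder-decoder sketch for semantic segmentation."""

    def __init__(self, num_classes=3):
        super().__init__()
        # Encoder: each stride-2 convolution halves H and W while
        # "encoding" the image into deeper feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transpose convolutions reconstruct the original
        # resolution, ending with one channel of logits per class.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinySegNet(num_classes=3)
x = torch.randn(1, 3, 128, 128)
print(model(x).shape)  # torch.Size([1, 3, 128, 128]) -- per-pixel scores
```

Real architectures such as UNet also add skip connections from encoder to decoder to recover fine detail, but the encode-then-decode skeleton is the same.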
Contact bd@agilytics.in for more on this. The Agilytics GitHub page will be shared shortly; keep watching this space.