The more pixels an image contains, the more information we can acquire. The more information you can acquire, the more applications you can use the images into. Image processing techniques are applied in a wide variety of fields, like medical imaging, surveillance, robotics, industrial inception and remote sensing.
High resolution means that the numbers of pixles per inch, dots per inch is higher than usual, each according to its aspect ratio. Therefore a high-res image can provide important that can be considered critical for some applications. Segmentation is one of the most crucial techniques that is introduced as a useful method for a lot of applications including image analysis, annotation..etc.
Categories of four approaches for single-image super resolution
Super-resolution is the process of enhancing the resolution of a pre-acquired images to be able to extract more features from them. Single-image super resolution can be categorized into one of four approaches. Prediction based approach, image statistical approach, edge based approach, and example-based approach.
Prediction based approach
In this approach, a predefined mathematical formula is used. There is no use for training data in this approach. Sometimes we can conduct something like weighted averaging of the neighboring pixels to a certain pixel in a low-res image. This is called Interpolation-based super resolution. Interpolated pixel intensities are somewhat similar to neighboring pixels locally. This can enhance the smoothness of the pixels but in turn will lower the quality of the edges.
Edge Based approach
Edges are a defining characteristic in image segmentation and analysis. They play a prime role in visual perception. Edges based super resolution approaches take in consideration some features like the depth and width of an edge as well as the gradient. Therefore, the reconstructed high-res images have high-quality edges with proper sharpness and limited artifacts. But these approaches underperform in places with high frequency like textures and rapid variations.
Image statistical approach
This approach uses statistics of gradient variations in different regions of high res images to predict the correct mapping function to be able to produce the high-res one.
Patch Based Methods
This approach uses training data that is usually composed of low-res/high-res pairs of images. It is called patch-based approach as patches are cropped from the training images to learn mapping functions. Various learning methods of the mapping functions have been proposed such as weighted averaging, SVM and kernel regression.
Convolutional Neural Networks
Conv. Nets have been utilized to solve the super resolution problem by directly learning an end-to-end mapping between low- and high-resolution images. It is well known that conv. Nets are great learners of the images features throughout the different layers. Imagine applying the feature learning to get the mapping between the low res and high-res images and how do they correlate.
Deep convolutional neural networks have recently demonstrated great success, this is mainly because of its abilities in image classification. Segmentation-based neural networks as well have opened doors for object detection and therefore was used in a lot of autonomous applications. Several factors contribute to the importance of this progress, the efficient training on powerful GPUs and the presentation of ReLUs (the Rectified Linear Unit) which helped in faster convergence and keeping good quality. Availability of datasets contributes a lot as well. Convolutional neural network has been applied to image denoising as well which serves well in our problem here.
Conv. Nets for super-resolution:
So, the approach consists mainly of 3 steps. But let’s first consider a low-res image, if we upscaled it to the desired new size using one of the interplotaipon techniques we dicuseedd earlier, we would have finished our pre-processsing phase. Lt’s get to the actual 3 steps that contruct the approach.
Remember, the goal is to get an image that is as similar as possible to the ground truth high-res image that corresponds to the low-res image at hand.
First step is called patch extraction and representation, here; we extract the different patches that overlap among those of the low-res image. Each extracted batch is then represented as a high-dimensional vector. These vectors together comprise a set of feature maps. Extracting patches and representing them can be done by convolving the image by a set of filters. The first layer can be expressed like the following equation
F(Y) = max (0, W * Y + B)
W and B represent the filters and biases respectively, the output is composed of n feature maps.
Second step is the non-linear mapping. Here, each high-dimensional vector from step one is remapped into another high-dimensional vector. Each vector of the mapped ones represents a high-resolution patch. These vectors comprise another set of feature maps. The first layer extracts an n dimensional feature for each patch. We will map each of these n dimensional vectors into a new n dimensional one by applying a new set of filters. Each of the output new n-dimensional vectors is a representation of a high-resolution patch that will be used for reconstruction. We can always add more convolutional layers to increase the non-linearity.
Finally, the reconstruction phase begins where we aggregate the high-resolution patch-wise representations to generate the final high-resolution image. We define a convolutional layer to produce the final high-resolution image. Although the above three operations are somewhat derived from one or more of the approaches discussed earlier, they all together are used to form a convolutional neural network. Of course, all the filters’ weights and biases are to be optimized. The restoration quality of the network can be further improved when larger and more diverse datasets are available, and/or larger and deeper model is used. This network can deal with colored images that have three-color channels as well.
Below is the representation of the network with the layers corresponding to the previous three operations.
The code for the above model with the three convolutional layers can be something as the following:
def model(self):
conv1 = tf.nn.relu(tf.nn.conv2d(self.images, self.weights['w1'], strides=[1,1,1,1], padding='VALID') + self.biases['b1'])
#first conv. Layer with the weights and biases mentioned in the equation above
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, self.weights['w2'], strides=[1,1,1,1], padding='VALID') + self.biases['b2'])
#second conv. Layer responsible for the re-mapping of feature vectors
conv3 = tf.nn.conv2d(conv2, self.weights['w3'], strides=[1,1,1,1], padding='VALID') + self.biases['b3']
#third conv. Layer responsible for reconstruction
return conv3
Below is the difference in output between an interpolation based approach and the convolutional network approach:
To sum up what we have talked about until now, conventional super resolution have been the inspiration that was used to build the deep convolutional neural network model. This model can learn an end-to-end mapping between low- and high-resolution images, adding pre/post-processing beyond the optimization. Additional performance can be further gained by exploring more filters and different training strategies. This approach can be further investigated and re-tuned to suit different close problems in theory as de-blurring or de-noising.
Do you have further questions or comments to this articles? You are welcome to post them in the comment box below. Just log in or register and leave a comment here.
References:
- Exploiting self-similarities for single frame super-resolution, Yang, C.Y., Huang, J.B., Yang, M.H.
- Image Super-Resolution Using Deep Convolutional Networks Chao Dong, Chen Change Loy, Kaiming He and Xiaoou Tang.
Comments
Report comment