Last week I devoted myself to learning state-of-the-art feature extraction methods in image recognition, with the goal of clearly understanding why we learn features. Having now read several survey papers, I know a little about the topic.
How to Understand an Image
The first obstacle in image recognition is making a computer understand the content of an image. Since computers can only perform mathematical computations, scientists proposed computing the similarity between digital images. But that leaves the question: the similarity of what? With images composed of huge numbers of pixels, comparing an image set directly incurs an amount of computation that computers can hardly afford.
Inspired by how the human brain processes visual signals, researchers instead compared the similarity of features: high-level representations built from groups of meaningful pixels in an image. Using features greatly reduces the computational complexity with limited loss of information. By comparing the similarity between a known image and a test image, the computer can tell whether the test image contains the particular objects present in the known image, and thus recognize the test one.
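To make the idea of comparing features concrete, here is a minimal sketch: once each image is reduced to a feature vector, recognition can be as simple as measuring the cosine similarity between the two vectors. The vectors below are made-up placeholders, not real image features.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

known = [0.9, 0.1, 0.4]  # hypothetical feature vector of a known image
test = [0.8, 0.2, 0.5]   # hypothetical feature vector of a test image
print(cosine_similarity(known, test))  # close to 1.0: likely the same object
```

Comparing two short vectors like this is far cheaper than comparing every pixel, which is exactly the computational saving described above.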
How to Extract Features
So the question is how to extract such features from an image.
A straightforward idea is to extract features from the entire image: global features. However, this approach tends to mix foreground and background together, making it hard to isolate features for the target objects. In addition, occlusion and clutter can severely degrade its performance.
A promising alternative is to investigate local features, which are easier to extract within a local area. Specifically, local features can be obtained in two ways: segmentation and patch sampling. Although segmentation can accurately distinguish targets from noisy backgrounds and foregrounds, it requires prior knowledge of the image composition; such information is usually unavailable for a test image unless the image has already been successfully segmented, so there is a chicken-and-egg problem. The patch sampling approach tells a different story. It first divides an image into piles of patches; for computational simplicity, features are extracted from only a subset of the patches. Finally, the local features (usually vectors) are combined, following a pooling procedure, to form the full representation of the image.
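The patch sampling pipeline can be sketched as follows. This is a toy illustration, not any particular published method: the "descriptor" here is just mean and variance of patch intensities, and the pooling step is a simple max over patches.

```python
import numpy as np

def sample_patches(image, patch_size, stride):
    """Slide a window over the image and collect square patches."""
    h, w = image.shape
    patches = []
    for i in range(0, h - patch_size + 1, stride):
        for j in range(0, w - patch_size + 1, stride):
            patches.append(image[i:i + patch_size, j:j + patch_size])
    return patches

def patch_feature(patch):
    """Toy local descriptor: mean intensity and intensity variance."""
    return np.array([patch.mean(), patch.var()])

def image_representation(image, patch_size=4, stride=4):
    feats = [patch_feature(p) for p in sample_patches(image, patch_size, stride)]
    # Max pooling: keep the strongest response in each feature dimension,
    # yielding one fixed-length vector per image.
    return np.max(np.stack(feats), axis=0)

img = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for a grayscale image
print(image_representation(img).shape)
```

Because pooling collapses a variable number of patch features into one fixed-length vector, images of different sizes end up comparable.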
How to Extract Local Features
To step further, we come to the problem of extracting local features. By and large, there are two basic ways to design local feature extractors: hand-designed methods and learning methods. While the former rely largely on analyzing low-level structural components of patches, such as corners, curves, and regions, the latter put more emphasis on automatically generating high-level semantic representations of images. Hand-designed methods therefore excel in accuracy, but fail to outrun learning methods in efficiency, which is the more important property for large-data applications.
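To give a flavour of the hand-designed family, here is a minimal HOG-flavoured descriptor: a normalized histogram of gradient orientations within a patch. It is a simplified sketch of the idea of encoding corners and edges, not the actual HOG or SIFT algorithm.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Hand-designed local descriptor: histogram of gradient orientations,
    weighted by gradient magnitude (a simplified, HOG-like sketch)."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)  # gradient direction in [-pi, pi]
    bins = ((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), magnitude.ravel()):
        hist[b] += m
    return hist / (hist.sum() + 1e-8)  # normalize for some lighting invariance

ramp = np.tile(np.arange(8.0), (8, 1))  # a patch with one dominant edge direction
print(orientation_histogram(ramp))
```

Everything here (gradients, binning, normalization) is specified by hand; a learning method would instead fit the descriptor itself from data.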
A short summary of local feature extraction methods is shown below:
How to Evaluate a Feature Extraction Method
According to , there are six aspects for evaluating the resulting features:
The above summary of feature extraction methods may serve as enlightenment for my current work in deep learning, especially for designing improved feature selection methods to construct the receptive fields for training dictionaries (feature sets) for higher layers. I think there is a possibility of improving the way higher-level features are learned in A. Coates's work, where they simply select the T nearest features out of one local block.
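The selection step I want to improve on can be sketched roughly as follows. This is my own simplified reading, not Coates's actual implementation: given a learned dictionary of feature atoms, group a seed atom with its T most similar atoms (by cosine similarity) to form one receptive field for the next layer.

```python
import numpy as np

def t_nearest_features(dictionary, seed_idx, T):
    """Sketch of T-nearest feature selection: return the indices of the T
    dictionary atoms most similar (cosine) to the seed atom, seed included."""
    D = dictionary / (np.linalg.norm(dictionary, axis=1, keepdims=True) + 1e-8)
    sims = D @ D[seed_idx]          # cosine similarity of every atom to the seed
    return np.argsort(-sims)[:T]    # descending order; seed ranks first

rng = np.random.default_rng(0)
atoms = rng.normal(size=(50, 16))   # hypothetical learned dictionary of 50 features
group = t_nearest_features(atoms, seed_idx=3, T=5)
print(group)  # indices of the 5 atoms forming one receptive field
```

An improved selection method might replace the plain similarity ranking here with a criterion that also accounts for diversity within the group.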