Recently I have been into the investigation of current dictionary learning and encoding. Both dictionary learning and encoding are core components in the feature learning, since they construct the basis of the features and the coefficients of the original input respectively.

Dictionary is a set of vectors that span a new/sub space of the original example, with which we can see the examples from an alternative angle, and get a more comapct/sparse form of the signal. In computer vison, people extract the features of an image by building dictionaries.

However, early work on dictionary learning is like orthogonal transformation or orthogonal dictionary construction, which is limited by the dimension of the signals. But later when people find sparse coding on overcomplete dictionaries can better represent the natural signal and easy to process, the sparse coding gradually become the focus of research.

In early days, dictionaries are built in a hand-designed way, which would cost time and effort and is not adaptable general cases. To solve the problem, it is proposed to learn the dictionaries. That is the beginning of dictionary learning and variations of DL methods emerged as follows.

*I. State-of-art DL Methods:*

**1. Sparse Coding (with L1 norm)**

**Sparsity**: L1 norm of s(i)

**Solver**: Coordinate Descent Algorithm.

**2. OMP-k (or MOD)**

**Sparsity**: constrain k

**Solver**: Greedy selects s(i) first and then find out D with s.

**P.S**: OMP-1 is the same form of K-means, with

and then set

**3 PCA : transfer from linear dependent to linear independent **

**Sparsity**: This is a dimension-reduction method and thus the sparsity is decreased; but the G-PCA, an extension of PCA can achieve the sparsity.

**Solver**: SVD

**P.S**: Above optimization is to find out the vector that max the projection from sample set X to vector . The constrain means the vector must be normalized. A similar algo is **ICA**. Also, one my consider **Generalized PCA **another extension of PCA that constructs an overcomplete dictionary with sparsity.

**4. (Sparse) RBM (Restricted Boltzman Machine)**

RBM is an example of generative model in machine learning, which see examples as a series of i.i.d variables sampled from an unknown distribution. The goal of a generative model in unsupervised learning is to learn the most “accurate” united distribution of X, i.e, given the parameters of the distribution ,

So in fact the RBM works in the **ML way**.

The method that RBM utilizes to solve the problem is the **Markov Random Field (MRF)** theory. In MRF, variables are seen as independent vertexes in an un-directed graph and their relationship between are denoted by the edges.

Here the V are the examples and H are hidden nodes in the neural network. Then RBM constructs a two-layer neuron network with the visible and hidden layer and the parameters of the distribution play as the weights between the nodes.

By utilizing the factorization property of MRF, above optimization problem can be expressed as a series of products of conditional probability, and the parameters can be further solved. Then we can get the **new expression of the examples** from the **hidden layer.**

**Sparsity**: Sparse RBM

**Solver**: Original problem is hard to solve; use approximation via gradient descent methods.

P.S: Values of classic RBM is binary; however, there are extension of RBM with real values

**5. Sparse AutoEncoder (SAE)**

In the unsupervised learning, the SAE tries to recover the original examples as accurate as possible, by learning a neural network, i.e.,

The network can be composed of multiple layers. Since the output is as accurate as the input, we can assert that the weights and notes of each layer of the neural network is a good representation of the original signal. Moreover, the sparsity constrain can be introduced by imposing an extra item in the optimization objective. For example, in a layer the optimization problem can be formulated as follows,

where KL is a function with sparse constrain.

**Sparsity**: KL function

**Solver**: Backward Propagation

**6.Parametric Training Methods Family: **

A large family of methods …

*II. State-of-art Encoding Methods:*

The issue of encoding comes from accommodating different demands for the ‘codes’, say the sparsity. However, to get such rewards something needs to be traded-off. In most cases, ‘something’ is to what extent the code reflects the original signal accurately. Accepting this, we can mix up the dictionary learning and coding methods as much as we can. But the trade-off between approximation and sparsity must be tolerated.

State-of-art encoding methods include:

1. Sparse coding

2. OMP-K

3. Hard/soft threshold …

# III Summary:

Sparse Coding : Approximated Expression, Depict signals from the natural angle, Overcomplete Dictionary

Traditional Coding: Hi-fi Expression, Depict signals from the Math angle, Orthogonal/compact Dictionary

# REFERENCE:

*NIPS workshop on deep learning*, 2012.

*Proceedings of the 28th International Conference on Machine Learning (ICML-11)*, 2011, pp. 921–928.

*International Conference on Artificial Intelligence and Statistics*, 2011, pp. 215–223.

*Proceedings of the 24th international conference on Machine learning*, 2007, pp. 759–766.

*Proceedings of the IEEE*, vol. 98, no. 6, pp. 1045–1057, 2010.

*IEEE Signal Processing Magazine*, vol. 28, no. 2, pp. 27–38, 2011.

Thank you

I think this article creates a big picture for the rest of my study on this field.

I am new to this field(Unsupervised Feature Learning)

I try to learn about sparse coding through the article “Efficient sparse coding algorithms” NIPS 2006 . Is it a good place to start?