After digging into the concepts behind machine learning methods, I think it is the right time to summarize them again.

There are mainly four machine learning paradigms: supervised learning, unsupervised learning, semi-supervised learning (including Andrew Ng’s feature learning), and transfer learning. To better illustrate their differences, I will compare their logic to the ways a teacher prepares his students for the final exam. Actually, I dare to guess that some of the above methods may have come from certain teaching scenarios.

The basic idea of machine learning is to ‘train’ a learner (a computer program) that is able to predict future data by delving into on-hand data sets (called training sets). Machine learning methods differ in what information the training set provides. We summarize some state-of-the-art ML methods for classification as follows:
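To make the train-then-predict loop concrete, here is a minimal sketch using a toy 1-nearest-neighbor classifier in plain Python; the data points and class names are made up purely for illustration:

```python
# Toy illustration of the train/predict loop: a 1-nearest-neighbor
# classifier "trained" by simply memorizing the labeled training set.
# All data here is made up for illustration.

def train(features, labels):
    # Supervised "training" here is just storing the labeled examples.
    return list(zip(features, labels))

def predict(model, x):
    # Predict the label of the closest stored example (squared distance).
    def dist2(a):
        return sum((ai - xi) ** 2 for ai, xi in zip(a, x))
    nearest = min(model, key=lambda pair: dist2(pair[0]))
    return nearest[1]

# Training set: features together with known class info.
X_train = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.5, 8.5)]
y_train = ["cat", "cat", "dog", "dog"]

model = train(X_train, y_train)
print(predict(model, (1.1, 0.9)))  # near the "cat" cluster -> cat
print(predict(model, (8.2, 7.9)))  # near the "dog" cluster -> dog
```

Real learners generalize rather than memorize, but the interface is the same: fit on a labeled training set, then predict labels for unseen data.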

  • Supervised learning: both the features and the class labels of the training set are known, so we can train a classifier directly on the data.
    Comparison: A math teacher gets last year’s exam paper together with the answer key, and asks his students to memorize the problems and answers for the coming exam, since he believes this year’s exam will be very similar.
  • Unsupervised learning: on the contrary, only the features of the training set are known, without the class labels. So in principle it is impossible to train a classifier. But we can still draw inferences from the data, such as feature extraction (self-taught learning), clustering (e.g., K-means), and dimensionality reduction (e.g., PCA) [1].
    Comparison: A math teacher gets last year’s exam paper but without the answer key. Although unable to solve any of the problems, he tries to lead his students to understand them in another way.
  • Semi-supervised learning: this is a situation between SL and USL, since the training set is composed of two kinds of data: one part with class labels and one without. The labeled part is too small to support SL on its own. Still, although we do not know which class each unlabeled example belongs to, we are sure that all of them share the same class space as the labeled data. How to use the unlabeled data to help the labeled data is the research topic of semi-supervised learning [2], [3]. One proposal is to use the unlabeled part of the training data as a pre-test set: first, a learner optimized solely on the labeled training set gives a pre-judgement (a pseudo-label) for each unlabeled example; then the whole training set, now fully labeled, is used for ordinary SL. Other methods first extract features from the unlabeled data and then apply SL with those features on the labeled set.
    Comparison: A math teacher gets last year’s exam paper with only a portion of the answers. He still tries to recover the entire answer sheet, by investigating the unanswered problems as in USL and relating them to the answered ones as in SL.
  • Transfer learning: another variety of semi-supervised learning. Here we want to solve a classification problem A, but the only labeled data set we have concerns a different, though related, problem B. Can we still train a learner for problem A? Dai et al. [5] successfully combined a training set with a limited amount of labeled in-class (problem-A) data and a larger amount of transfer data from problem B to achieve this goal. Their results also show that when the proportion of labeled in-class data is large enough, performance approaches that of plain SL, and the effect of transfer learning fades away.
    Comparison: A math teacher cannot get last year’s math exam paper or its key, but instead gets last year’s physics paper and answers from his colleague. Given that the math and physics exams are highly correlated, can he still prepare his students well for the coming math exam? It has been verified that he can [5].
  • Self-taught learning (feature learning): this is a comparatively new idea, proposed a few years ago [4]. The problem arises when we have a limited-size labeled training set, not enough for SL, but abundant access to unlabeled data drawn from various classes (not limited to the classes we care about). Can we develop a learning method that exploits this abundant information to construct a classifier? Andrew Ng and his co-authors proposed self-taught learning: by ‘automatically’ finding a way to extract appropriate features from the unlabeled data set, the algorithm translates the labeled data into a new (usually sparse) feature space. Using such features instead of the raw ones can dramatically improve the classifier’s performance.
    Comparison: A math teacher has only a limited number of last year’s exam papers and keys, insufficient to teach his students. So he downloads a range of exam papers from wherever he can find them. After investigating these materials, he discovers a common structure behind all the papers, which greatly increases the chance of getting the right answer, and with it he prepares his students well for the coming math exam. That is what Andrew Ng and his co-authors were doing in their paper [4], titled ‘transfer learning from unlabeled data’.
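The pre-test-set recipe described under semi-supervised learning above can be sketched in a few lines of plain Python. This is a toy version: the learner is a 1-nearest-neighbor rule, the data points are made up, and a real self-training loop would also filter pseudo-labels by confidence:

```python
# Toy sketch of the semi-supervised "pre-test set" idea:
# pseudo-label the unlabeled examples with a learner fit only on the
# small labeled set, then retrain on the enlarged, fully labeled set.
# All data is made up for illustration.

def nn_predict(labeled, x):
    # Label of the nearest labeled example (squared Euclidean distance).
    return min(labeled,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

# Small labeled set (features, class) and a larger unlabeled pool
# sharing the same class space.
labeled = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabeled = [(0.5, 1.0), (1.0, 0.2), (9.0, 9.5), (10.5, 9.8)]

# Step 1: the learner trained only on labeled data pre-judges the
# unlabeled examples (pseudo-labels).
pseudo = [(x, nn_predict(labeled, x)) for x in unlabeled]

# Step 2: run ordinary supervised learning on the whole training set,
# which now carries labels everywhere.
full_train = labeled + pseudo
print(nn_predict(full_train, (1.2, 0.8)))   # -> A
print(nn_predict(full_train, (9.4, 10.1)))  # -> B
```

The design choice that makes this work is the shared class space: pseudo-labels are only meaningful because the unlabeled examples are known to belong to the same set of classes as the labeled ones.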


[1] G. E. Hinton and R. R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks,” Science, vol. 313, no. 5786, pp. 504–507, Jul. 2006.
[2] N. Grira, M. Crucianu, and N. Boujemaa, “Unsupervised and semi-supervised clustering: a brief survey,” A review of machine learning techniques for processing multimedia content, Report of the MUSCLE European Network of Excellence (FP6), 2004.
[3] X. Zhu, “Semi-supervised learning literature survey,” Computer Science, University of Wisconsin-Madison, vol. 2, p. 3, 2006.
[4] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng, “Self-taught learning: transfer learning from unlabeled data,” in Proceedings of the 24th international conference on Machine learning, 2007, pp. 759–766.
[5] W. Dai, Q. Yang, G.-R. Xue, and Y. Yu, “Boosting for transfer learning,” in Proceedings of the 24th international conference on Machine learning, 2007, pp. 193–200.