Andrew’s Course:

Q: In the kernel method, why are the dimension of the feature vector f and the number of parameters θ both equal to the size of the training set, m?

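A (my own): After searching related articles, I finally arrived at the concepts of shattering and VC dimension. The family of linear classifiers (oriented hyperplanes) in an n-dimensional space, which is what the SVM fits after the kernel mapping, has VC dimension n+1: it can shatter at most n+1 points. Thus, to be able to separate a training set of m points under any labeling, the SVM must map the data to a feature space of at least m-1 dimensions. So giving f m dimensions, with one parameter in θ per feature, is enough to meet that bound.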
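The shattering argument can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions: it uses a Gaussian similarity function with one landmark placed at each training point (so f has m components), it omits the intercept term, and the names `gaussian_features` and `can_shatter` as well as the five sample points are made up for this example. It tries every one of the 2^m labelings and checks that some θ reproduces it exactly.

```python
import itertools
import numpy as np

def gaussian_features(X, landmarks, sigma=1.0):
    """Map each row x of X to f, where f_i = exp(-||x - l_i||^2 / (2 sigma^2))."""
    diffs = X[:, None, :] - landmarks[None, :, :]   # shape (m, k, n)
    sq_dists = np.sum(diffs ** 2, axis=2)           # shape (m, k)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def can_shatter(X, sigma=1.0):
    """Return True if every +/-1 labeling of X is matched by sign(F @ theta)."""
    F = gaussian_features(X, X, sigma)              # landmarks = training points
    m = X.shape[0]
    for labels in itertools.product([-1.0, 1.0], repeat=m):
        y = np.array(labels)
        theta = np.linalg.solve(F, y)               # exact fit: F @ theta = y
        if not np.all(np.sign(F @ theta) == y):
            return False
    return True

# Hypothetical toy data: m = 5 points in n = 2 dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
F = gaussian_features(X, X)
print(F.shape)          # one feature per training point
print(can_shatter(X))   # whether all 2^5 labelings are separable
```

In the original 2-dimensional space a hyperplane can shatter at most 3 points, so these 5 points cannot all be separated under every labeling there. With m kernel features the feature matrix is square and, for distinct points, invertible, which is why a θ exists for each labeling.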


Shattering, VC Dimension


1. Burges, C. J. (1998). “A tutorial on support vector machines for pattern recognition.” Data Mining and Knowledge Discovery 2(2): 121-167.

2. Wikipedia: