Below abstracts some of my understanding about SMO, after reading Platt’s paper.
The basic idea of SMO(Sequential Minimal Optimization), as revealed by its name, is a way of decomposing a large QP problem in SVM into a series of small QP problems with just two variables , which can get an analytical solution. Thus the original QP in SVM with large database can be solved efficiently.
The basic idea of the method is iterating with following two basic procedures: selecting two variables out of all and optimizing the object function with respect to them. The algo repeats over and over until all the vars satisfy the KKT condition, which means that it arrives at the optimal solution (whether it is a global optimal or local optimal depends on if the problem is convex or non-convex).
Why select two, instead of one, three or more ?
Since all the vars in SVM are linear correlated by the equation constrain, one cannot optimize one var without changing the values of other. So at least two of should be jointly optimized.
How to choose the two variables?
We basically start the algo by picking an example (corresponded to its var) that violates the KKT condition. Then we find the second var within the non-bound (0 < alpha < C) examples that are as defined as SM in the linear-separable case. I think there are more reasons for that, and other explanations are needed. If you know why, help me out with comments, thanks!
1. Platt, John. “Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines.” (1998).