Just had a talk with my supervisor about the hinge loss in SVMs. Below are Chunhua’s comments.

Ideally, the loss function would be the “0-1” loss, which incurs no loss for a correct decision and a constant loss for a wrong one. However, this loss function is not convex, which makes training such a machine computationally intractable. As a result, people look for a convex surrogate. The hinge loss has been proven to be the tightest convex upper bound on the “0-1” loss, though it is not robust when there are far-away outliers: its penalty grows linearly with the margin violation, so a single distant outlier can dominate the objective. Even so, it has been shown in practice to be an efficient surrogate for the original “0-1” loss.
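A minimal sketch of the two losses as functions of the margin y·f(x) (the function names here are my own, not from the comments above); it also shows the outlier behaviour: the 0-1 loss saturates at 1, while the hinge loss keeps growing linearly.

```python
import numpy as np

def zero_one_loss(margin):
    # 0-1 loss: no loss when the margin y*f(x) is positive (correct side
    # of the decision boundary), constant loss of 1 otherwise.
    # A step function, hence non-convex.
    return np.where(margin > 0, 0.0, 1.0)

def hinge_loss(margin):
    # Hinge loss: max(0, 1 - y*f(x)).
    # Convex, and an upper bound on the 0-1 loss for every margin value.
    return np.maximum(0.0, 1.0 - margin)

margins = np.array([-10.0, -1.0, 0.5, 2.0])
print(zero_one_loss(margins))  # [1. 1. 0. 0.]
print(hinge_loss(margins))     # [11.   2.   0.5  0. ]
```

Note the first entry: a far-away outlier with margin −10 contributes 11 to the hinge objective but only 1 to the 0-1 loss, which is exactly the robustness issue mentioned above.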

Research investigating alternative loss functions beyond the hinge loss can be found in:

1. Yuille, A. L. and A. Rangarajan (2003). “The concave-convex procedure.” Neural Computation 15(4): 915-936.