J. Shawe-Taylor and N. Cristianini (2000)

# Margin Distribution and Soft Margin

In: Advances in Large Margin Classifiers, ed. by A.J. Smola, P.L. Bartlett, B. Schölkopf, and D. Schuurmans, pp. 349-358. Cambridge, MA: MIT Press.

Rather than considering only the minimum margin, the paper focuses on the margin distribution. The latter is a more robust quantity than the minimum margin, which a single mislabeled example can easily decrease. In particular, the authors provide generalization bounds that motivate algorithms which maximize the margin while penalizing the $2$-norm of the slack variables of those patterns that violate the margin condition. This differs from the standard setting in SV machines, which generally use the $1$-norm of the slacks. However, it coincides with the objective function of optimization algorithms such as that of Kowalczyk, and can be useful in this regard.
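
For orientation, the two penalties can be contrasted in the usual textbook soft-margin notation. This is a sketch under assumed notation, not the formulation printed in the paper: the weight vector $w$, bias $b$, slack variables $\xi_i$, labels $y_i \in \{-1, +1\}$, sample size $m$, and trade-off constant $C > 0$ are all introduced here for illustration. The $1$-norm soft margin commonly used in SV machines reads

$$
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i\left(\langle w, x_i \rangle + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0,
$$

while the $2$-norm variant replaces the linear penalty with a quadratic one:

$$
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + \frac{C}{2} \sum_{i=1}^{m} \xi_i^2
\quad \text{s.t.} \quad y_i\left(\langle w, x_i \rangle + b\right) \ge 1 - \xi_i.
$$

In the second problem the constraint $\xi_i \ge 0$ can be dropped, because replacing a negative $\xi_i$ by zero keeps the solution feasible and lowers the objective.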