# Support Vector Machine (SVM) Classification

The support vector machine (SVM) is a powerful discriminative classification technique. Using the labelled training vectors, SVM training finds a separating hyper plane (H) that maximizes the margin of separation between these two classes. Based on trining data, SVM can be divided into two  categories: (a) Linearly separable training, (b) Non-linearly separable training.

✓ An example of a SVM trained using linearly-separable data is given in following Figure.

✓ Let’s define the hyperplane H such that:
wTxi + b ≥ +1 when yi =+1
wTxi + b ≤ -1 when yi =-1

The distance between H and H1 is y(wTx+b)/||w||

✓ The distance between H1 and H2 is: 2/||w||.

The distance between H1 and H2 is called margin. We want a classifier with as big margin as possible.  In order to maximize the margin, we need to minimize ||w||. With the condition that there are no datapoints between H1 and H2:
wTxi +b ≥ +1 when yi =+1
wTxi +b ≤ -1 when yi =-1

✓ Find w and b such that
||w||2/2 minimized and for all (xi,yi), yi(wTxi +b) ≥ 1

We are now optimizing a quadratic function subject to linear constraints. It’s Lagrangian optimization problem.
w = ∑ αiyixi
b = yk – wTxk

✓ Non-linearly separable training
In the case of performing SVM training on non-linearly separable data, the missclassication of training examples will cause the Lagrangian multipliers to grow exceedingly large and prevent a feasible solution.

The idea is to gain linearly separation by mapping the data to a higher dimensional space. An SVM kernel is used to transform training and testing data into a higher dimensional space, which provides better linear separation.

✓ A kernel function is defined as follows,
K(xi,xj) = Φ(xi) Φ(xj)

where Φ(x) is a mapping function used to convert input vectors x to a desired space. Linear, polynomial, radial basis function (gaussians) and sigmoid (neural net activation function) are used as non-linear mapping functions.

An example of non-linearly separable data and how kernal function transforms into separable data is shown in below Figure,

SVM classification can be used in several applications, such as natural language processing, computer vision and bioinformatics.