Exploring SVM Kernels

Support Vector Machines are a popular option for data scientists wanting to explore and model higher dimensional data. Despite their lack of scalability, they’re popular for prototyping different kinds of classifiers for systems where there are large numbers of variables. At the core of the SVM is the use of a kernel function, which enables a mapping of the feature space to a higher dimensional feature space. Therefore, if we’re unable to find separability between classes in the (lower dimensional) feature space, we could find a function in the higher dimensional space, which can be used as a classifier.

In this Jupyter notebook I’ve explored a couple of different types of kernel functions for bivariate, two-class data, where an SVM is being used to separate out these classes. Since these classes are not linearly separable, the use of kernel functions here enables us to find the best possible hyperplanes that can solve the separability problem. What’s interesting to note is that the convex hulls (in this case, polygons) for these classes are overlapping in the 2D space. This is a clear indicator of a lack of linear separability.

The use of a kernel opens up the possibility of linear separability, since we add an additional spatial dimension on which these points get distributed. Specifically here we have two different kernel functions that are explored:

$\phi(x_{1}, x_{2}) = (x_{1}^2, x_{1}x_{2}, x_{2}^2)^{T}$

$K(x_{1}, x_{2}) = a e^{{-\frac{1}{b} ||x_{1} - x_{2}||^{2} }}$

The latter is called the radial basis function kernel, or the RBF kernel. Visualizing this kernel for the data we’d generated gives us the following nice image. What’s easily visible here is the possibility of separating out the classes thanks to the additional rbf dimension that has now been added.

Upon training the SVM classifier, visualizing the results gives us the below plot. The thick grey line is the decision boundary that enables us to separate the originally linearly inseparable classes in the dataset.

There are other explorations I hope to do on this notebook in future, specifically the process of calculating the sign (class label) of a dataset, based on the Lagrangian – which indeed brings us to the idea of the SVC being a maximal margin classifier. This is also referred to as the dual problem of the SVM. For another post!