Support Vector Machines (SVMs) are a popular family of machine learning algorithms with many applications, and I learned something quite amazing about them. But first, let me tell you what SVMs do.

Imagine we have data points of two different classes:

A support vector machine can take that data as input and find a line that separates it. We call this line the decision boundary. But there are many separating lines, so which one is the best? SVMs choose the line that maximises the margin. The margin is the distance between the separating line and the nearest point of either of the two classes.

The intuition behind trying to maximise the margin is that a decision boundary with a smaller margin is more prone to overfitting, while a larger margin makes the result more robust.
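To make the margin idea concrete, here is a minimal sketch using scikit-learn. The six points are made up for illustration (they are not the data from the picture), and the very large `C` is my assumption for approximating a hard-margin SVM on data that is cleanly separable.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, clearly separated clusters (toy data, made up for illustration).
X = np.array([[0, 0], [1, 0], [0, 1],    # class -1
              [3, 3], [4, 3], [3, 4]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating line
b = clf.intercept_[0]   # offset of the separating line

# Distance from the decision boundary to the nearest training point.
margin = 1.0 / np.linalg.norm(w)

print("line: %.2f*x + %.2f*y + %.2f = 0" % (w[0], w[1], b))
print("margin:", margin)
print("support vectors:", clf.support_vectors_)
```

The points that end up in `support_vectors_` are the ones sitting exactly on the margin; moving any other point a little would not change the line.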

# Non-linear data

Now look at this data:

Given what we’ve seen so far we’d think an SVM couldn’t separate this, as there just isn’t a good separating line for this data.

But there’s a trick! It works like this: we’ll add a new feature to the data. So far we had 2 features, $x$ and $y$, and we’re going to add one more. In this example I’ll conveniently choose a value for the new feature that I know works: $x^2 + y^2$

So now we have three dimensions, and the value on the $z$ axis is $x^2 + y^2$. This is equal to the squared distance of each point from the origin.

Is this now linearly separable?

And the answer is yes, there is a hyperplane that can separate the pink from the blue points.

The next step is to take this solution and go back to the original two-dimensional space. If the separating hyperplane is horizontal, i.e. $z = c$ for some constant $c$, then back in two dimensions it becomes $x^2 + y^2 = c$, which is the equation of a circle.
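Here is a rough sketch of the whole round trip in code. The data is synthetic (an inner blob and an outer ring that I generate myself, not the points from the pictures), and `ring` is a helper I made up for this example. A linear SVM fails on the two original features but separates the data once $x^2 + y^2$ is added. Even if the fitted plane is slightly tilted, substituting $z = x^2 + y^2$ and completing the square still gives a circle in 2D.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def ring(n, r_min, r_max):
    """Sample n points with radius in [r_min, r_max) and a random angle."""
    r = rng.uniform(r_min, r_max, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Inner blob (class 0) surrounded by an outer ring (class 1).
X = np.vstack([ring(100, 0.0, 1.0), ring(100, 2.0, 3.0)])
y = np.array([0] * 100 + [1] * 100)

# Lift to 3D by adding the new feature z = x^2 + y^2.
z = X[:, 0] ** 2 + X[:, 1] ** 2
X_lifted = np.column_stack([X, z])

# Same linear classifier, once on the 2D data and once on the lifted data.
acc_2d = SVC(kernel='linear').fit(X, y).score(X, y)
clf = SVC(kernel='linear').fit(X_lifted, y)
acc_3d = clf.score(X_lifted, y)
print("accuracy in 2D:", acc_2d, " accuracy in 3D:", acc_3d)

# The plane wx*x + wy*y + wz*z + b = 0 with z = x^2 + y^2 is a circle in 2D;
# completing the square gives its centre and radius.
wx, wy, wz = clf.coef_[0]
b = clf.intercept_[0]
centre = np.array([-wx / (2 * wz), -wy / (2 * wz)])
radius = np.sqrt((wx ** 2 + wy ** 2) / (4 * wz ** 2) - b / wz)
print("circle centre:", centre, " radius:", radius)
```

The circle should land somewhere in the empty band between the blob and the ring, with its centre close to the origin.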

# What we did

Here’s what we did:

That’s pretty cool!

# Another example

One more example, this time using pen and paper.

Imagine we have this data:

The pink and blue points are not linearly separable. We’re going to do the same trick and add a new feature. This time we’ll choose $z = |x|$. So there is a $z$ axis on which the points have height equal to the absolute value of $x$.

To do this with the paper I’m going to fold on the $y$ axis and then raise the sides by 45 degrees.

Is this data now linearly separable?

It’s not very easy to see, but yes. I’m going to demonstrate that using a plastic folder as a hyperplane.

Finally we’re going to take this solution and map it back to the original two-dimensional space by unfolding the paper.

We now have a non-linear separation of the data.
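The paper folding can be mimicked in code too. This is only a sketch under my own assumptions: the points are random, the class inside the "V" is defined as $y > |x|$, and I drop points that sit within 0.3 of the boundary so that a clean gap exists. The "fold" is just the extra column $|x|$.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))

# Class 1 sits inside the "V" (y > |x|), class 0 outside it.
gap = X[:, 1] - np.abs(X[:, 0])
keep = np.abs(gap) > 0.3          # drop points hugging the boundary (assumed gap)
X, gap = X[keep], gap[keep]
y = (gap > 0).astype(int)

# "Fold the paper": add the height z = |x| as a third feature.
X_folded = np.column_stack([X, np.abs(X[:, 0])])

# Same linear classifier on the flat sheet and on the folded sheet.
flat = SVC(kernel='linear', C=100).fit(X, y)
folded = SVC(kernel='linear', C=100).fit(X_folded, y)
acc_flat = flat.score(X, y)
acc_folded = folded.score(X_folded, y)

print("accuracy on flat paper:  ", acc_flat)
print("accuracy on folded paper:", acc_folded)
print("plane coefficients (x, y, z):", folded.coef_[0])
```

In the folded space the plane $y = z$ separates the classes, which is the role the plastic folder plays; unfolding turns that plane back into the V-shaped boundary $y = |x|$.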

# Code

That’s all pretty fun, but in real life you don’t really get to come up with new features like that.

There are so-called kernel functions, which let an SVM behave as if the data had been mapped from a low-dimensional feature space to a higher-dimensional one, but in a computationally efficient way, using the so-called kernel trick: the mapped points are never computed explicitly, only their inner products. I don’t understand them well enough, but I know that scikit-learn comes with a bunch of kernels that can be used out of the box.
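A tiny sketch of what the kernel trick buys you, using the degree-2 polynomial kernel $k(a, b) = (a \cdot b)^2$. That kernel is my choice for illustration because its feature map is easy to write down; it is not the rbf kernel used in the example further down. The function names `phi` and `poly2_kernel` are made up for this example.

```python
import numpy as np

def phi(v):
    """Explicit map from 2D to 3D: (v1, v2) -> (v1^2, sqrt(2)*v1*v2, v2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in the original 2D space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))   # inner product after mapping to 3D
implicit = poly2_kernel(a, b)       # same number, no 3D vectors built

# Both are 121 up to floating-point rounding.
print(explicit, implicit)
```

The SVM only ever needs inner products between points, so it can work with `poly2_kernel` directly and never build the higher-dimensional vectors. In scikit-learn this particular kernel should correspond to `SVC(kernel='poly', degree=2, gamma=1, coef0=0)`.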

Here’s an example of creating some data shaped like the example above and separating it using an SVM:

```
import matplotlib.pyplot as plt
import numpy as np
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

# create some V-shaped data: class 1 where y > |x|, class -1 everywhere else
np.random.seed(6)
X = np.random.randn(200, 2)
y = X[:, 1] > np.absolute(X[:, 0])
y = np.where(y, 1, -1)

# train a Support Vector Classifier using the rbf kernel
svm = SVC(kernel='rbf', random_state=0, gamma=0.5, C=10.0)
svm.fit(X, y)

# plot the points together with the learned decision regions
plot_decision_regions(X, y, svm, markers=['o', 'x'], colors='blue,magenta')
plt.show()
```

which outputs: