After a while concentrating on more abstract topics, I thought I would return to Support Vector Machines. These are primarily classifiers that assign data to one of two categories, e.g. red or blue in the picture below. Having read up on elementary vector geometry and on optimisation (through economics), I found the subject much more approachable this time.
PyML is pretty easy to get hold of and install, but don't expect much in the way of documentation. These are my notes on how to wring something visible out of it as a 'Hello World' use of SVM. After writing some code to generate a data set (more on that below), the following few lines get us to a visible output:
from PyML import *
from PyML.demo import demo2d
mu = [array([1,-2,3,-3]), array([-2,3,-1,3])]
data = transformed_gauss( 150, f2, mu )
data.attachKernel( 'Gaussian' )
s = SVM()
s.train( data )
demo2d.setData( data )
demo2d.decisionSurface( s )
You can then use the cross-validation method to get an estimate of the classifier's performance:
In [36]: s.cv(data)
[...]
Confusion Matrix:
Given labels:
       0    1
  0   52   23
  1   22   53
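As a quick sanity check on that output, the overall success rate can be worked out by hand from the four counts. This is only a sketch in plain Python: the numbers are copied from the matrix above, and the variable names are mine rather than anything PyML provides. Only the diagonal is used, so it does not matter whether rows or columns hold the given labels.

```python
# Counts copied from the confusion matrix printed by s.cv(data) above
confusion = [[52, 23],
             [22, 53]]

# Diagonal entries are the points whose predicted label matches the given label
correct = confusion[0][0] + confusion[1][1]
total = sum(sum(row) for row in confusion)

success_rate = correct / float(total)
print(success_rate)  # 0.7
```

So roughly 70% of the 150 points are classified correctly under cross-validation, which is better than chance but a long way from a clean separation.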
Here's the code used to generate the data. I wanted something a bit messier than the inbuilt data, yet still amenable to 2D visualisation. The code generates a set of 4-dimensional Gaussians and then maps them on to two dimensions. My thought was to take a data set that is linearly separable in its original dimensionality, distort it down to two dimensions, then see how well the SVM can recover the separation.
from pylab import *
from PyML import *
from PyML.demo import demo2d

def gauss_data( N, mu ):
    p = mu[0].shape[0]
    N = N + N%2 #make N divisible by two
    X = []
    Y = []
    for i in range(N):
        #select a random class (0 or 1)
        class_index = randint(0,2)
        #create a new X point -> a p-dimensional Gaussian with mean of that class
        X.append( randn(p) + mu[class_index] )
        #create the new Y point -> the class index as a string: '1' if from mu[1], '0' if from mu[0]
        Y.append( str(class_index) )
    return X, Y

#map a 4-dimensional point down to two dimensions
f2 = lambda x: array([ sin(x[0] + x[1] ), cos( x[2] + x[3] ) ])

def transform( X, f ):
    return [f(x) for x in X]

def transformed_gauss( N, f, mu):
    X, Y = gauss_data( N, mu )
    Z = transform( X, f )
    D = VectorDataSet( Z, L=Y )
    return D
My other tip on PyML is that it likes DataSet instances to be constructed with the label list given as strings.


















