The previous lecture concluded by deriving the dual problem for the optimal margin classifier (maximize W($\alpha$) with respect to $\alpha$ subject to two constraints) and by noting that the SVM prediction depends only on inner products between the input and the training examples.
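As a quick recap (writing it in the standard form, with $m$ training examples), the dual problem was

$$\max_{\alpha}\ W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \langle x^{(i)}, x^{(j)} \rangle \quad \text{s.t.}\ \ \alpha_i \ge 0,\ \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0,$$

and the prediction at a new input $x$ reduces to

$$w^T x + b = \sum_{i=1}^{m} \alpha_i y^{(i)} \langle x^{(i)}, x \rangle + b,$$

which is exactly why everything can be expressed through inner products.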
This lecture introduces the kernel function, defined as an inner product of feature mappings, i.e.
$K(x,z) = \phi(x)^T \phi(z) = \langle \phi(x), \phi(z) \rangle$
where $\phi$ is called the feature mapping. Interestingly, $K(x,z)$ can be very inexpensive to compute even when $\phi(x)$ itself is very expensive to compute (the lecture presents several examples, including the Gaussian kernel). This leads to the kernel trick: if you have any algorithm that you can write purely in terms of inner products, you can replace each inner product with a kernel evaluation and thereby work implicitly in the (possibly very high-dimensional) feature space defined by $\phi$.
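As a small illustration (a sketch added here, not code from the lecture), consider the kernel $K(x,z) = (x^T z)^2$, whose feature map $\phi(x)$ contains all $n^2$ pairwise products $x_i x_j$; the kernel gives the same inner product in $O(n)$ time:

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (x^T z)^2:
    # all n^2 pairwise products x_i * x_j.
    return np.outer(x, x).ravel()

def K(x, z):
    # The same inner product computed directly in O(n) time,
    # without ever building the n^2-dimensional feature vector.
    return np.dot(x, z) ** 2

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)
print(np.dot(phi(x), phi(z)), K(x, z))  # the two values agree
```

The Gaussian kernel $K(x,z) = \exp(-\|x-z\|^2 / (2\sigma^2))$ takes this further: its feature map is infinite-dimensional, yet the kernel itself is trivial to evaluate.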
So far we have assumed that the training data is linearly separable. To relax this assumption and make the algorithm work on non-separable datasets as well, we reformulate the optimal margin classifier using $\ell_1$ regularization (the $\ell_1$ norm soft margin SVM), which yields a slightly modified dual problem.
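Concretely (writing the $\ell_1$ regularized dual in its usual form), the objective $W(\alpha)$ stays the same and only the constraints change, with a new parameter $C$ bounding each $\alpha_i$:

$$\max_{\alpha}\ W(\alpha) \quad \text{s.t.}\ \ 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{m} \alpha_i y^{(i)} = 0.$$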
Then another optimization algorithm, coordinate ascent, is introduced (we have already studied gradient ascent/descent and Newton's method); it maximizes a function with respect to its input parameters by optimizing one coordinate at a time, in the unconstrained setting. A slightly modified version of coordinate ascent is then used to solve the dual problem derived above, and this is called the SMO (sequential minimal optimization) algorithm.
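To make the coordinate ascent idea concrete, here is a minimal sketch (my own illustration, assuming a simple concave quadratic objective and no constraints; SMO itself must update two $\alpha_i$'s at a time, because the constraint $\sum_i \alpha_i y^{(i)} = 0$ prevents changing a single one in isolation):

```python
import numpy as np

def coordinate_ascent(A, b, n_iters=100):
    """Maximize W(alpha) = b^T alpha - 0.5 * alpha^T A alpha
    (A symmetric positive definite) by repeatedly maximizing
    over one coordinate while holding the others fixed."""
    alpha = np.zeros(len(b))
    for _ in range(n_iters):
        for i in range(len(alpha)):
            # Exact maximizer over alpha_i with the rest fixed:
            # set dW/dalpha_i = b_i - sum_j A[i, j] * alpha_j = 0.
            alpha[i] = (b[i] - A[i] @ alpha + A[i, i] * alpha[i]) / A[i, i]
    return alpha

# The result should match the closed-form maximizer A^{-1} b.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
print(coordinate_ascent(A, b))
print(np.linalg.solve(A, b))
```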