HDS

Exercise 12.20: Support Vector Machines and Kernel Methods

Chapter 12

(a)

Since the linear span of $\{k(\cdot, x_i)\}_{i=1}^n$ is finite dimensional, it is closed. Thus we can decompose any $f \in \mathcal{H}$ into two orthogonal components $f = f_\parallel + f_\perp$, where $f_\parallel$ lies in the linear span, and therefore

$$f(x_i) = \langle f, k(\cdot, x_i) \rangle_{\mathcal{H}} = \langle f_\parallel, k(\cdot, x_i) \rangle_{\mathcal{H}}, \quad \text{for all } i \in [n]. \tag{1}$$

Hence $f_\perp$ only affects the regulariser $\|f\|_{\mathcal{H}}^2 = \|f_\parallel\|_{\mathcal{H}}^2 + \|f_\perp\|_{\mathcal{H}}^2$, which is minimised when $f_\perp = 0$.
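As a concrete check, the decomposition can be carried out explicitly for the linear kernel $k(x, z) = \langle x, z \rangle$, where $\mathcal{H} \cong \mathbb{R}^d$ and $f(x) = \langle w, x \rangle$. A minimal sketch (the data and dimensions below are arbitrary choices for illustration):

```python
import numpy as np

# Linear-kernel instance of the argument: H = R^d, f(x) = <w, x>,
# and span{k(., x_i)} is the row span of the data matrix X.
rng = np.random.default_rng(0)
n, d = 5, 10                            # n points in d > n dimensions
X = rng.standard_normal((n, d))         # rows are the x_i
w = rng.standard_normal(d)              # an arbitrary f, identified with w

# Orthogonal projector onto span{x_1, ..., x_n}, and the decomposition
# w = w_par + w_perp.
P = X.T @ np.linalg.pinv(X @ X.T) @ X
w_par = P @ w
w_perp = w - w_par

# (1): the evaluations f(x_i) only see the parallel component ...
print(np.allclose(X @ w, X @ w_par))            # True

# ... while the squared norm splits by Pythagoras, so dropping w_perp
# can only decrease the regulariser.
print(np.isclose(w @ w, w_par @ w_par + w_perp @ w_perp))   # True
```

Here `w_perp` is generically nonzero since $d > n$, yet removing it leaves every $f(x_i)$ intact while strictly shrinking $\|f\|^2$.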

(b)

Let $\xi = [\xi_1, \ldots, \xi_n]^\top \in \mathbb{R}^n$ be an auxiliary variable which upper bounds the observation-wise hinge losses: $\xi_i \geq 0$ and $\xi_i \geq 1 - y_i f(x_i)$ for all $i \in [n]$. With this constraint, and with $f = \sum_{i=1}^n \alpha_i k(\cdot, x_i)$ for some $\alpha \in \mathbb{R}^n$ by (a),

$$\min_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^n \max\{0,\, 1 - y_i f(x_i)\} + \frac{\lambda_n}{2} \|f\|_{\mathcal{H}}^2 \right\} = \min_{\alpha, \xi} \left\{ \frac{1}{\lambda_n n} \sum_{i=1}^n \xi_i + \frac{1}{2} \alpha^\top K \alpha \right\}, \tag{2}$$

where $K_{ij} = k(x_i, x_j)$, and on the right we have divided the objective through by $\lambda_n$, which rescales the optimal value but leaves the minimiser unchanged. Defining $\eta = 1/(\lambda_n n)$ and $\mathbf{1} = [1, \ldots, 1]^\top \in \mathbb{R}^n$, we can introduce the dual variables $\mu, \nu \in \mathbb{R}^n$, with $\mu_i \geq 0$ and $\nu_i \geq 0$ for all $i$, and write the Lagrangian

$$L(\alpha, \xi, \mu, \nu) = \eta\, \xi^\top \mathbf{1} + \frac{1}{2} \alpha^\top K \alpha - \mu^\top \xi - \nu^\top (\xi - \mathbf{1}) - (\nu \circ y)^\top K \alpha \tag{3}$$

$$= \nu^\top \mathbf{1} + \xi^\top (\eta \mathbf{1} - \mu - \nu) + \frac{1}{2} \alpha^\top K \alpha - (\nu \circ y)^\top K \alpha, \tag{4}$$

where $\circ$ denotes the Hadamard product and we used $f(x_i) = (K\alpha)_i$. Setting the derivatives with respect to the primal variables to zero,

$$\frac{\partial L}{\partial \alpha} = K\alpha - K(\nu \circ y) = 0, \qquad \frac{\partial L}{\partial \xi} = \eta \mathbf{1} - \mu - \nu = 0, \tag{5}$$

and substituting into the Lagrangian, we obtain

$$L(\nu) = \sum_{i=1}^n \nu_i - \frac{1}{2} \nu^\top \tilde{K} \nu, \tag{6}$$

where $\tilde{K}_{ij} = y_i y_j k(x_i, x_j)$. We can thus eliminate the $\mu$ variables by introducing the inequality constraint $\nu_i \leq \eta$ for all $i$, which comes from the above $\eta \mathbf{1} - \mu - \nu = 0$ combined with $\mu_i \geq 0$ (definition). Since $\nu_i \geq 0$ (also by definition), we maximise over $\nu_i \in [0, 1/(\lambda_n n)]$. This is equivalent to the desired result.
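The derivation can be sanity-checked numerically: maximise the boxed dual (6) by projected gradient ascent, recover $\alpha = \nu \circ y$ from the stationarity condition (5), and confirm that the duality gap against the rescaled primal objective (2) closes. The Gaussian kernel and the toy data below are illustrative choices, not part of the exercise:

```python
import numpy as np

# Toy binary classification data (made up for illustration).
rng = np.random.default_rng(1)
n = 30
X = rng.standard_normal((n, 2))
y = np.sign(X[:, 0] + 0.3 * rng.standard_normal(n))

# Gaussian kernel Gram matrix K_ij = k(x_i, x_j) (an illustrative kernel choice).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq)
lam = 0.1
eta = 1.0 / (lam * n)                    # box upper bound from the dual
Kt = (y[:, None] * y[None, :]) * K       # Ktilde_ij = y_i y_j k(x_i, x_j)

# Projected gradient ascent on the dual: grad = 1 - Ktilde nu,
# followed by projection onto the box [0, eta]^n.
nu = np.zeros(n)
for _ in range(20000):
    nu = np.clip(nu + 1e-2 * (1.0 - Kt @ nu), 0.0, eta)

alpha = nu * y                           # stationarity condition from (5)
f_vals = K @ alpha                       # f(x_i) = (K alpha)_i

# Rescaled primal objective from (2) and the dual objective (6).
primal = np.maximum(0.0, 1.0 - y * f_vals).sum() / (lam * n) \
         + 0.5 * alpha @ K @ alpha
dual = nu.sum() - 0.5 * nu @ Kt @ nu
print(primal, dual)                      # gap should shrink towards zero
```

Weak duality guarantees `dual <= primal` for any feasible $\nu$; the gap closing at convergence reflects strong duality for this convex quadratic programme.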

Published on 19 June 2021.