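Since the linear span of $\{k(x_i, \cdot)\}_{i=1}^n$ is finite
dimensional, it is closed. Thus we can decompose any $f \in \mathcal{H}$ into
two orthogonal components $f = f_\parallel + f_\perp$, where $f_\parallel$
lies in the linear span, and therefore
$f(x_i) = \langle f, k(x_i, \cdot) \rangle_{\mathcal{H}} = \langle f_\parallel, k(x_i, \cdot) \rangle_{\mathcal{H}} = f_\parallel(x_i)$
for all $i \in \{1, \ldots, n\}$. Hence $f_\perp$ only affects the regulariser
$\|f\|_{\mathcal{H}}^2 = \|f_\parallel\|_{\mathcal{H}}^2 + \|f_\perp\|_{\mathcal{H}}^2$, which is
minimised when $f_\perp = 0$.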
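The decomposition argument can be checked numerically. The following is a minimal sketch, assuming a finite-dimensional feature space in which row `Phi[i]` stands in for the kernel function at the i-th observation and the vector `w` stands in for an arbitrary function in the space. The dimensions, the random data, and all variable names are illustrative assumptions, not part of the original solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 5, 20                        # n observations, D-dimensional feature space (D > n)
Phi = rng.standard_normal((n, D))   # row i plays the role of k(x_i, .)  (assumed setup)
w = rng.standard_normal(D)          # an arbitrary element of the space

# Orthogonal projector onto the row space of Phi, i.e. the span of the rows.
P = np.linalg.pinv(Phi) @ Phi
w_par = P @ w                       # component inside the span
w_perp = w - w_par                  # component orthogonal to the span

evals_full = Phi @ w                # evaluations <w, Phi[i]> of the full element
evals_par = Phi @ w_par             # evaluations of the in-span component only

norm_sq_full = w @ w
norm_sq_par = w_par @ w_par
norm_sq_perp = w_perp @ w_perp

print("max evaluation difference:", np.abs(evals_full - evals_par).max())
print("squared norms (full, parallel, orthogonal):",
      norm_sq_full, norm_sq_par, norm_sq_perp)
```

The evaluations at the observations agree up to rounding, while the squared norm splits into the two components, so dropping the orthogonal part can only reduce the regulariser.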
(b)
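Let $\xi \in \mathbb{R}^n$ be an auxiliary variable which
upper bounds the observation-wise hinge losses: $\xi_i \geq 0$ and
$\xi_i \geq 1 - y_i f(x_i)$ for all $i \in \{1, \ldots, n\}$. With this constraint and
$f = \sum_{j=1}^n \alpha_j k(x_j, \cdot)$ for some $\alpha \in \mathbb{R}^n$ by (a),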
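the problem becomes
\[
\min_{\alpha, \xi \in \mathbb{R}^n} \; \tfrac{1}{2} \alpha^\top K \alpha + C \sum_{i=1}^n \xi_i
\quad \text{s.t.} \quad \xi_i \geq 0, \;\; \xi_i \geq 1 - y_i (K \alpha)_i \;\; \text{for all } i,
\]
where $K_{ij} = k(x_i, x_j)$ and $C > 0$ is the regularisation constant. Defining $y = (y_1, \ldots, y_n)^\top$, and
$\mathbf{1} = (1, \ldots, 1)^\top$, we can introduce the dual variables
$\beta_i \geq 0$, and $\gamma_i \geq 0$ for all $i \in \{1, \ldots, n\}$, and
write the Lagrangian
\[
L(\alpha, \xi, \beta, \gamma) = \tfrac{1}{2} \alpha^\top K \alpha + C \mathbf{1}^\top \xi
- \beta^\top \xi - \gamma^\top \bigl( \xi - \mathbf{1} + y \circ (K \alpha) \bigr),
\]
where $\circ$ denotes the Hadamard product. Setting the derivatives w.r.t. the
primal variables to zero,
\[
\nabla_\alpha L = K \alpha - K (\gamma \circ y) = 0, \qquad
\nabla_\xi L = C \mathbf{1} - \beta - \gamma = 0,
\]
and substituting into the Lagrangian, we obtain
\[
\mathbf{1}^\top \gamma - \tfrac{1}{2} (\gamma \circ y)^\top K (\gamma \circ y),
\]
where $\beta = C \mathbf{1} - \gamma$. We can thus eliminate the $\beta$
variables by introducing the inequality constraint $\gamma_i \leq C$,
for all $i \in \{1, \ldots, n\}$, which comes from the above $\beta_i = C - \gamma_i$
combined with $\beta_i \geq 0$ (definition). Since $\gamma_i \geq 0$ (also by
definition), we maximise over $\gamma \in [0, C]^n$. This is
equivalent to the desired result.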
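The dual derivation can be sanity-checked numerically. The sketch below assumes the primal objective is half the squared norm of the function plus a constant `C` times the sum of hinge losses, with no bias term; this form, the Gaussian kernel, the synthetic data, and the helper names `dual_objective`, `primal_objective` and `solve_dual` are all assumptions made for illustration. It maximises the box-constrained dual by projected gradient ascent (one simple solver among many) and compares the primal and dual values.

```python
import numpy as np

def dual_objective(gamma, K, y):
    v = gamma * y                          # elementwise (Hadamard) product
    return gamma.sum() - 0.5 * v @ K @ v

def primal_objective(alpha, K, y, C):
    f = K @ alpha                          # values of the kernel expansion at the observations
    hinge = np.maximum(0.0, 1.0 - y * f)   # observation-wise hinge losses
    return 0.5 * alpha @ K @ alpha + C * hinge.sum()

def solve_dual(K, y, C, n_iter=2000):
    """Projected gradient ascent on the dual over the box [0, C]^n."""
    Q = (y[:, None] * K) * y[None, :]      # Q_ij = y_i y_j K_ij
    # Step size 1 / (largest eigenvalue of Q); eigvalsh returns ascending order.
    step = 1.0 / max(np.linalg.eigvalsh(Q)[-1], 1e-12)
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        grad = 1.0 - Q @ gamma             # gradient of the dual objective
        gamma = np.clip(gamma + step * grad, 0.0, C)  # project onto the box
    return gamma

# Synthetic two-class data and a Gaussian kernel (illustrative choices).
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))
y = np.where(X[:, 0] + 0.3 * rng.standard_normal(30) > 0, 1.0, -1.0)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)
C = 1.0

gamma = solve_dual(K, y, C)
alpha = gamma * y                          # primal coefficients from stationarity
gap = primal_objective(alpha, K, y, C) - dual_objective(gamma, K, y)
print("dual value:", dual_objective(gamma, K, y))
print("primal minus dual:", gap)
```

By weak duality the printed difference is non-negative for any point in the box, and it should shrink towards zero as the iterate approaches the dual maximiser.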