HDS

Exercise 13.10: Orthogonal Series Expansions


Consider \(\F(1 ; T) = \lbrace f = \sum_{m=1}^T \beta_m \phi_m \colon \|\beta\|_2 \leq 1 \rbrace\) where \(\{ \phi_m \}_m\) is an orthonormal basis for \(L^2(P)\).

(a)

If \(\{ \phi_m \}_m\) were orthonormal w.r.t. the empirical distribution \(P_n\) instead of the population distribution \(P\), the constrained least-squares estimate would be the projection of the unconstrained solution \(\frac{1}{n} \Phi^\top y\) onto the unit ball: \begin{align} \argmin_{\norm{\beta}_2 \leq 1} \frac{1}{n} \norm{y - \Phi \beta}_2^2 = \frac{1}{n} \Phi^\top y \, [1 \vee (\norm{\Phi^\top y}_2 / n)]^{-1} \, . \end{align} In comparison, the ridge estimate would be \(\frac{1}{n}\Phi^\top y \, [1 + \lambda]^{-1}\), so setting \(\lambda = [1 \vee (\norm{\Phi^\top y}_2 / n)] - 1 \geq 0\) yields the result.
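As a quick numerical sanity check (a numpy sketch, not part of the exercise), one can build a design whose columns are orthonormal under the empirical measure and verify that the projected least-squares estimate coincides with the ridge estimate at this data-dependent penalty:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 10

# Design with columns orthonormal w.r.t. P_n, i.e. (1/n) Phi^T Phi = I_T.
Q, _ = np.linalg.qr(rng.standard_normal((n, T)))
Phi = np.sqrt(n) * Q
assert np.allclose(Phi.T @ Phi / n, np.eye(T))

y = Phi @ rng.standard_normal(T) + rng.standard_normal(n)

b = Phi.T @ y / n                       # unconstrained least-squares solution
r = np.linalg.norm(b)                   # = ||Phi^T y||_2 / n
beta_proj = b / max(1.0, r)             # projection onto the unit ball

# Ridge estimate with the data-dependent penalty from the text.
lam = max(1.0, r) - 1.0
beta_ridge = np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(T), Phi.T @ y / n)

assert np.allclose(beta_proj, beta_ridge)
```

The projection is only active when \(r > 1\); otherwise \(\lambda = 0\) and both estimates reduce to the unconstrained solution.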

In the general case, i.e., \(\{ \phi_m \}_m\) not orthonormal w.r.t. \(P_n\), we can introduce a Lagrange multiplier \(\lambda \geq 0\): \begin{align} \nabla_\beta \left\lbrace \norm{y - \Phi \beta}_2^2 + \lambda (\norm{\beta}_2^2 - 1) \right\rbrace = 2 (\lambda I + \Phi^\top \Phi) \beta - 2 \Phi^\top y \, , \end{align} where complementary slackness requires either \(\lambda = 0\) or \(\norm{\beta}_2^2 = 1\). The constrained estimate is thus a ridge estimator with \(\lambda \geq 0\) chosen to satisfy the KKT conditions.
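In practice, the KKT value of \(\lambda\) can be found by a root search: \(\norm{\beta(\lambda)}_2\) is decreasing in \(\lambda\), so if the unconstrained solution leaves the ball, bisect until \(\norm{\beta(\lambda)}_2 = 1\). A numpy sketch (the cross-check via projected gradient descent is an illustration, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 100, 8
Phi = rng.standard_normal((n, T))       # columns NOT orthonormal in general
y = Phi @ rng.standard_normal(T) + rng.standard_normal(n)

G = Phi.T @ Phi
c = Phi.T @ y

def beta_ridge(lam):
    return np.linalg.solve(G + lam * np.eye(T), c)

# KKT: either lambda = 0 (interior solution) or ||beta||_2 = 1 (active constraint).
beta = beta_ridge(0.0)
if np.linalg.norm(beta) > 1.0:
    lo, hi = 0.0, 1.0
    while np.linalg.norm(beta_ridge(hi)) > 1.0:    # bracket the root
        hi *= 2.0
    for _ in range(100):                           # bisect on ||beta(lam)||_2 = 1
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(beta_ridge(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    beta = beta_ridge(hi)

# Cross-check against projected gradient descent on the same problem.
b = np.zeros(T)
L = np.linalg.eigvalsh(G).max()                    # Lipschitz constant of the gradient
for _ in range(5000):
    b -= (G @ b - c) / L
    nb = np.linalg.norm(b)
    if nb > 1.0:
        b /= nb                                    # project onto the unit ball

assert np.allclose(b, beta, atol=1e-6)
```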

(b)

Since \(f_{\theta^\star} = \sum_{m=1}^\infty \theta_m^\star \phi_m\) and \(f_{\theta} = \sum_{m=1}^T \theta_m \phi_m\), \begin{align} \|f_{\theta^\star} - f_{\theta}\|_2^2 &\overset{\text{(i)}}{=} \biggl\|\sum_{m=1}^T (\theta_m^\star - \theta_m) \phi_m\biggr\|_2^2 + \biggl\|\sum_{m=T+1}^\infty \theta_m^\star \phi_m\biggr\|_2^2 \newline &\overset{\text{(ii)}}{=} \norm{\theta_{1:T}^\star - \theta_{1:T}}_2^2 + \norm{\theta_{> T}^\star}_2^2 \, , \end{align} where (i) is by orthogonality of \(\{\phi_m\}_m\), and (ii) by Parseval's identity.
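The decomposition is easy to verify in a finite-dimensional analogue, with an arbitrary orthonormal basis of \(\mathbb{R}^M\) standing in for \(\{\phi_m\}_m\) (a numpy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
M, T = 50, 10
U, _ = np.linalg.qr(rng.standard_normal((M, M)))   # orthonormal basis {phi_m}

theta_star = rng.standard_normal(M)
theta = np.zeros(M)
theta[:T] = rng.standard_normal(T)                 # f_theta uses only phi_1, ..., phi_T

# ||f_{theta*} - f_theta||^2 = ||theta*_{1:T} - theta_{1:T}||^2 + ||theta*_{>T}||^2
lhs = np.linalg.norm(U @ theta_star - U @ theta) ** 2
rhs = (np.linalg.norm(theta_star[:T] - theta[:T]) ** 2
       + np.linalg.norm(theta_star[T:]) ** 2)

assert np.isclose(lhs, rhs)
```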

Published on 26 August 2021.