HDS

Exercise 13.9: Rates for Additive Nonparametric Models

Tags: chapter 13, Lipschitz, Gaussian, localisation

(a)

Denote \begin{equation} Z_t = \sup_{g \in \G,\,\norm{g}_n \le t}\, \abs{\frac1n \sum_{i=1}^n w_i g(x_i)}. \end{equation} To begin with, fix some $\hat \Delta_j$ and derive a concentration inequality for \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)}. \end{equation} Simply taking the supremum over all of $\G$ is too loose. We therefore use localisation, a case-by-case analysis which only requires controlling a supremum over functions with bounded $n$-norm. If $\norm{\hat\Delta_j}_n \le t$, then \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)} \le \sigma Z_t. \end{equation} On the other hand, if $\norm{\hat\Delta_j}_n \ge t$, then, using the star-shaped property of $\G$, \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)} \le \sigma \frac{\norm{\hat \Delta_j}_n}{t}Z_t. \end{equation}
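To spell out the second case: since $\norm{\hat\Delta_j}_n \ge t$ and $\G$ is star-shaped around zero, the rescaling $g = (t/\norm{\hat\Delta_j}_n)\, \hat\Delta_j$ again lies in $\G$ (assuming, as in the setup of the exercise, that $\hat\Delta_j \in \G$) and satisfies $\norm{g}_n = t$, so \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)} = \sigma \frac{\norm{\hat \Delta_j}_n}{t} \cdot \frac1n\abs{\sum_{i=1}^n w_i g(x_i)} \le \sigma \frac{\norm{\hat \Delta_j}_n}{t} Z_t. \end{equation}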

Let us start with the second case. Note that $\sigma Z_t/t$ is a Lipschitz function of the standard Gaussian vector $w$ with Lipschitz constant $\sigma/\sqrt{n}$, so, by concentration of Lipschitz functions of Gaussians, \begin{equation} \sigma \frac{Z_t}{t} \le \sigma \frac{\G_n(t; \G)}{t} + u \end{equation} with probability at least $1 - \exp(-\frac{n u^2}{2 \sigma^2})$. Denote $\delta_n = \max_{j=1,\ldots,d} \delta_{n,j}$ and require that $t \ge \delta_n$. Since $\G_n(t; \G)/t$ is non-increasing in $t$ and $\delta_{n,j}$ satisfies the critical inequality, \begin{equation} \sigma \frac{\G_n(t; \G)}{t} \le \sigma \frac{\G_n(\delta_{n,j}; \G)}{\delta_{n,j}} \le \tfrac12 \delta_{n,j} \le \tfrac12 \delta_{n}. \end{equation} Hence, \begin{equation} \sigma \frac{Z_t}{t} \le \tfrac12 \delta_{n} + u \end{equation} with probability at least $1 - \exp(-\frac{n u^2}{2 \sigma^2})$. Set $u = \tfrac12 t$; since $t \ge \delta_n$, this gives $\sigma Z_t / t \le \tfrac12 \delta_n + \tfrac12 t \le t$, so \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)} \le t \norm{\hat \Delta_j}_n \end{equation} with probability at least $1 - \exp(-\frac{n t^2}{8 \sigma^2})$. For the first case, the same concentration inequality gives $\sigma Z_t \le t^2$ on the same event, so \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta_j(x_i)} \le t^2 \end{equation} with probability at least $1 - \exp(-\frac{n t^2}{8 \sigma^2})$. Combining the two cases for every coordinate $j$, applying the triangle inequality to $\hat\Delta = \sum_{j=1}^d \hat\Delta_j$, and taking a union bound over $j = 1, \ldots, d$, for all $t \ge \delta_n$, \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta(x_i)} \le d t^2 + t \sum_{j=1}^d \norm{\hat \Delta_j}_n \end{equation} with probability at least $1 - d \exp(-\frac{n t^2}{8 \sigma^2})$. Replacing $t \gets \sqrt{t \delta_n}$ gives the result.
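For completeness, the final substitution can be written out: replacing $t \gets \sqrt{t \delta_n}$, which is permissible because $\sqrt{t \delta_n} \ge \delta_n$ whenever $t \ge \delta_n$, yields \begin{equation} \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta(x_i)} \le d\, t \delta_n + \sqrt{t \delta_n} \sum_{j=1}^d \norm{\hat \Delta_j}_n \end{equation} with probability at least $1 - d \exp(-\frac{n t \delta_n}{8 \sigma^2})$, for all $t \ge \delta_n$.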

(b)

By (a), the fact that $d \ge 1$, and the assumption (noting that $K \ge 1$), \begin{equation} \frac12 \norm{\hat\Delta}_n^2 \le \frac\sigma n\abs{\sum_{i=1}^n w_i \hat \Delta(x_i)} \le d K \delta^2_n + \sqrt{d K} \delta_n \norm{\hat \Delta}_n \end{equation} with probability at least $1 - d \exp(-\frac{n \delta^2_n}{8 \sigma^2})$. If $\norm{\hat\Delta}_n^2 \lesssim d K \delta^2_n$, then the result holds immediately. If instead $\norm{\hat\Delta}_n^2 \gtrsim d K \delta^2_n$, then $d K \delta^2_n = (\sqrt{d K} \delta_n)^2 \lesssim \sqrt{d K} \delta_n \norm{\hat \Delta}_n$, so \begin{equation} \norm{\hat\Delta}_n^2 \lesssim \sqrt{d K} \delta_n \norm{\hat\Delta}_n + \sqrt{d K} \delta_n \norm{\hat \Delta}_n \implies \norm{\hat\Delta}_n \lesssim \sqrt{d K} \delta_n, \end{equation} so the result also holds.
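As a sanity check, the case analysis can also be bypassed by solving the quadratic inequality directly: writing $x = \norm{\hat\Delta}_n$, $a = d K \delta^2_n$, and $b = \sqrt{d K} \delta_n$, the bound $\tfrac12 x^2 \le a + b x$ implies \begin{equation} x \le b + \sqrt{b^2 + 2a} \le 2b + \sqrt{2a} = (2 + \sqrt{2}) \sqrt{d K}\, \delta_n, \end{equation} which recovers $\norm{\hat\Delta}_n \lesssim \sqrt{d K} \delta_n$ with an explicit constant.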

Published on 9 April 2021.