
Exercise 15.18: Lower Bounds for Additive Nonparametric Regression


(a)

We use the Yang–Barron version of Fano’s method. This time, we do not keep track of the constants. First, we need a lemma about packing $[N]^d$, which is a simplified version of Lemma 4(a) by Raskutti, Wainwright, and Yu (2012).

Lemma. Let $d \in \N$ be even. Let $M$ be the $d/2$-packing number of $G = [N]^d$ for some $N \in \N$ with respect to the Hamming distance $\rho\ss{H}$. Then \begin{equation} \log M \ge \frac{d}{64} \log N \end{equation} if $N \ge 6$.

Proof. The cardinality of a $d/2$-Hamming ball centred at some $u \in G$ is at most $\binom{d}{d/2} N^{d/2}$: choose the $d/2$ coordinates that are allowed to differ from $u$ and vary each of them over the $N$ values, keeping the remaining $d/2$ coordinates fixed. Therefore, if $M \binom{d}{d/2} N^{d/2} < |G| = N^d$, then we can find a $v \in G$ which is at Hamming distance more than $d/2$ from all elements already in the packing; and we can keep adding such points until $M \binom{d}{d/2} N^{d/2} \ge N^d$: \begin{equation} M \ge \frac{1}{\binom{d}{d/2}}N^{\frac{d}{2}} \overset{\text{(i)}}{\ge} \parens{\frac{1}{2e}}^{d/2} N^{\frac{d}{2}} \end{equation} using in (i) the upper bound $\binom{n}{k} \le (\frac{en}{k})^k$. Therefore, \begin{equation} \log M \ge \frac{d}{2} \log\frac{N}{2e} \ge \frac{d}{64} \log N, \end{equation} where the second inequality amounts to $\frac{31}{32} \log N \ge \log(2e)$, which holds for $N \ge 6$. \(\blacksquare\)
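
As a quick sanity check on this counting argument (not part of the solution), the following Python sketch greedily builds a $d/2$-separated subset of $[N]^d$ under the Hamming distance for a small instance and compares its size with the bound $N^{d/2} / \binom{d}{d/2}$; the values of $N$ and $d$ below are arbitrary illustrative choices.

```python
# Greedy construction of a d/2-separated set in [N]^d under the Hamming distance,
# compared against the counting bound N^(d/2) / binom(d, d/2) from the proof.
from itertools import product
from math import comb

N, d = 6, 4  # small illustrative instance; the lemma assumes d even and N >= 6

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

packing = []
for v in product(range(N), repeat=d):
    # keep v if it is at Hamming distance at least d/2 from every point kept so far
    if all(hamming(v, u) >= d // 2 for u in packing):
        packing.append(v)

lower_bound = N ** (d // 2) / comb(d, d // 2)  # = 36 / 6 = 6 here
print(len(packing), ">=", lower_bound, ":", len(packing) >= lower_bound)
```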

Every $f \in \F$ can be written as $f(x) = \sum_{j=1}^d g_j(x_j)$ where, for all $j \in [d]$, $g_j \in \G$. Moreover, assuming that the elements $g_j$ are centred and the measure over the inputs is a product measure, \begin{equation} \norm{f}^2_2 = \sum_{j=1}^d \norm{g_j}_2^2. \end{equation} From this decomposition of $\norm{f}_2^2$, we deduce that \begin{equation} \log N(\e, \F, \norm{\vardot}_2) \le d \log N(\e / \sqrt{d}, \G, \norm{\vardot}_2). \end{equation} We similarly lower bound the packing number. For this, we use the lemma to construct elements of $\F$ which pairwise differ in at least $d/2$ of their component functions: \begin{equation} \log M(2 \delta, \F, \norm{\vardot}_2) \gtrsim d \log M(2 \delta / \sqrt{d / 2}, \G, \norm{\vardot}_2), \end{equation} provided that $M(2 \delta / \sqrt{d / 2}, \G, \norm{\vardot}_2) \ge 6$ (the lemma's condition $N \ge 6$), which corresponds to a sufficiently small choice of $\delta > 0$. We will later see that $\delta^2 \propto d (\sigma^2 / n)^{2 \alpha / (2 \alpha + 1)}$, which puts a universal requirement on $(d, \sigma, n)$, e.g. $n$ sufficiently large based on $(d, \sigma)$.
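
To illustrate the orthogonality behind this decomposition, here is a small Monte Carlo sketch (not part of the solution) with hypothetical centred components $g_j(t) = \cos(2 \pi j t)$ under the uniform product measure on $[0, 1]^d$; these components are my own illustrative choice, not the class $\G$ of the exercise.

```python
# Monte Carlo check that ||f||_2^2 = sum_j ||g_j||_2^2 for an additive f with
# centred components under a product measure (cross terms vanish in expectation).
import numpy as np

rng = np.random.default_rng(0)
d, n_mc = 5, 200_000
x = rng.uniform(size=(n_mc, d))  # product measure over the inputs

# centred components: E[cos(2*pi*j*t)] = 0 under Uniform[0, 1]
g = lambda j, t: np.cos(2 * np.pi * j * t)

f_vals = sum(g(j + 1, x[:, j]) for j in range(d))
lhs = np.mean(f_vals ** 2)                                    # ||f||_2^2
rhs = sum(np.mean(g(j + 1, x[:, j]) ** 2) for j in range(d))  # sum_j ||g_j||_2^2
print(f"||f||_2^2 ~ {lhs:.4f},  sum_j ||g_j||_2^2 ~ {rhs:.4f}")  # both ~ d/2 = 2.5
```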

Step A of the Yang–Barron Version of Fano’s Method

Let $\P^n_{f}$ be the joint distribution over $(x_i, y_i)_{i=1}^n$ for a fixed function $f \in \F$. Then \begin{equation} \KL(\P_{f_1}^n, \P^n_{f_2}) = \frac{n}{2 \sigma^2}\norm{f_1 - f_2}_2^2. \end{equation} Therefore, \begin{align} \log N(\e, \F, \sqrt{\KL}) &=\log N(\sqrt{2} \sigma \e / \sqrt{n}, \F, \norm{\vardot}_2) \newline &\le d\log N(\sqrt{2} \sigma \e / \sqrt{n d}, \G, \norm{\vardot}_2) \newline &\propto d \parens{ \frac{\sqrt{n d}}{\sigma \e} }^{\frac1\alpha} \overset{!}{\lesssim} \e^2. \end{align} The required inequality is equivalent to \begin{equation} d^{1 + \frac1{2\alpha}} \parens{\frac{n}{\sigma^2}}^{\frac1{2\alpha}} \lesssim \e^{2 + \frac1 \alpha} \iff d \parens{\frac{n}{\sigma^2}}^{\frac{1}{2\alpha + 1}} \lesssim \e^2, \end{equation} so set $\e^2$ proportional to the left-hand side of the latter inequality. We require that $\e^2$ is bigger than some universal constant, which again puts a universal requirement on $(d, \sigma, n)$, again e.g. $n$ sufficiently large based on $(d, \sigma)$.
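
The exponent bookkeeping above is easy to get wrong, so here is a small numerical sketch (with arbitrary illustrative values of $d$, $n$, $\sigma$ and $\alpha$, not taken from the exercise) confirming that the choice $\e^2 \propto d (n / \sigma^2)^{1 / (2 \alpha + 1)}$ makes the covering bound of the same order as $\e^2$.

```python
# Check that eps^2 = d * (n / sigma^2)^(1 / (2*alpha + 1)) balances the covering
# bound d * (sqrt(n*d) / (sigma * eps))^(1 / alpha) against eps^2 (constants ignored).
d, n, sigma, alpha = 7.0, 1.0e4, 2.5, 1.5  # arbitrary illustrative values

eps_sq = d * (n / sigma**2) ** (1 / (2 * alpha + 1))
eps = eps_sq ** 0.5
covering_bound = d * ((n * d) ** 0.5 / (sigma * eps)) ** (1 / alpha)

print(covering_bound / eps_sq)  # 1.0 up to floating-point error
```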

Step B of the Yang–Barron Version of Fano’s Method

We have \begin{align} \log M(2 \delta, \F, \norm{\vardot}_2) &\gtrsim d \log M(2 \delta / \sqrt{d / 2}, \G, \norm{\vardot}_2) \newline &\propto d \parens{ \frac{\sqrt{d}}{\delta} }^{\frac1\alpha} = d \parens{ \frac{d}{\delta^2} }^{\frac1{2\alpha}} \overset{!}{\gtrsim} \e^2 \propto d \parens{\frac{n}{\sigma^2}}^{\frac{1}{2\alpha + 1}}, \end{align} so the choice $\delta^2 \propto d (\sigma^2 / n)^{2 \alpha / (2 \alpha + 1)}$ suffices.
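
Analogously, a small numerical sketch (same arbitrary illustrative values as above) confirms that this choice of $\delta^2$ makes the packing lower bound of the same order as $\e^2$.

```python
# Check that delta^2 = d * (sigma^2 / n)^(2*alpha / (2*alpha + 1)) balances the
# packing bound d * (d / delta^2)^(1 / (2*alpha)) against eps^2 (constants ignored).
d, n, sigma, alpha = 7.0, 1.0e4, 2.5, 1.5  # arbitrary illustrative values

eps_sq = d * (n / sigma**2) ** (1 / (2 * alpha + 1))
delta_sq = d * (sigma**2 / n) ** (2 * alpha / (2 * alpha + 1))
packing_bound = d * (d / delta_sq) ** (1 / (2 * alpha))

print(packing_bound / eps_sq)  # 1.0 up to floating-point error
```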

Conclusion of the Yang–Barron Version of Fano’s Method

From steps A and B and Fano’s inequality, we conclude that, for $n$ sufficiently large based on $(d, \sigma)$, we have \begin{equation} \inf_{\hat f} \sup_{f \in \F\ss{add}} \E[\norm{\hat f - f}_2^2] \gtrsim d \parens{\frac{\sigma^2}{n}}^{\frac{2 \alpha}{2 \alpha + 1}}. \end{equation}

(b)

The first term follows immediately from (a), applied to additive functions supported on a fixed set of $s$ coordinates. For the second term, we apply Fano’s method. Again, we need a lemma, which again is a simplified version of Lemma 4(a) by Raskutti, Wainwright, and Yu (2012).

Fix some $g \in \G$. For $u \in \set{-1, 0, 1}^d$ with $\norm{u}_0 \le s$, denote \begin{equation} f_u(x) = \sum_{j=1}^d u_j g(x_j). \end{equation} By the lemma, there exists a collection of $M$ such functions $f_u$ with \begin{equation} \norm{f_{u_1} - f_{u_2}}_2^2 \ge \frac{s}{2} \norm{g}_2^2 \end{equation} for $u_1 \neq u_2$ and \begin{equation} \log M \gtrsim s \log \frac{ed}{s} \end{equation} if $s \le d / 8$.
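
As an illustration of the separation claim (not part of the solution), the following Monte Carlo sketch uses the hypothetical centred component $g(t) = \cos(2 \pi t)$ under the uniform product measure (my own choice, not the $g$ of the exercise) and checks that $\norm{f_{u_1} - f_{u_2}}_2^2 = \sum_{j} (u_{1,j} - u_{2,j})^2 \norm{g}_2^2$, which is at least the Hamming distance between $u_1$ and $u_2$ times $\norm{g}_2^2$.

```python
# Monte Carlo check of the separation: ||f_{u1} - f_{u2}||_2^2 equals
# sum_j (u1_j - u2_j)^2 * ||g||_2^2, hence is at least rho_H(u1, u2) * ||g||_2^2.
import numpy as np

rng = np.random.default_rng(1)
d, s, n_mc = 8, 3, 400_000
g = lambda t: np.cos(2 * np.pi * t)  # centred under Uniform[0, 1]; ||g||_2^2 = 1/2
g_norm_sq = 0.5

def random_sparse_sign_vector():
    # a vector in {-1, 0, 1}^d with exactly s nonzero entries
    u = np.zeros(d)
    support = rng.choice(d, size=s, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=s)
    return u

u1, u2 = random_sparse_sign_vector(), random_sparse_sign_vector()
x = rng.uniform(size=(n_mc, d))  # product measure over the inputs

diff = g(x) @ (u1 - u2)          # f_{u1}(x) - f_{u2}(x) on the Monte Carlo sample
mc_dist_sq = np.mean(diff ** 2)  # estimate of ||f_{u1} - f_{u2}||_2^2
exact = np.sum((u1 - u2) ** 2) * g_norm_sq
hamming_bound = np.sum(u1 != u2) * g_norm_sq
print(f"MC: {mc_dist_sq:.3f}  exact: {exact:.3f}  Hamming bound: {hamming_bound:.3f}")
```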

Note that \begin{equation} \KL(\P_{f_{u_1}}^n, \P_{f_{u_2}}^n) = \frac{n}{2 \sigma^2} \norm{f_{u_1} - f_{u_2}}_2^2 \le \frac{2 n s}{\sigma^2} \norm{g}_2^2, \end{equation} since $\norm{f_{u_1} - f_{u_2}}_2^2 = \sum_{j=1}^d (u_{1,j} - u_{2,j})^2 \norm{g}_2^2 \le 4 s \norm{g}_2^2$. As long as $d$ is sufficiently large relative to $s$, we have $\log M \ge 4 \log 2$. Choose $g \in \G$ such that $\frac{s}{2}\norm{g}_2^2 = \delta^2$. Then \begin{equation} \KL(\P_{f_{u_1}}^n, \P_{f_{u_2}}^n) \le \frac{4n}{\sigma^2} \delta^2. \end{equation} To ensure that $\log M \gtrsim \frac{n}{\sigma^2} \delta^2$, choose \begin{equation} \delta^2 \propto \sigma^2 \frac{s}{n}\log \frac{ed}{s}. \end{equation} Therefore, by Fano’s inequality, as long as $d$ is sufficiently large relative to $s$, \begin{equation} \inf_{\hat f} \sup_{f \in \F\ss{spam}} \E[\norm{\hat f - f}_2^2] \gtrsim \sigma^2 \frac{s}{n}\log \frac{ed}{s}. \end{equation}
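
As a final sanity check (not part of the solution), a brute-force enumeration over a small instance confirms the combinatorial factor behind the KL bound: over vectors in $\set{-1, 0, 1}^d$ with at most $s$ nonzero entries, the maximum of $\sum_j (u_{1,j} - u_{2,j})^2$ is exactly $4s$.

```python
# Brute-force check that max over sparse sign vectors of sum_j (u1_j - u2_j)^2 = 4*s,
# which gives KL(P^n_{f_{u1}}, P^n_{f_{u2}}) <= (2*n*s / sigma^2) * ||g||_2^2.
from itertools import product

d, s = 6, 2  # small illustrative instance so the enumeration stays cheap
sparse = [u for u in product((-1, 0, 1), repeat=d) if sum(v != 0 for v in u) <= s]
worst = max(
    sum((a - b) ** 2 for a, b in zip(u1, u2)) for u1 in sparse for u2 in sparse
)
print(worst, "==", 4 * s, ":", worst == 4 * s)
```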

Published on 2 February 2022.