Exercise 15.17: Lower Bounds for Generalised Linear Models

Let $p_\theta(y_1, \ldots, y_n)$ denote the density of the generalised linear model. To begin with, note that $\Phi'(\lra{x_i, \theta}) = \E_\theta[Y_i]$. Therefore, taking the expectation of the log-likelihood ratio $\log(p_{\theta_1} / p_{\theta_2})$ under $\P_{\theta_1}$, \begin{equation} \KL(\P_{\theta_1}, \P_{\theta_2}) = \frac1{s(\sigma)}\sum_{i=1}^n\sbrac{ \Phi(\lra{x_i, \theta_2}) - \Phi'(\lra{x_i, \theta_1})(\lra{x_i, \theta_2} -\lra{x_i, \theta_1}) - \Phi(\lra{x_i, \theta_1}) } \end{equation} which is the difference between $\Phi$ and its linearisation around $\lra{x_i, \theta_1}$, evaluated at $\lra{x_i, \theta_2}$. By convexity of $\Phi$, every summand is non-negative.
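As a sanity check, the formula can be compared numerically with a directly computed KL divergence. The sketch below assumes a concrete instance that is not part of the exercise: the logistic model with $\Phi(t) = \log(1 + e^t)$ and $s(\sigma) = 1$, so that each $Y_i$ is Bernoulli with mean $\Phi'(\lra{x_i, \theta})$. The design `X` and the parameters `theta1`, `theta2` are arbitrary illustrative values.

```python
import math

def phi(t):
    # Assumed example: log-partition function of the logistic GLM, log(1 + e^t).
    return math.log1p(math.exp(t))

def phi_prime(t):
    # Derivative of phi: the sigmoid, equal to the mean of Y_i.
    return 1.0 / (1.0 + math.exp(-t))

def dot(x, theta):
    return sum(a * b for a, b in zip(x, theta))

def kl_bregman(X, theta1, theta2, scale=1.0):
    # KL written as Phi minus its linearisation around <x_i, theta1>,
    # evaluated at <x_i, theta2>, summed over rows and divided by s(sigma).
    total = 0.0
    for x in X:
        u1, u2 = dot(x, theta1), dot(x, theta2)
        total += phi(u2) - phi_prime(u1) * (u2 - u1) - phi(u1)
    return total / scale

def kl_direct(X, theta1, theta2):
    # KL between products of Bernoulli distributions, from the definition.
    total = 0.0
    for x in X:
        p = phi_prime(dot(x, theta1))
        q = phi_prime(dot(x, theta2))
        total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return total

# Hypothetical design matrix (rows x_i) and parameters.
X = [[1.0, 0.5], [-0.3, 2.0], [0.7, -1.2]]
theta1 = [0.4, -0.1]
theta2 = [-0.2, 0.3]
print(kl_bregman(X, theta1, theta2), kl_direct(X, theta1, theta2))
```

The two printed values should agree up to floating-point error, and both vanish when `theta1 == theta2`.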


By Taylor’s theorem, with $L$ the assumed upper bound on $\Phi''$ and $\eta\ss{max} = \sigma\ss{max}(X / \sqrt{n})$, \begin{align} \KL(\P_{\theta_1}, \P_{\theta_2}) &\le \frac1{s(\sigma)}\sum_{i=1}^n\frac12 L (\lra{x_i, \theta_2} -\lra{x_i, \theta_1})^2 \newline &= \frac{L}{2 s(\sigma)} \norm{X(\theta_1 - \theta_2)}_2^2 \newline &\le \frac{n L \eta\ss{max}^2}{2 s(\sigma)} \norm{\theta_1 - \theta_2}_2^2 \newline &= n c^2 \norm{\theta_1 - \theta_2}_2^2 \end{align} with $c^2 = L \eta\ss{max}^2 / (2 s(\sigma))$.
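The first inequality (the Taylor step) can be illustrated per summand. The sketch below again assumes the logistic model, which is only an example: there $\Phi''(t) = \Phi'(t)(1 - \Phi'(t)) \le 1/4$, so $L = 1/4$ is a valid choice. It scans a grid of pairs $(u_1, u_2)$ standing in for $(\lra{x_i, \theta_1}, \lra{x_i, \theta_2})$ and reports the largest ratio between the exact summand and its quadratic upper bound.

```python
import math

L_LOGISTIC = 0.25  # assumed example: sup of Phi'' for Phi(t) = log(1 + e^t)

def phi(t):
    return math.log1p(math.exp(t))

def phi_prime(t):
    return 1.0 / (1.0 + math.exp(-t))

def bregman(u1, u2):
    # Exact summand: Phi(u2) - Phi(u1) - Phi'(u1) * (u2 - u1).
    return phi(u2) - phi(u1) - phi_prime(u1) * (u2 - u1)

def quadratic_bound(u1, u2, smoothness=L_LOGISTIC):
    # Taylor upper bound: (L / 2) * (u2 - u1)^2.
    return 0.5 * smoothness * (u2 - u1) ** 2

# Grid of linear predictors in [-5, 5] with spacing 0.25 (illustrative values).
grid = [k / 4.0 for k in range(-20, 21)]
worst_ratio = max(
    bregman(u1, u2) / quadratic_bound(u1, u2)
    for u1 in grid
    for u2 in grid
    if u1 != u2
)
print(worst_ratio)
```

A ratio of at most one on every pair is what the Taylor step asserts; summing over $i$ and dividing by $s(\sigma)$ then gives the first line of the display.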


We use Fano’s method. First, we explicitly construct a local packing: consider the ball $\set{\theta \in \R^d : \norm{\theta}_2 \le r}$ and let $\theta_1, \ldots, \theta_M$ be a maximal $2 \delta$-packing of it. This is a $2\delta/r$-packing of the unit ball under the norm $\frac1r\norm{\vardot}_2$, so \begin{equation} \log M \ge d \log \frac{r}{2 \delta}. \end{equation} Any two points of the packing are at distance at most $2r$, so the KL bound above gives $\KL(\P_{\theta_j}, \P_{\theta_k}) \le 4 n c^2 r^2$ for all $j, k$. By Fano’s inequality applied to the squared $\ell_2$ loss, \begin{equation} \inf_{\hat\theta} \sup_{\theta \in \Bb_2^d(1)} \E[\norm{\hat\theta - \theta}_2^2] \ge \delta^2 \parens{ 1 - \frac{4 n c^2 r^2 + \log 2}{d \log \frac{r}{2 \delta}} }, \end{equation} provided $r \le 1$, so that the packing lies in $\Bb_2^d(1)$. Choose $r$ such that $d \log \frac{r}{2 \delta} \ge 4 \log 2$, e.g. $r = 4 \delta$ and $d \ge 4$. Then \begin{equation} \inf_{\hat\theta} \sup_{\theta \in \Bb_2^d(1)} \E[\norm{\hat\theta - \theta}_2^2] \ge \delta^2 \parens{ \frac12 - \frac{64 n c^2 \delta^2}{d \log 2} } \ge \frac14\delta^2 \end{equation} as long as $64 n c^2\delta^2 \le \frac14 d \log 2 $, e.g. $\delta^2 = d \log 2/(256 n c^2)$, with $n$ large enough that $r = 4\delta \le 1$. Using $\log 2 \ge \frac12$, we conclude that \begin{equation} \inf_{\hat\theta} \sup_{\theta \in \Bb_2^d(1)} \E[\norm{\hat\theta - \theta}_2^2] \ge \frac1{4}\frac{d \log 2}{256 n} \frac{2 s(\sigma)}{L \eta\ss{max}^2} = \frac{\log 2}{512} \frac{s(\sigma)}{L \eta\ss{max}^2} \frac{d}{n} \ge \frac{1}{1024} \frac{s(\sigma)}{L \eta\ss{max}^2} \frac{d}{n}. \end{equation}


Denoting $\eta\ss{min} = \sigma\ss{min}(X / \sqrt{n})$, we have $\frac1n\norm{X(\hat \theta - \theta)}_2^2 \ge \eta\ss{min}^2 \norm{\hat \theta - \theta}_2^2$, so \begin{equation} \inf_{\hat\theta} \sup_{\theta \in \Bb_2^d(1)} \E[\tfrac1n\norm{X(\hat\theta - \theta)}_2^2] \ge \frac{1}{1024} \frac{\eta\ss{min}^2}{\eta\ss{max}^2}\frac{s(\sigma)}{L} \frac{d}{n}. \end{equation} The usual linear regression corresponds to the case where $s(\sigma) = \sigma^2$, $\Phi(t) = t^2/2$ (so that one may take $L = 1$), and $h(z)$ is the density of $\Normal(0, \sigma^2)$.
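The arithmetic of the Fano step can also be checked numerically. The sketch below is only an illustration: it evaluates the generic Fano expression $\delta^2 \parens{1 - (\text{KL term} + \log 2) / \log M}$ with $r = 4\delta$, and it assumes that the pairwise KL divergences have already been bounded by `kl_max`, here set to the budget $\frac14 d \log 2$ used in the argument. The helper names and the value of `delta` are hypothetical.

```python
import math

def local_packing_log_size(d, r, delta):
    # Volumetric lower bound on log M for a maximal 2*delta-packing
    # of the Euclidean ball of radius r in R^d.
    return d * math.log(r / (2.0 * delta))

def fano_lower_bound(delta, kl_max, log_m):
    # Fano bound for squared loss: delta^2 * (1 - (kl_max + log 2) / log M).
    return delta ** 2 * (1.0 - (kl_max + math.log(2)) / log_m)

delta = 0.05  # hypothetical separation parameter
ratios = {}
for d in (4, 16, 64):
    log_m = local_packing_log_size(d, r=4 * delta, delta=delta)
    kl_max = 0.25 * d * math.log(2)  # assumed KL budget of (1/4) d log 2
    ratios[d] = fano_lower_bound(delta, kl_max, log_m) / delta ** 2
    print(d, ratios[d])
```

For every $d \ge 4$ the printed ratio should be at least $\frac14$, matching the claim that the minimax risk is at least $\frac14 \delta^2$ under these choices.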

Published on 9 September 2021.