Exercise 15.16: Lower Bounds for Sparse PCA in $\ell^2$-Error
Let $\Theta_s = \lbrace \theta \in S^{d-1} \colon | \theta | = s \rbrace$, and apply Fano’s inequality to obtain \begin{align} \inf_{\hat{\theta}} \sup_{\Theta_s} \E \| \hat{\theta} - \theta \|_2^2 \geq \delta^2 \biggl[ 1 - \frac{I(Z, J) + \log 2}{\log M} \biggr] \, , \end{align} for a $2\delta$-packing of $\Theta_s$, where $J \sim \Unif[M]$, and $Z \given J = j \sim P_{\theta_j}$.
By Gilbert-Varshamov, we can take $\lbrace m_1 , \ldots , m_M \rbrace$ to be an $\bigl(\tfrac{s - 1}{2}\bigr)$-packing of $\lbrace m \in \lbrace -1, 0 , 1 \rbrace^{d-1} \colon \| m \|_0 = s - 1 \rbrace$ w.r.t. the Hamming distance, and define \begin{align} \theta_j(\Pi) = \sqrt{1 - \delta^2} \begin{pmatrix} 1 \newline 0_{d - 1} \end{pmatrix} + \delta \begin{pmatrix} 0 \newline \Pi \, m_j / \sqrt{s - 1} \end{pmatrix} \, , \end{align} where $\Pi$ is random matrix following the uniform distribution over the space of permutation matrices. By construction, $\| \theta_j (\Pi) - \theta_k (\Pi) \|_2 = \tfrac{\delta}{\sqrt{s - 1}} \| m_j - m_k \|_2 \geq \tfrac{\delta}{\sqrt{2}}$, and $\| \theta_j(\Pi) \|_2^2 = 1$, and $\| \theta_j (\Pi) \|_0 = s$.
We will apply Lemma 15.17. To do so, we observe $\det \V (X \given J = j , \Pi) = 1 + \nu \| \theta_j(\Pi) \|_2^2 = 1 + \nu$, and \begin{align} \V(X) = \E \bigl[ \V (X \given J = j , \Pi) \bigr] = I + \nu \, \E \begin{pmatrix} 1 - \delta^2 & 0 \newline 0 & \tfrac{\delta^2}{s - 1} \Pi \, m_j m_j^\top \Pi^\top \end{pmatrix} \, , \end{align} for which \begin{align} \det \V (X) &= [1 + \nu (1 - \delta^2)] \det \bigl\lbrace I_{d - 1} + \tfrac{\nu \delta^2}{d-1} \bigl[ I_{d - 1} + \tfrac{s-2}{d-2} (1 1^\top - I_{d - 1}) \bigr] \bigr\rbrace \newline &\leq [1 + \nu (1 - \delta^2)] \bigl[ 1 + \tfrac{\nu \delta^2}{d - 1} \bigr]^{d-2} \bigl[ 1 + (s - 1) \tfrac{\nu \delta^2}{d - 1} ] \, . \end{align} Because the last factor is one for $s = 1$, and $\log (1 + cx) \leq c \log (1 + x)$ for $x \geq 0, c \geq 1$ by the mean value theorem, we can apply Lemma 15.17 as promised \begin{align} I(Z , J) \lesssim n \bigl[ \log (1 + \nu (1 -\delta^2)) + (d - 1) \log \bigl(1 + \tfrac{\nu \delta^2}{d - 1}\bigr) \bigr] \lesssim n \frac{\nu^2 \delta^2}{1 + \nu} \, . \end{align} From Gilbert-Varshamov, we know $\log M \gtrsim s \log \tfrac{e d}{s}$. Hence taking $\delta^2 \asymp n^{-1} \tfrac{\nu + 1}{\nu^2} s \log \tfrac{ed}{s}$ gives the desired \begin{align} \inf_{\hat{\theta}} \sup_{\Theta_s} \E \| \hat{\theta} - \theta \|_2^2 \gtrsim n^{-1} \frac{\nu + 1}{\nu^2} s \log \frac{e d}{s} \, . \end{align}