Exercise 3.12: Rademacher Chaos Variables
In both parts of this exercise, we assume either that none of the matrices is zero, or that the deviation sizes \(\delta\) and \(t\) are strictly larger than zero and \(e^{- \delta^2 / 0} = 0\).
(a)
Define \(f(\varepsilon) = \| Q^{1/2} \varepsilon \|_2\) so that \(X = f(\varepsilon)^2\). By the reverse triangle inequality \begin{align} | f(\varepsilon) - f(\varepsilon’) | \leq \| Q^{1/2} \|_2 \| \varepsilon - \varepsilon’ \|_2 \, , \end{align} for any two \(\varepsilon, \varepsilon'\). Since \(f\) is also separately convex by the standard triangle inequality, we can use that the components of \(\varepsilon\) are i.i.d. Rademacher by assumption, and apply Theorem 3.4 to obtain \begin{align} \P \left(f(\varepsilon) \geq \E [f(\varepsilon)] + t\right) \leq \exp \left\lbrace - \frac{t^2}{16 \| Q \|_2} \right\rbrace \, , \end{align} where \(\| Q^{1/2} \|_2^2 = \| Q \|_2\). Since \begin{align} \E \left[\| Q^{1/2} \varepsilon \|_2\right] = \E \left[\sqrt{\varepsilon^\top Q \varepsilon}\right] \leq \sqrt{\tr(Q)} \, , \end{align} by the Jensen’s inequality, the above implies \begin{align} \P \left[X \geq \left(\sqrt{\tr(Q)} + t\right)^2\right] \leq \exp \left\lbrace - \frac{t^2}{16 \| Q \|_2} \right\rbrace \, , \end{align} as desired (a pre-factor of two was saved compared to the book because \(Q\) is positive semidefinite).
(b)
Defining \(f^\varepsilon (\varepsilon') = \langle \varepsilon', M \varepsilon \rangle\), the conditional MGF satisfies \begin{align} \E \left[ e^{\lambda f^\varepsilon (\varepsilon’)} \given \varepsilon \right] \leq \exp \left\lbrace \sum_{i=1}^d \frac{\lambda^2 [M \varepsilon]_i^2}{2} \right\rbrace = \exp \left\lbrace \frac{\lambda^2 \| M \varepsilon \|_2^2}{2} \right\rbrace \, , \end{align} where the inequality follows by sub-Gaussianity of Rademacher variables. An application of the Chernoff bound implies \begin{align} \P \left( Y \geq \delta \given \varepsilon \right) \leq \exp \left\lbrace - \frac{\delta^2}{2 \| M \varepsilon \|_2^2} \right\rbrace \, . \end{align} To control \(X = \varepsilon^\top M^2 \varepsilon\) (\(M \in \S^{d \times d}\) by assumption), we can equate \(Q = M^2\) and apply (a) to obtain \begin{align} \P \left[ \| M \varepsilon \|_2^2 \geq \left( \| M \|_F + t \right)^2 \right] \leq \exp \left\lbrace - \frac{t^2}{16 \| M \|_2^2} \right\rbrace \, . \end{align} Since \((\|M\|_F + t)^2 \leq 2 \| M \|_F^2 + 2 t^2\), choosing \(t^2 = 8 \delta \|M\|_2\), and substituting yields \begin{align} \P (Y \geq \delta) &\leq \exp \left\lbrace - \frac{\delta^2}{4 \| M \|_F^2 + 16 \| M \|_2} \right\rbrace + \exp \left\lbrace - \frac{\delta}{2 \| M \|_2} \right\rbrace \newline &\leq 2 \exp \left\lbrace - \frac{\delta^2}{4 \| M \|_F^2 + 16 \| M \|_2} \right\rbrace \, . \end{align}