Exercise 6.15: Bernstein's Inequality for Vectors
(a)
Without loss of generality, assume that $\sigma=1$. It holds that $x_{ij} \in \SG(1)$, so $x_{ij}^2$ is sub-exponential with parameters $(2, 4)$. Hence $\sum_{i=1}^n x_{ij}^2$ is sub-exponential with parameters $(2\sqrt{n}, 4)$, which means that $\frac{1}{n}\sum_{i=1}^n x_{ij}^2$ is sub-exponential with parameters $(2/\sqrt{n}, 4/n)$. The sub-exponential tail bound then implies that \begin{equation} \P(\abs{\hat{D}_{ii} - D_{ii}} \ge t) \le 2 e^{-\frac{n}{8}\min(t, t^2)}. \end{equation} Therefore, by a union bound, \begin{equation} \P(\norm{\hat{D} - D}_2 \ge t) \le 2 d e^{-\frac{n}{8}\min(t, t^2)}. \end{equation} Consider $\e \in [0, 1]$. Then \begin{equation} \P(\norm{\hat{D} - D}_2 \ge \e + t) \le 2 d e^{-\frac{n}{8}\min(t, t^2) - \frac n8 \e^2}. \end{equation} Hence, set $\e = \sqrt{8\log(d)/n}$ to obtain \begin{equation} \P(\norm{\hat{D} - D}_2 \ge \sqrt{8\log(d)/n} + t) \le 2 e^{-\frac{n}{8}\min(t, t^2)}, \end{equation} which is valid for $n \ge 8 \log(d)$. For $n < 8 \log(d)$, we similarly have \begin{equation} \P(\norm{\hat{D} - D}_2 \ge 8\log(d)/n + t) \le 2 e^{-\frac{n}{8}\min(t, t^2)}. \end{equation} Combining the two, for all $n \ge 1$, \begin{equation} \P(\norm{\hat{D} - D}_2 \ge \min(8\log(d)/n, \sqrt{8\log(d)/n}) + t) \le 2 e^{-\frac{n}{8}\min(t, t^2)}. \end{equation} Intuitively, up to $n = 8 \log(d)$, $\norm{\hat D - D}_2$ decays $O(n^{-1})$; afterwards, it decays $O(1/\sqrt{n})$. Hence, a sample size of $n = 8 \log(d)$ is economical.
(b)
The assumption allows us to use Markov’s and Rosenthal’s inequality, upper bounding the maximum by a sum. To wit, \begin{align} \P(\norm{\hat D - D}_2 \ge 4 \delta \sqrt{\frac{d^{2/m}}{n}}) &\le \frac{1}{(2 \delta)^m}\frac{n^{m/2}}{d 2^m}\E[\max_{j\in [d]}\,\abs{\hat D_{jj} - D_{jj}}^m] \newline &\le \frac{1}{(2 \delta)^m} \frac{n^{m/2}}{2^m}\norm{\hat D_{11} - D_{11}}_m^m \end{align} and \begin{equation} \norm{\hat D_{11} - D_{11}}_m^m \le \frac{C^m_m}{n^m} \parens{\sbrac{\sum_{i=1}^n \E[(x_{i1}^2 - D_{11})^2]}^{1/2} + \sbrac{\sum_{i=1}^n \E[\abs{x_{i1}^2 - D_{11}}^m]}^{1/m}}^m, \end{equation} so \begin{equation} \norm{\hat D_{11} - D_{11}}_m^m \le \frac{C^m_m}{n^m} (\sqrt{nK_2} + (nK_m)^{1/m})^m \le C^m_m \frac{2^m}{n^{m/2}} \max(K_2,K_m). \end{equation} Putting it together, we have \begin{equation} \P(\norm{\hat D - D}_2 \ge 4 \delta \sqrt{\frac{d^{2/m}}{n}}) \le \frac{C^m_m \max(K_2, K_m)}{(2 \delta)^m}. \end{equation}