Exercise 2.14: Concentration Around Medians and Means

chapter 2 Chernoff bound incomplete layer cake symmetrisation

(a)

By the layer cake trick \begin{align} \V (X) = \int_0^\infty \P ((X - \mu)^2 > t) \, \sd t \leq c_1 \int_0^\infty e^{- c_2 t} \, \sd t = \frac{c_1}{c_2} \, . \end{align}

(b)

Take \(X\) to be a Rademacher random variable. Then any \(m_X \in [-1 , 1]\) is a median.

(c)

We will proceed by proving the bound for the case of (i) large \(t \geq \alpha \Delta\), and (ii) small \(t < \alpha \Delta\), where \(\Delta = | \E [X] - m_X |\), and \(\alpha > 0\) is an auxiliary parameter which we will choose later.

(i) \(t \geq \alpha \Delta\): Using \(| X - m_X | \leq \Delta + | X - E [X] |\) \begin{align} \P (|X - m_X| \geq t) &= \P \left(|X - m_X| \geq \tfrac{t}{\alpha} + (1 - \tfrac{1}{\alpha})t\right) \newline &\leq \P \left( | X - m_X | \geq \Delta + (1 - \tfrac{1}{\alpha}) t \right) \newline &\leq \P \left( | X - \E [X] | \geq (1 - \tfrac{1}{\alpha}) t \right) \leq c_1 e^{-c_2 (1 - \alpha^{-1})^2 t^2} \, . \end{align}

(ii) \(t < \alpha \Delta\): Noticing the bound from (i) grows as \(t \downarrow 0\), the strategy we take is to inflate said bound sufficiently so that it holds vacuously for all \(0 \leq t < \alpha \Delta\). Introducing another auxiliary parameter \(\beta \geq 1\), we therefore require \begin{align} \beta c_1 e^{-c_2 (1 - \alpha^{-1})^2 t^2} > \beta c_1 e^{-c_2 (\alpha - 1)^2 \Delta^2} \geq 1 \geq \P (|X - m_X| \geq t) \, . \end{align} The crucial observation here is that the assumed bounds \begin{align} \P (X \geq \E [X] + t) &\leq c_1 e^{- c_2 t^2 } \, , \newline \P (X \leq \E [X] - t) &\leq c_1 e^{- c_2 t^2 } \, , \end{align} imply that for either \(\E [X] + t\) or \(\E [X] - t\) to be a median (i.e., satisfy \(\P (X \geq m_X) \geq \tfrac{1}{2}\) and \(\P (X \leq m_X) \geq \tfrac{1}{2}\) respectively), we must have \(c_1 e^{- c_2 t^2 } \geq \frac{1}{2}\). Hence \begin{align} m_X \in \left[ \E[X] - \sqrt{\tfrac{1}{c_2}\log(2 c_1)} , \E[X] + \sqrt{\tfrac{1}{c_2}\log(2 c_1)} \right] \, , \end{align} which provides a bound for \(\Delta\). Substituting into the above expression \begin{align} \beta c_1 e^{- c_2 (\alpha - 1)^2 \Delta^2} \geq \beta c_1 e^{- (\alpha - 1)^2 \log 2 c_1} = \beta c_1 \left( \tfrac{1}{2 c_1} \right)^{(\alpha - 1)^2} \geq 1 \, . \end{align} Since we have no control over \(c_1\), we must set \(\alpha = 2\) (\(\alpha = 0\) is inadmissable because we have divided by \(\alpha\) above). This then also necessitates setting \(\beta = 2\).

Putting (i) and (ii) together \begin{align} \P (|X - m_X| \geq t) \leq c_3 e^{- c_4 t^2} \, , \end{align} for \(c_3 = 2 c_1\) and \(c_4 = \tfrac{c_2}{4}\) which improves upon each of the required constants by a factor of two.

(c) Alternative Solution

Suppose that we have concentration around some value \(a\): \begin{equation} \P(\abs{X - a} \ge t) \le c_1 e^{-c_2 t^2}. \end{equation} We aim for concentration around another value \(b\). Let \(\Delta = \abs{a - b}\) be the difference between the two values. The key insight is that a far enough deviation from \(b\) implies a deviation from \(a\): \begin{equation} \P(\abs{X - b} \ge t) \le \P(\abs{X - a} \ge \tfrac12 t) \le c_1 e^{-\frac14 c_2 t^2} = f(t) \end{equation} for \(t \ge 2 \Delta\). For \(t \le 2 \Delta\), we have the trivial bound \begin{equation} \P(\abs{X - b} \ge t) \le 1. \end{equation} To unify these two bounds into one, we require that \begin{equation} \inf_{t \in [0, 2 \Delta]} f(t) = f(2 \Delta) = c_1 e^{-c_2 \Delta^2} \ge 1. \end{equation} Then, for all \(t \ge 0\), we have \begin{equation} \P(\abs{X - b} \ge t) \le \parens{1 \lor \frac{1}{c_1} e^{c_2 \Delta^2}} c_1 e^{-\frac14 c_2 t^2} \le \parens{c_1 \lor e^{c_2 \Delta^2}} e^{-\frac14 c_2 t^2}. \end{equation}

Let \(b\) be a median. We can develop an upper bound on \(\Delta\). Note that \begin{equation} \P(X \ge a + t) \le c_1 e^{-c_2 t^2}. \end{equation} If the r.h.s. is strictly less than \(\frac12\), then \(\P(X \ge a + t) < \tfrac12\), so \(a + t\) cannot be a median. In particular, it must be greater than all medians, so \begin{equation} a + t \ge b \implies b - a \le t. \end{equation} Similarly, note that \begin{equation} \P(X \le a - t) \le c_1 e^{-c_2 t^2}. \end{equation} Again, if the r.h.s. is strictly less than \(\frac12\), then \(\P(X \le a - t) < \tfrac12\), so \(a - t\) cannot be a median. In particular, it must be smaller than all medians, so \begin{equation} a - t \le b \implies a - b \le t. \end{equation} In conclusion, \(\Delta = \abs{a - b} \le t\) for all \(t\) such that \begin{equation} c_1 e^{-c_2 t^2} = \alpha \implies t = \sqrt{\log(c_1/\alpha)/c_2} \end{equation} with \(\alpha \in (0, \frac12)\). Then take \(\alpha \uparrow \frac12\) to find that \begin{equation} \Delta \le \sqrt{\log(2 c_1)/c_2}. \end{equation} With this upper bound, we have \begin{equation} \P(\abs{X - b} \ge t) \le 2 c_1 e^{-\frac14 c_2 t^2}. \end{equation}

Alternatively, a lower bound on \(f(2 \Delta)\) can be obtained. Note that \begin{equation} f(2 \Delta) \ge \P(\abs{X - a} \ge \Delta). \end{equation} If \(a \le b\), then \begin{equation} \P(X \ge b) \le \P(\abs{X - a} \ge \Delta). \end{equation} If \(a \ge b\), then \begin{equation} \P(X \le b) \le \P(\abs{X - a} \ge \Delta). \end{equation} Hence, \begin{equation} f(2 \Delta) \ge \P(X \ge b) \land \P(X \le b). \end{equation} Therefore, \begin{equation} \P(\abs{X - b} \ge t) \le \frac{c_1}{\P(X \ge b) \land \P(X \le b)} e^{-\frac14 c_2 t^2}. \end{equation} If \(b\) is a median, then \begin{equation} \P(\abs{X - b} \ge t) \le 2 c_1 e^{-\frac14 c_2 t^2}. \end{equation} More generally, if \(b\) is an \(\alpha\)-quantile with \(\alpha \in (0, \tfrac12]\), then \begin{equation} \P(\abs{X - b} \ge t) \le \frac{c_1}{\alpha} e^{-\frac14 c_2 t^2}. \end{equation} Note that the results hold for any \(a\).

(d)

Let us start off by combining the Chernoff bound with the inequality \(e^{\lambda |x|} \leq e^{\lambda x} + e^{-\lambda x}\) \begin{align} \P (| X - \E [X] | \geq t) &\leq \inf_{\lambda > 0} \frac{ \E [e^{\lambda (X - \E[X])}] + \E [e^{-\lambda (X - \E[X])}] }{e^{\lambda t}} \, . \end{align} Introducing \(Y\), an independent copy of \(X\), we use the Jensen’s inequality to bound \begin{align} \E_X [e^{\lambda (X - \E_Y[Y])}] \leq \E_{X, Y} [e^{\lambda (X - Y)}] \, . \end{align} Assuming integrability (which we prove momentarily), we can expand \begin{align}\label{eq:d_mgf_bound} \E [e^{\lambda (X - Y)}] = 1 + \sum_{n = 1}^{\infty} \frac{\lambda^n}{n!} \E [(X - Y)^n] \, . \end{align} Observing \(X - Y\) is symmetric, it is clear that every finite odd moment vanishes. This then allows restricting our attention to bounding the even moments, which can be done using the layer cake trick \begin{align} \E [|X - Y|^n] &= \int_0^\infty \P (|X - Y|^n \geq u) \, \sd u \newline &\overset{\text{(i)}}{\leq} n \int_0^\infty v^{n - 1} \P (|X - Y| \geq v) \, \sd v \newline &\overset{\text{(ii)}}{\leq} 2n \int_0^\infty v^{n - 1} \P (|X - m_X| \geq \tfrac{v}{2}) \, \sd v \, , \end{align} where (i) follows by the change of variable \(u = v^n\), and (ii) uses the triangle inequality \(| X - Y | \leq | X - m_x | + | Y - m_X |\) in combination with \(X\) and \(Y\) being i.i.d. \begin{align} \P (| X - Y | < t) \geq \P ( | X - m_X | < \tfrac{t}{2} ) \P ( | Y - m_X | < \tfrac{t}{2} ) \, , \end{align} which after taking complements and rearranging yields \begin{align}\label{eq:d_symmetrised_deviation_bound} \P (| X - Y | \geq t) \leq 2 \P (|X - m_X| \geq \tfrac{t}{2}) - \P (|X - m_X| \geq \tfrac{t}{2})^2 \leq 2 \P (|X - m_X| \geq \tfrac{t}{2}) \, . \end{align}

Substituting the assumed \(P(|X - m_X| \geq t) \leq c_3 e^{-c_4 t^2}\) \begin{align} \E [|X - Y|^n] &\leq 2n c_3 \int_0^\infty v^{n - 1} e^{- c_4 \frac{v^2}{4}} \frac{ \sqrt{2\pi \cdot 2 / c_4} }{ \sqrt{2\pi \cdot 2 / c_4} } \, \sd v \newline &= 2 n c_3 \sqrt{\frac{\pi}{c_4}} \int_{-\infty}^{\infty} |v|^{n - 1} \frac{ e^{- \frac{c_4}{4} v^2} }{ \sqrt{2\pi \cdot 2 / c_4} } \, \sd v \newline &= c_3 n \bigl( \tfrac{2}{\sqrt{c_4}} \bigr)^{n} \Gamma \bigl( \tfrac{n}{2} \bigr) \, , \end{align} where the last equality uses the closed-form expression for absolute moment of a centred Gaussian variable \(Z \sim \gauss (0, \tfrac{2}{c_4})\) \begin{align} \E [|Z|^{n - 1}] = \biggl( \frac{2}{c_4} \biggr)^{\frac{n - 1}{2}} \frac{ 2^{\frac{n - 1}{2}} \Gamma\bigl(\tfrac{n}{2}\bigr) }{\sqrt{\pi}} \end{align} with \(\Gamma\) standing for the Gamma function. Substituting the bound for the even moments into Equation \eqref{eq:d_mgf_bound} \begin{align} \E [e^{\lambda(X - Y)}] &= 1 + \sum_{k = 1}^{\infty} \frac{\lambda^{2k}}{(2k)!} \E [ | X - Y |^{2k} ] \newline &\overset{\text{(i)}}{\leq} 1 + c_3 \sum_{k = 1}^{\infty} \biggl( \frac{2\lambda}{\sqrt{c_4}} \biggr)^{2k} \frac{\Gamma(k)}{\Gamma (2k)} \newline &\overset{\text{(ii)}}{\leq} 1 + c_3 \sum_{k = 1}^\infty \biggl( \frac{4 \lambda^2}{c_4} \biggr)^k \frac{1}{k!} \newline &\overset{\text{(iii)}}{\leq} c_3 e^{\frac{4 \lambda^2}{c_4}} \, , \end{align} where (i) uses the identity \(\Gamma (2k) = (2k - 1)!\), (ii) follows by the same identity and \(\frac{1}{(2k - 1) \cdots k} \leq \frac{1}{k!}\), and (iii) exploits \(c_3 \geq 1\): since the bound \(P(|X - m_X| \geq t) \leq c_3 e^{-c_4 t^2}\) holds for all \(t \geq 0\), taking \(t \downarrow 0\) reveals \(c_3 \geq 1\).

Because the same argument can be applied to bound \(\E [e^{-\lambda (X - \E [X])}]\), we can substitute into our Chernoff bound \begin{align} \P (|X - \E [X]| \geq t) &\leq 2 c_3 \inf_{\lambda > 0} e^{\frac{4 \lambda^2}{c_4} - \lambda t} = c_1 e^{- c_2 t^2} \, , \end{align} with \(c_1 = 2 c_3\) and \(c_2 = \frac{c_4}{8}\).

This bound is valid but does not match the required \(c_2\) constant!

(d) Alternative Solution Using the Layer Cake Trick Directly

Applying the Chernoff bound, for any \(\lambda > 0\) \begin{align} \P ( | X - \E [X] | \geq t ) = \P \bigl( e^{\lambda (X - \E [X])^2} \geq e^{\lambda t^2} \bigr) \leq e^{- \lambda t^2} \E \bigl[e^{\lambda (X - \E [X])^2}\bigr] \, . \end{align} Since we are trying to establish \(X - \E [X]\) behaves like a sub-Gaussian random variable, and Theorem 2.6(IV) tells that sub-Gaussianity implies \(\E [ e^{\lambda (X - \E[X])^2} ] < \infty\), our goal will be to obtain a similar bound here.

Introducing \(Y\), an independent copy of \(X\), the Jensen’s inequality can be applied to the convex function \(x \mapsto e^{\lambda x^2}\) to obtain the symmetrised \begin{align} \E_X \bigl[e^{\lambda (X - \E_Y [Y])^2}\bigr] \leq \E_{X, Y} \bigl[e^{\lambda (X - Y)^2}\bigr] \, . \end{align} The right-hand side can be bounded using the layer cake trick \begin{align} \E \bigl[ e^{\lambda (X - Y)^2} \bigr] &= \int_0^\infty \P \bigl( e^{\lambda (X - Y)^2} \geq u \bigr) \, \sd u \newline &\overset{\text{(i)}}{\leq} \alpha + \int_{\alpha}^\infty \P \bigl( e^{\lambda (X - Y)^2} \geq u \bigr) \, \sd u \newline &\overset{\text{(ii)}}{=} \alpha + \lambda \int_{\frac{\log \alpha}{\lambda}}^\infty e^{\lambda v} \P (|X - Y| \geq \sqrt{v}) \, \sd v \, , \end{align} where (i) introduces a parameter \(\alpha \geq 1\) which we will optimise later, and (ii) follows by substituting \(u = e^{\lambda v}\). Using the bound from Equation \eqref{eq:d_symmetrised_deviation_bound} to bound the second term \begin{align} \lambda \int_{\frac{\log \alpha}{\lambda}}^\infty e^{\lambda v} \P (|X - Y| \geq \sqrt{v}) \, \sd v &\leq 2 \lambda \int_{\frac{\log \alpha}{\lambda}}^\infty e^{\lambda v} \P (|X - m_x| \geq \tfrac{\sqrt{v}}{2}) \, \sd v \newline &\overset{\text{(i)}}{\leq} 2 \lambda c_3 \int_{\frac{\log \alpha}{\lambda}}^\infty e^{(\lambda - \frac{c_4}{4}) v} \, \sd v \newline &= 2 \lambda c_3 \biggl[ \frac{e^{(\lambda - \frac{c_4}{4}) v}}{\lambda - \frac{c_4}{4}} \biggr]_{\frac{\log \alpha}{\lambda}}^\infty \overset{\text{(ii)}}{=} \frac{8 \lambda c_3}{c_4 - 4\lambda} \alpha^{-\left(\frac{c_4}{4 \lambda} - 1 \right)} \end{align} where (i) is by substitution of the assumed concentration bound around the median, and (ii) holds whenever \(\lambda < \frac{c_4}{4}\). Substituting into our bound for \(\E [e^{\lambda (X - Y)^2}]\) \begin{align} \E \bigl[e^{\lambda (X - Y)^2}\bigr] &\leq \alpha \left( 1 + c \alpha^{-p} \right) \, , \end{align} where we defined \(c = \frac{8 \lambda c_3}{c_4 - 4\lambda}\) and \(p = \frac{c_4}{4 \lambda}\).

We are now in position to optimise over \(\alpha \geq 1\). Setting the derivative to zero, the unconstrained optimum is \(\alpha = [c (p - 1)]^{1 / p} = (2 c_3)^{1 / p}\). Since \(c_3 \geq 1\) (see above), \(\alpha = (2 c_3)^{1 / p}\) is also the constrained optimum. Substituting \begin{align} \E \bigl[e^{\lambda (X - Y)^2}\bigr] &\leq [ c (p - 1) ]^{1/p} + c = (2 c_3)^{\frac{4 \lambda}{c_4}} + \frac{8 \lambda c_3}{c_4 - 4\lambda} \, . \end{align}

The only remaining task is to optimise over \(\lambda < \frac{c_4}{4}\) \begin{align} \P (|X - \E [X]| \geq t) \leq e^{-\lambda t^2} \left[ (2 c_3)^{\frac{4 \lambda}{c_4}} + \frac{8 \lambda c_3}{c_4 - 4\lambda} \right] \, . \end{align} Making \(\lambda\) too small would yield the vacuous unit bound, while approaching \(\frac{c_4}{4}\) from below makes the second term explode. We thus somewhat arbitrarily take the midpoint \(\lambda = \frac{c_4}{8}\) which results in \begin{align} \P (|X - \E [X]| \geq t) \leq c_1 e^{- c_2 t^2 } \, , \end{align} with \(c_1 = \sqrt{2 c_3} (1 + \sqrt{2 c_3})\) and \(c_2 = \frac{c_4}{8}\).

The above bound is valid but does not match the required constants!

Notes

There is nothing special about the median in (d). In fact, we can obtain a concentration bound for the mean similar to (d), whenever there exists \(a \in \R\) such that \(\P ( |X - a| \geq t ) \leq c_3 e^{-c_4 t^2}\).

Published on 25 August 2020.