Exercise 3.15: Bounds for Suprema of Non-Negative Functions
Let \((V_i)_{i=1}^n\) be an i.i.d. sequence of random variables and \(\F\) a class of functions taking values in \([0, 1]\). Consider
\begin{equation} Z = \sup_{f \in \F} \sum_{i=1}^n f(V_i). \end{equation}
Without loss of generality, assume that \(\F = \set{f^1, \ldots, f^m}.\) Define
\begin{equation} X_i = (f^1(V_i), \ldots, f^m(V_i)) \end{equation}
and consider the function
\begin{equation} Z(X) = \max_{j \in [m]} \sum_{i=1}^n X^j_i. \end{equation}
(a)
Let \(Y_k(X)\) be equal to \(X\) except with the \(k^{\text{th}}\). This eliminates one term in the sum of \(Z\), hence can only decrease it, so
\begin{equation} Z(X) \ge Z(Y_k(X)) \iff Z(X) - Z(Y_k(X)) \ge 0. \end{equation}
(b)
By tensorisation of the entropy, we have
\begin{equation} \Hb(e^{\lambda Z_k(X)}) \le \E\sbrac{ \sum_{i=1}^n \Hb(e^{\lambda Z_i(X)} \cond X_{-i}) } \end{equation}
where \(Z_i(X)\) is the function \(X_i \to Z(X)\) with all other elements fixed. Therefore, by the variational representation of the entropy,
\begin{equation} \Hb(e^{\lambda Z_k(X)}) \le \E\sbrac{ \sum_{i=1}^n \E[\psi(\lambda(Z(X) - Z(Y_i(X)))) e^{\lambda Z(X)} \cond X_{-i}] } \end{equation}
where \(\psi(u) = e^{-u} - 1 + u\).
(c)
Let \(\Ab_l\) be the collection of realisations of \(X\) that achieves the maximum at index \(l\). Without loss of generality, assume \((\Ab_l)_{l=1}^m\) to be disjoint. Then, if \(X \in \Ab_l\),
\begin{equation} Z(X) - Z(Y_k(X)) = \sum_{i=1}^n X^l_i - Z(Y_k(X)) \le X^l_k \end{equation}
by lower bounding the maximum of \(Z(Y_k(X))\) by index \(l\). Therefore, multiply by \(\lambda \ge 0\) and sum against \(\ind(X \in \Ab_l)\) to find
\begin{equation} 0 \le \lambda (Z(X) - Z(Y_k(X))) \le \lambda \sum_{l = 1}^m \ind (X \in \Ab_l) X^l_k. \end{equation}
(d)
For \(\alpha \le 1\), observe that that \(\psi(\alpha x) \le \alpha \psi(x)\). Hence, using that
\begin{equation} \sum_{l = 1}^m \ind (X \in \Ab_l) X^l_k \le 1 \end{equation}
because every function in \(\F\) takes values in \([0, 1]\), we obtain
\begin{equation} 0 \le \psi(\lambda(Z(X) - Z(Y_k(X)))) \le \psi(\lambda) \sum_{l = 1}^m \ind (X \in \Ab_l) X^l_k, \end{equation}
also using that \(\psi\) is increasing.
(e)
Combining the previous parts, we find
\begin{equation} \Hb(e^{\lambda Z_k(X)}) \le \psi(\lambda) \E\sbrac{ \sum_{l = 1}^m\ind (X \in \Ab_l) \sum_{i=1}^nX^l_i e^{\lambda Z(X)} } = \psi(\lambda) \E[Z(X) e^{\lambda Z(X)}], \end{equation}
by definition of \(\Ab_l\).
(f)
Part (e) gives that
\begin{equation} \lambda \varphi’(\lambda) - \varphi(\lambda) \log \varphi(\lambda) \le \psi(\lambda) \varphi’(\lambda). \end{equation}
Divide by \(\varphi(\lambda) > 0\):
\begin{equation} \lambda (\log \varphi)’(\lambda) - \log \varphi(\lambda) \le \psi(\lambda) (\log \varphi)’(\lambda). \end{equation}
Hence, rearranging,
\begin{equation} (\log \varphi)’(\lambda) \le \frac{1}{\lambda - \psi(\lambda)} \log \varphi(\lambda) = \frac{e^{\lambda}}{e^{\lambda} - 1} \log \varphi(\lambda), \end{equation}
using that \(\lambda - \psi(\lambda) > 0\). Because not \(Z(X) = 0\) a.s., it holds that \(\log \varphi(\lambda) > 0\), so this differential inequality can be rewritten as
\begin{equation} (\log \log \varphi)’(\lambda) \le (\log(\exp - 1))’(\lambda). \end{equation}
Therefore,
\begin{equation} \parens{\log \frac{\log \varphi}{\exp - 1}}’(\lambda) \le 0, \end{equation}
and integrating gives, for \(\lambda > \lambda_0 > 0\),
\begin{equation} \log \frac{\log \varphi(\lambda)}{e^{\lambda} - 1} \le \log \frac{\log \varphi(\lambda_0)}{e^{\lambda_0} - 1} \to \log \varphi’(0) = \log \E[Z(X)], \end{equation}
by taking \(\lambda_0 \downarrow 0\) and using L’Hopital’s rule. We conclude that
\begin{equation} \log \varphi(\lambda) \le (e^{\lambda} - 1) \E[Z(X)] \end{equation}
for all \(\lambda \ge 0\).