Exercise 3.8: Exponential Families and Entropy
(a)
Note that the ratio of entropy and the MGF \(\varphi := \varphi_X\) can be written as \begin{align} \frac{\mathbb{H}(e^{\lambda X})}{\varphi (\lambda)} &= \lambda \frac{\varphi’(\lambda)}{\varphi(\lambda)} - \log \varphi (\lambda) \newline &= \lambda (\log \varphi)’ (\lambda) - \log \varphi(\lambda) \newline &= \int_0^\lambda (\log \varphi)’ (\lambda) - (\log \varphi)’ (t) \, \sd t \, . \end{align} Hence the required inequality holds if \((\log \varphi)'\) is \(L\)-Lipschitz \begin{align} \frac{\mathbb{H}(e^{\lambda X})}{\varphi (\lambda)} \leq L \int_0^\lambda |\lambda - t| \, \sd t = \frac{\lambda^2 L}{2} \, . \end{align}
To establish the Lipschitzness, we rewrite the MGF as \begin{align} \varphi(\lambda) = \E [e^{\lambda \langle v , T(Y) \rangle}] = \int h(y) \exp \bigl\lbrace \langle \theta + \lambda v , T(y) \rangle - \Phi(\theta) \bigr\rbrace \, \sd \mu(y) = e^{\Phi(\theta + \lambda v) - \Phi(\theta)} \, , \end{align} which then implies \begin{align} (\log \varphi)’ (\lambda) = \langle \nabla \Phi(\theta + \lambda v) , v \rangle \, . \end{align} Therefore, using the assumed \(\| v \|_2 = 1\) and the Cauchy-Schwarz inequality \begin{align} \bigl| (\log \varphi)’ (\lambda) - (\log \varphi)’ (t) \bigr| &\leq \| \nabla \Phi(\theta + \lambda v) - \nabla \Phi(\theta + t v) \|_2 \newline &\leq L \| \lambda v - t v \| = L | \lambda - t | \, , \end{align} which establishes the required \(L\)-Lipschitzness of \((\log \varphi)'\). \(X = \langle v , T(Y) \rangle\) is thus sub-Gaussian with parameter \(\sqrt{L}\) by the Herbst argument (Proposition 3.2).
(b)
(i): Univariate Gaussian
The natural parameter is given by \(\theta = \mu / \sigma^2\), and \begin{align} \Phi(\theta) = \frac{\mu^2}{2 \sigma^2} = \frac{\sigma^2}{2} \theta^2 \, . \end{align} Clearly, \(\nabla \Phi\) is \(\sigma^2\)-Lipschitz.
(ii): Bernoulli
The probability mass function is characterised by \begin{align} y \log p + (1 - y) \log (1 - p) = y \, \underbrace{\log \frac{p}{1 - p}}_{= \theta} - \underbrace{\log \frac{1}{1 - p}}_{= \Phi(\theta)} \, . \end{align} Hence \(p = \frac{1}{1 + e^{-\theta}}\), \(\Phi (\theta) = \log (1 + e^{\theta})\), and thus \(\nabla \Phi(\theta) = \frac{1}{1 + e^{-\theta}}\). Observing \(|\nabla^2 \Phi| \leq \frac{1}{4}\), we conclude \(\nabla \Phi\) is \(\frac{1}{4}\)-Lipschitz.