Exercise 3.14: Transportation Cost Inequality
We take as given the following result implied by the Kantorovich–Rubinstein duality: \begin{align}\label{eq:kr_dual} W_{\rho}(\mathbb{Q} , \mathbb{P}) = \sup_{\norm{f}\lip \leq 1} \int f \left( \sd \mathbb{Q}_{2 \given 1} \sd \mathbb{Q}_{1} - \sd \mathbb{P}_2 \sd \mathbb{P}_1 \right) \, , \end{align} where \(\norm{f}\lip\) is defined w.r.t. the metric \(\rho\). We also make the usual assumption that there exists a point \(x_0 \in \mathcal{X}^2\) such that \(\int \! \rho (x, x_0) \, \sd \mathbb{Q} (x)\) and \(\int \! \rho (x, x_0) \, \sd \mathbb{P} (x)\) are finite, implying any Lipschitz function w.r.t. \(\rho\) is integrable for \(\mathbb{Q}\) and \(\mathbb{P}\).
(a)
Invoking Equation \eqref{eq:kr_dual}, we can add and subtract \(\int f \, \sd \mathbb{P}_2 \sd \mathbb{Q}_1\) to obtain the result.
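Spelled out, one way to write this step (a sketch, assuming the integrability condition above so that each integral is finite and can be split) is, for any fixed \(f\) with \(\norm{f}\lip \leq 1\), \begin{align} \int f \left( \sd \mathbb{Q}_{2 \given 1} \sd \mathbb{Q}_{1} - \sd \mathbb{P}_2 \sd \mathbb{P}_1 \right) &= \int \!\!\! \int f(x_1, x_2) (\sd \mathbb{Q}_{2 \given 1} - \sd \mathbb{P}_2)(x_2) \, \sd \mathbb{Q}_1 (x_1) + \int \!\!\! \int f(x_1, x_2) \, (\sd \mathbb{Q}_1 - \sd \mathbb{P}_1) (x_1) \, \sd \mathbb{P}_2(x_2) \, , \end{align} since the cross terms \(\mp \int f \, \sd \mathbb{P}_2 \sd \mathbb{Q}_1\) cancel. Taking the supremum over \(f\) on both sides then recovers the representation used in (b).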
(b)
Recalling that all Lipschitz functions are integrable by our assumption, we can combine the Tonelli–Fubini theorem with the dual representation of \(W_\rho (\mathbb{Q} , \mathbb{P})\) from (a) to obtain \begin{align} &\sup_{\norm{f}\lip \leq 1} \Big\lbrace \int \!\!\! \int f(x_1, x_2) (\sd \mathbb{Q}_{2 \given 1} - \sd \mathbb{P}_2)(x_2) \, \sd \mathbb{Q}_1 (x_1) + \int \!\!\! \int f(x_1, x_2) \, (\sd \mathbb{Q}_1 - \sd \mathbb{P}_1) (x_1) \, \sd \mathbb{P}_2(x_2) \Big\rbrace \newline &\leq \int \! \sup_{\norm{f}\lip \leq 1} \int \! f(x_1, x_2) (\sd \mathbb{Q}_{2 \given 1} - \sd \mathbb{P}_2)(x_2) \, \sd \mathbb{Q}_1 (x_1) + \int \! \sup_{\norm{f}\lip \leq 1} \int \! f(x_1, x_2) \, (\sd \mathbb{Q}_1 - \sd \mathbb{P}_1) (x_1) \, \sd \mathbb{P}_2(x_2) \newline &= \int \! W_{\rho_2} (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2) \, \sd \mathbb{Q}_1(x_1) + W_{\rho_1} (\mathbb{Q}_1 , \mathbb{P}_1) \, , \end{align} where the inequality holds because the supremum of a sum is at most the sum of the suprema, and the supremum of an integral is at most the integral of the supremum. The last equality follows from the assumed \(\rho(x, x') = \rho_1(x_1, x_1') + \rho_2(x_2 , x_2')\): for fixed \(x_1\), the function \(f(x_1, \cdot)\) is 1-Lipschitz w.r.t. \(\rho_2\), for fixed \(x_2\), the function \(f(\cdot, x_2)\) is 1-Lipschitz w.r.t. \(\rho_1\), and conversely any function that depends on a single coordinate and is 1-Lipschitz in it is also 1-Lipschitz w.r.t. \(\rho\). The second term does not depend on \(x_2\), so integrating it against the probability measure \(\mathbb{P}_2\) leaves it unchanged. Applying the transportation cost inequality for \(\mathbb{P}_1\) and \(\mathbb{P}_2\) then gives \begin{align} \int \! W_{\rho_2} (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2) \, \sd \mathbb{Q}_1(x_1) + W_{\rho_1} (\mathbb{Q}_1 , \mathbb{P}_1) &\leq \int \! \sqrt{ 2 \gamma_2 D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2) } \, \sd \mathbb{Q}_1(x_1) + \sqrt{2 \gamma_1 D (\mathbb{Q}_1 , \mathbb{P}_1)} \, . \end{align}
(c)
Applying the result from (b), and using that \(\mathbb{Q}_1\) is a probability measure to move the constant term inside the integral, we obtain \begin{align} W_{\rho}(\mathbb{Q} , \mathbb{P}) &\leq \int \! \Big[ \sqrt{ 2 \gamma_2 D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2) } + \sqrt{2 \gamma_1 D (\mathbb{Q}_1 , \mathbb{P}_1)} \Big] \, \sd \mathbb{Q}_1(x_1) \newline &\overset{\text{(i)}}{\leq} \int \! \sqrt{ 2(\gamma_1 + \gamma_2) [ D (\mathbb{Q}_1 , \mathbb{P}_1) + D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2) ] } \, \sd \mathbb{Q}_1(x_1) \newline &\overset{\text{(ii)}}{\leq} \sqrt{ 2(\gamma_1 + \gamma_2) \lbrace D (\mathbb{Q}_1 , \mathbb{P}_1) + \E_{\mathbb{Q}_1}[D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2)] \rbrace } \newline &\overset{\text{(iii)}}{=} \sqrt{ 2(\gamma_1 + \gamma_2) D (\mathbb{Q} , \mathbb{P}) } \, , \end{align} where (i) follows by the Cauchy–Schwarz inequality, (ii) by Jensen's inequality applied to the concave square root, and (iii) by the chain rule for the Kullback–Leibler divergence.
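For reference, the two facts behind steps (i) and (iii) can be sketched as follows; the symbols \(a_1, a_2, b_1, b_2\) are placeholders introduced here only for illustration. Step (i) is the Cauchy–Schwarz inequality in the form \begin{align} \sqrt{a_1 b_1} + \sqrt{a_2 b_2} \leq \sqrt{(a_1 + a_2)(b_1 + b_2)} \, , \qquad a_1, a_2, b_1, b_2 \geq 0 \, , \end{align} applied pointwise in \(x_1\) with \(a_1 = 2 \gamma_1\), \(b_1 = D (\mathbb{Q}_1 , \mathbb{P}_1)\), \(a_2 = 2 \gamma_2\), and \(b_2 = D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2)\). Step (iii) relies on the product structure \(\sd \mathbb{P} = \sd \mathbb{P}_2 \sd \mathbb{P}_1\) appearing in Equation \eqref{eq:kr_dual}, under which the chain rule reads \begin{align} D (\mathbb{Q} , \mathbb{P}) = D (\mathbb{Q}_1 , \mathbb{P}_1) + \E_{\mathbb{Q}_1}[D (\mathbb{Q} (\cdot \given x_1) , \mathbb{P}_2)] \, . \end{align}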