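Consider a random variable $X$ with mean $\mu = \mathbb{E}[X]$, and such that,
for some scalars $b > a$, $X \in [a, b]$ almost surely. Throughout, $\psi(\lambda) := \log \mathbb{E}[e^{\lambda X}]$ denotes the logarithm of its moment generating function (MGF).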
(a)
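To begin with, $\psi(0) = \log \mathbb{E}[e^{0}] = \log 1 = 0$.
For $\psi'(0)$, we know that the derivative of the MGF $M(\lambda) = \mathbb{E}[e^{\lambda X}]$ equals $M'(\lambda) = \mathbb{E}[X e^{\lambda X}]$ (differentiating under the expectation is justified because $X$ is bounded), so
$$\psi'(0) = \frac{M'(0)}{M(0)} = \frac{\mathbb{E}[X]}{1} = \mu.$$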
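As a quick numerical sanity check of part (a), the sketch below evaluates $\psi$ for a small discrete distribution and compares a central finite difference at $0$ with $\mu$. The distribution (`VALUES`, `PROBS`) and the helper name `psi` are illustrative assumptions, not part of the argument; any distribution supported in a bounded interval would do.

```python
import math

# Hypothetical distribution for illustration only: X takes VALUES with PROBS,
# so its support lies in [a, b] = [-1, 2].
VALUES = [-1.0, 0.5, 2.0]
PROBS = [0.25, 0.5, 0.25]


def psi(lam):
    """psi(lam) = log E[exp(lam * X)], the log of the MGF."""
    return math.log(sum(p * math.exp(lam * x) for x, p in zip(VALUES, PROBS)))


mu = sum(p * x for x, p in zip(VALUES, PROBS))  # E[X], equal to 0.5 here
h = 1e-5
dpsi0 = (psi(h) - psi(-h)) / (2 * h)  # central difference approximation of psi'(0)

print(psi(0.0), mu, dpsi0)  # psi(0) is 0.0 and dpsi0 is close to mu
```

The step `h` trades truncation error (of order $h^2$) against floating-point cancellation (of order machine epsilon divided by $h$); $10^{-5}$ keeps both well below $10^{-6}$ for this example.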
(b)
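The identity for $\psi''(\lambda)$ follows from the chain rule (together with the quotient rule): since $\psi'(\lambda) = \mathbb{E}[X e^{\lambda X}] / \mathbb{E}[e^{\lambda X}]$,
$$\psi''(\lambda) = \frac{\mathbb{E}[X^2 e^{\lambda X}]}{\mathbb{E}[e^{\lambda X}]} - \left(\frac{\mathbb{E}[X e^{\lambda X}]}{\mathbb{E}[e^{\lambda X}]}\right)^2 = \mathbb{E}_\lambda[X^2] - \big(\mathbb{E}_\lambda[X]\big)^2,$$
where $\mathbb{E}_\lambda[f(X)] := \mathbb{E}[f(X) e^{\lambda X}] / \mathbb{E}[e^{\lambda X}]$.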
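The identity can be checked numerically for a small hypothetical distribution: the sketch below computes the tilted moments $\mathbb{E}[f(X)e^{\lambda X}]/\mathbb{E}[e^{\lambda X}]$ directly and compares the resulting variance with a second central finite difference of $\psi$. The distribution, the step size `h`, and all helper names are assumptions chosen for illustration.

```python
import math

# Hypothetical discrete distribution, used purely for illustration.
VALUES = [-1.0, 0.5, 2.0]
PROBS = [0.25, 0.5, 0.25]


def psi(lam):
    """psi(lam) = log E[exp(lam * X)]."""
    return math.log(sum(p * math.exp(lam * x) for x, p in zip(VALUES, PROBS)))


def tilted_moment(f, lam):
    """E[f(X) e^{lam X}] / E[e^{lam X}]."""
    num = sum(p * f(x) * math.exp(lam * x) for x, p in zip(VALUES, PROBS))
    den = sum(p * math.exp(lam * x) for x, p in zip(VALUES, PROBS))
    return num / den


def tilted_variance(lam):
    """Second tilted moment minus the squared first tilted moment."""
    m1 = tilted_moment(lambda x: x, lam)
    m2 = tilted_moment(lambda x: x * x, lam)
    return m2 - m1 ** 2


def psi_second_fd(lam, h=1e-4):
    """Central second-difference approximation of psi''(lam)."""
    return (psi(lam + h) - 2 * psi(lam) + psi(lam - h)) / (h * h)


for lam in [-2.0, 0.0, 1.5]:
    print(lam, tilted_variance(lam), psi_second_fd(lam))  # the two columns agree closely
```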
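For the upper bound, observe that we can define a new distribution $\mathbb{Q}_\lambda$
by taking
$$\frac{d\mathbb{Q}_\lambda}{d\mathbb{P}}(x) = \frac{e^{\lambda x}}{\mathbb{E}[e^{\lambda X}]}$$
to be its Radon–Nikodym derivative (density) with respect to the distribution $\mathbb{P}$
of $X$. Expectations under $\mathbb{Q}_\lambda$ are exactly the ratios $\mathbb{E}[f(X) e^{\lambda X}] / \mathbb{E}[e^{\lambda X}]$ appearing in $\psi''(\lambda)$, so $\psi''(\lambda) = \operatorname{Var}(X_\lambda)$ for $X_\lambda \sim \mathbb{Q}_\lambda$. Moreover, since $\mathbb{Q}_\lambda$ is absolutely continuous with respect to $\mathbb{P}$, $X_\lambda \in [a, b]$ almost surely as well.
Hence establishing a bound on $\sup_{\lambda} |\psi''(\lambda)|$ is equivalent to bounding the
supremum over $\lambda$ of the variances of the random variables $X_\lambda$.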
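Taking $c = (a + b)/2$, using that the mean minimises the mean squared error,
and using that $|X_\lambda - c| \le (b - a)/2$ a.s. for all $\lambda$,
$$\psi''(\lambda) = \operatorname{Var}(X_\lambda) = \mathbb{E}\big[(X_\lambda - \mathbb{E}[X_\lambda])^2\big] \le \mathbb{E}\big[(X_\lambda - c)^2\big] \le \frac{(b - a)^2}{4}.$$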
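A rough check of the variance bound, assuming an illustrative three-point distribution on $[a, b] = [-1, 2]$: the sketch builds the exponentially tilted probabilities $\propto \mathbb{P}(X = x)\, e^{\lambda x}$ on a grid of $\lambda$ values and compares the largest tilted variance with $(b - a)^2/4$. It also shows that a two-point distribution with equal mass at $a$ and $b$ attains the bound at $\lambda = 0$, so the constant cannot be improved. The helper `tilted_variance` and the grid are hypothetical choices, and a finite grid only probes the supremum; it does not prove the bound.

```python
import math


def tilted_variance(values, probs, lam):
    """Variance of X under the tilted law with weights proportional to p(x) * e^{lam x}."""
    # Subtract the largest exponent for numerical stability; the common factor cancels.
    shift = max(lam * x for x in values)
    w = [p * math.exp(lam * x - shift) for x, p in zip(values, probs)]
    total = sum(w)
    q = [wi / total for wi in w]  # tilted probabilities
    m1 = sum(qi * x for qi, x in zip(q, values))
    m2 = sum(qi * x * x for qi, x in zip(q, values))
    return m2 - m1 ** 2


a, b = -1.0, 2.0
bound = (b - a) ** 2 / 4  # 2.25 for this interval
grid = [k / 10 for k in range(-100, 101)]  # lam in [-10, 10]

# Hypothetical distribution supported in [a, b].
values, probs = [-1.0, 0.5, 2.0], [0.25, 0.5, 0.25]
sup_var = max(tilted_variance(values, probs, lam) for lam in grid)
print(sup_var, bound)  # sup_var stays below bound

# Equal mass at the endpoints attains the bound at lam = 0.
print(tilted_variance([a, b], [0.5, 0.5], 0.0))  # prints 2.25
```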
(c)
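Taking a Taylor expansion of $\psi$ at
$\lambda = 0$ (with the Lagrange form of the remainder),
$$\psi(\lambda) = \psi(0) + \lambda\, \psi'(0) + \frac{\lambda^2}{2}\, \psi''(\theta)$$
for some $\theta$ between $0$ and $\lambda$.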
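Substituting the results from (a) and (b),
$$\psi(\lambda) \le \lambda \mu + \frac{\lambda^2 (b - a)^2}{8} \quad \text{for all } \lambda \in \mathbb{R},$$
or equivalently $\mathbb{E}[e^{\lambda (X - \mu)}] \le \exp\big(\lambda^2 (b - a)^2 / 8\big)$; that is, $X$ is sub-Gaussian with parameter at most $\sigma = (b - a)/2$, as desired.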
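The inequality from part (c) can be spot-checked on a grid of $\lambda$ values for a hypothetical three-point distribution on $[a, b] = [-1, 2]$, chosen purely for illustration. The gap between the right-hand side $\lambda\mu + \lambda^2 (b - a)^2/8$ and $\psi(\lambda)$ should be non-negative everywhere, and zero at $\lambda = 0$, where both sides vanish. The helper names `psi` and `hoeffding_bound` are assumptions of this sketch.

```python
import math

# Hypothetical distribution on [a, b] = [-1, 2]; any bounded distribution works.
values, probs = [-1.0, 0.5, 2.0], [0.25, 0.5, 0.25]
a, b = -1.0, 2.0
mu = sum(p * x for x, p in zip(values, probs))


def psi(lam):
    """log E[e^{lam X}] for the discrete distribution above."""
    return math.log(sum(p * math.exp(lam * x) for x, p in zip(values, probs)))


def hoeffding_bound(lam):
    """Right-hand side lam * mu + lam^2 (b - a)^2 / 8 from part (c)."""
    return lam * mu + lam ** 2 * (b - a) ** 2 / 8


grid = [k / 4 for k in range(-40, 41)]  # lam in [-10, 10]; index 40 is lam = 0
gaps = [hoeffding_bound(lam) - psi(lam) for lam in grid]
print(min(gaps))  # non-negative, attained at lam = 0
```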