Notice, as per a property of the [[Kullback–Leibler divergence]], that <math>I(X;Y)</math> is equal to zero precisely when the joint distribution coincides with the product of the marginals, i.e. when <math>X</math> and <math>Y</math> are independent (and hence observing <math>Y</math> tells you nothing about <math>X</math>). In general <math>I(X;Y)</math> is non-negative; it is a measure of the price for encoding <math>(X,Y)</math> as a pair of independent random variables when in reality they are not.
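
As a rough illustration of the two cases described here, the following minimal sketch computes the mutual information of a discrete joint distribution as the Kullback–Leibler divergence between the joint distribution and the product of its marginals. The function name <code>mutual_information</code>, the list-of-rows representation of the joint distribution, and the two example tables are assumptions made for this illustration and do not come from any particular library; logarithms are taken base 2, so the result is in bits.

```python
import math


def mutual_information(joint):
    """Mutual information, in bits, of a discrete joint pmf.

    `joint` is a hypothetical representation chosen for this sketch:
    a list of rows, where joint[i][j] = P(X = i, Y = j).
    """
    # Marginal distributions: row sums give P(X), column sums give P(Y).
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            # Terms with zero joint probability contribute nothing
            # (the usual convention 0 * log 0 = 0).
            if p_xy > 0:
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi


# Independent case: the joint equals the product of the marginals.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

# Fully dependent case: Y is always equal to X.
dependent = [[0.5, 0.0],
             [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)

print(mi_independent)  # 0.0
print(mi_dependent)    # 1.0
```

For the independent table every log-ratio is zero, so the mutual information vanishes. For the fully dependent table, observing <math>Y</math> determines <math>X</math> completely, and the result is one bit, the full entropy of a fair binary variable.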