On the Difference between the Information Bottleneck and the Deep Information Bottleneck

By: Aleksander Wieczorek, Volker Roth

Format: Article
Published: MDPI AG 2020-01-01

Description

Combining the information bottleneck model with deep learning by replacing mutual information terms with deep neural nets has proven successful in areas ranging from generative modelling to interpreting deep neural networks. In this paper, we revisit the deep variational information bottleneck and the assumptions needed for its derivation. The two assumed properties of the data X and Y and their latent representation T take the form of two Markov chains, T − X − Y and X − T − Y. Requiring both to hold during the optimisation process can be limiting for the set of potential joint distributions P(X, Y, T). We therefore show how to circumvent this limitation by optimising a lower bound for the mutual information between T and Y, I(T; Y), for which only the latter Markov chain has to be satisfied. The mutual information I(T; Y) can be split into two non-negative parts. The first part is the lower bound for I(T; Y), which is optimised in the deep variational information bottleneck (DVIB) and cognate models in practice. The second part consists of two terms that measure how much the former requirement, T − X − Y, is violated. Finally, we propose interpreting the family of information bottleneck models as directed graphical models, and show that in this framework, the original and deep information bottlenecks are special cases of a fundamental IB model.
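
For orientation, the lower bound on I(T; Y) that DVIB-type models optimise in practice can be sketched with the standard variational decomposition below. This is a minimal sketch assuming a variational decoder q(y | t) that approximates p(y | t); the paper's own two-term analysis of the gap is given in the article itself.

% Sketch: standard variational lower bound on I(T;Y).
% q(y|t) is an assumed variational decoder, not notation from the abstract.
\begin{align}
I(T;Y) &= H(Y) - H(Y \mid T) \\
       &= H(Y) + \mathbb{E}_{p(t,y)}\!\left[\log q(y \mid t)\right]
              + \mathbb{E}_{p(t)}\!\left[ D_{\mathrm{KL}}\!\big( p(y \mid t) \,\|\, q(y \mid t) \big) \right] \\
       &\geq H(Y) + \mathbb{E}_{p(t,y)}\!\left[\log q(y \mid t)\right].
\end{align}

Since H(Y) does not depend on the model, maximising the expected log-likelihood term maximises the bound; the non-negative KL remainder is the gap that, per the abstract, the paper further splits into two terms measuring how much the Markov chain T − X − Y is violated.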