Information Bottleneck Theory Based Exploration of Cascade Learning
By: Xin Du, Katayoun Farrahi, Mahesan Niranjan
Format: Article
Published: MDPI AG, 2021-10-01
Description
In solving challenging pattern recognition problems, deep neural networks have shown excellent performance by forming powerful mappings between inputs and targets, learning representations (features) and making subsequent predictions. A recent tool to help understand how representations are formed is based on observing the dynamics of learning on an information plane using mutual information, linking the input to the representation (I(X;T)) and the representation to the target (I(T;Y)). In this paper, we use an information-theoretical approach to understand how Cascade Learning (CL), a method to train deep neural networks layer by layer, learns representations, as CL has shown comparable results while saving computation and memory costs. We observe that performance is not linked to information compression, which differs from observations of End-to-End (E2E) learning. Additionally, CL can inherit information about targets and gradually specialise extracted features layer by layer. We evaluate this effect by proposing an information transition ratio, I(T;Y)/I(X;T), and show that it can serve as a useful heuristic in setting the depth of a neural network that achieves satisfactory classification accuracy.
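The proposed transition ratio lends itself to a simple empirical estimate. Below is a minimal sketch (not the authors' code) of how one might compute it for a trained layer, using the histogram (binning) mutual-information estimator common in information-plane studies; the function names, the 30-bin default, and the binning scheme are illustrative assumptions.

```python
import numpy as np

def bin_activations(acts, n_bins=30):
    """Quantise continuous activations into one discrete id per sample.

    acts: array of shape (n_samples, n_units). Each unit's value is
    binned, and each sample's vector of bin indices is collapsed into
    a single integer id, as in common information-plane analyses.
    """
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    binned = np.digitize(acts, edges[1:-1])  # (n_samples, n_units) bin indices
    _, ids = np.unique(binned, axis=0, return_inverse=True)
    return ids

def mutual_information(a_ids, b_ids):
    """Plug-in (histogram) estimate of I(A;B) in nats.

    a_ids, b_ids: paired 1-D arrays of non-negative integer ids.
    """
    a_ids, b_ids = np.asarray(a_ids), np.asarray(b_ids)
    n = a_ids.size
    joint = np.zeros((a_ids.max() + 1, b_ids.max() + 1))
    np.add.at(joint, (a_ids, b_ids), 1.0)  # accumulate joint counts
    p_ab = joint / n
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])).sum())

def transition_ratio(x, layer_acts, y, n_bins=30):
    """Information transition ratio I(T;Y) / I(X;T) for one layer,
    given inputs x, that layer's activations T, and integer labels y."""
    x_ids = bin_activations(np.asarray(x).reshape(len(y), -1), n_bins)
    t_ids = bin_activations(layer_acts, n_bins)
    return mutual_information(t_ids, y) / mutual_information(x_ids, t_ids)
```

Tracked as cascade layers are added, a ratio that rises towards saturation would, on the paper's reading, indicate that further depth contributes little additional class-relevant information, suggesting a natural stopping point for the cascade.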