Local Entropy Inversion in Large-Scale AI Systems: Landauer Bounds on Algorithmic Compression

Authors

  • Boris Kriger, Information Physics Institute, Gosport, Hampshire, United Kingdom, www.informationphysicsinstitute.org

DOI:

https://doi.org/10.59973/ipil.335

Keywords:

Information thermodynamics, Landauer's principle, Large Language Models, Algorithmic compression, Minimum description length, Thermodynamic efficiency

Abstract

We apply Landauer's principle to the training of large language models (LLMs), framing the process as a physically irreversible compression of high-entropy data distributions into low-entropy structured representations stored in model weights. This yields a lower bound on the minimum energy required for AI training, expressed in terms of the information-theoretic compression achieved. Empirical analysis of contemporary AI systems—GPT-3, PaLM, and LLaMA-2—reveals that current implementations operate approximately 10²¹ times above this Landauer limit. We introduce a demon efficiency metric to quantify this gap and examine how it varies across systems and baseline assumptions. We discuss an instructive analogy between LLM training and Maxwell's demon that provides physical intuition for the entropy-reducing character of the training process. We present a sensitivity analysis showing that while the absolute value of the efficiency metric depends on the choice of entropy baseline, the order-of-magnitude gap to the Landauer limit is robust across reasonable choices. These results provide a physical perspective on the energy requirements of artificial intelligence, though we emphasise that the Landauer bound is a direct consequence of well-established thermodynamic principles rather than a new theoretical result.
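As a rough illustration of the quantities discussed in the abstract, the sketch below evaluates the Landauer bound k_B·T·ln 2 per bit of compressed information and the resulting gap between actual training energy and that bound. All numeric inputs (token counts, the bits-per-token entropy baseline, and the MWh training-energy figures) are illustrative assumptions chosen for the example, not values taken from the paper.

```python
import math

# Physical constants and assumed operating temperature
K_B = 1.380649e-23                    # Boltzmann constant, J/K
T = 300.0                             # assumed temperature, K
LANDAUER_J_PER_BIT = K_B * T * math.log(2)   # ~2.87e-21 J per bit erased

MWH_TO_J = 3.6e9                      # 1 MWh = 3.6e9 J


def landauer_bound_joules(bits_compressed: float) -> float:
    """Minimum dissipation (J) for irreversibly compressing the given number of bits."""
    return bits_compressed * LANDAUER_J_PER_BIT


def demon_efficiency(actual_energy_j: float, bits_compressed: float) -> float:
    """Ratio of the Landauer minimum to the energy actually spent (dimensionless, <= 1)."""
    return landauer_bound_joules(bits_compressed) / actual_energy_j


# Illustrative placeholder inputs (assumptions, not figures from the paper):
#            (training tokens, bits/token baseline, training energy in MWh)
systems = {
    "GPT-3":   (300e9,  5.0, 1287.0),
    "PaLM":    (780e9,  5.0, 3400.0),
    "LLaMA-2": (2.0e12, 5.0, 1000.0),
}

for name, (tokens, bits_per_token, mwh) in systems.items():
    bits = tokens * bits_per_token            # crude estimate of entropy reduction
    e_actual = mwh * MWH_TO_J                 # reported/assumed training energy in joules
    e_min = landauer_bound_joules(bits)       # thermodynamic lower bound
    gap = e_actual / e_min                    # how far above the Landauer limit
    print(f"{name}: Landauer bound ~{e_min:.2e} J, "
          f"actual ~{e_actual:.2e} J, gap ~{gap:.1e}x")
```

With these placeholder inputs the gap comes out near 10²¹ for a GPT-3-scale run, matching the order of magnitude quoted in the abstract; varying the bits-per-token baseline rescales the bound linearly but leaves the order of magnitude of the gap essentially unchanged, in line with the sensitivity analysis described above.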

References

[1] James Clerk Maxwell, Theory of Heat, Longmans, Green, and Co., London (1871)

[2] Leo Szilard, Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen [On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings], Zeitschrift für Physik, Vol. 53, No. 11–12, pp. 840–856 (1929)

[3] Rolf Landauer, Irreversibility and heat generation in the computing process, IBM Journal of Research and Development, Vol. 5, No. 3, pp. 183–191 (1961)

[4] Charles H. Bennett, The thermodynamics of computation—a review, International Journal of Theoretical Physics, Vol. 21, No. 12, pp. 905–940 (1982)

[5] Antoine Bérut, Artak Arakelyan, Artyom Petrosyan, Sergio Ciliberto, Raoul Dillenschneider, and Eric Lutz, Experimental verification of Landauer's principle linking information and thermodynamics, Nature, Vol. 483, pp. 187–189 (2012)

[6] Melvin M. Vopson, The mass-energy-information equivalence principle, AIP Advances, Vol. 9, No. 9, 095206 (2019)

[7] David Patterson, Joseph Gonzalez, Quoc Le, Chen Liang, Lluis-Miquel Munguia, Daniel Rothchild, David So, Maud Texier, and Jeff Dean, Carbon emissions and large neural network training, arXiv preprint arXiv:2104.10350 (2021)

[8] Jorma Rissanen, Modeling by shortest data description, Automatica, Vol. 14, No. 5, pp. 465–471 (1978)

[9] Peter D. Grünwald, The Minimum Description Length Principle, MIT Press (2007)

[10] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher, Pointer sentinel mixture models, arXiv preprint arXiv:1609.07843 (2016)

[11] Michael P. Frank, The physical limits of computing, Computing in Science and Engineering, Vol. 4, No. 3, pp. 16–26 (2002)

[12] Tom Brown, Benjamin Mann, Nick Ryder, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems, Vol. 33, pp. 1877–1901 (2020)

[13] Hugo Touvron, Louis Martin, Kevin Stone, et al., LLaMA 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv:2307.09288 (2023)

[14] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, et al., PaLM: Scaling language modeling with pathways, arXiv preprint arXiv:2204.02311 (2022)

[15] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al., Training compute-optimal large language models, arXiv preprint arXiv:2203.15556 (2022)

[16] Peter Lennie, The cost of cortical computation, Current Biology, Vol. 13, No. 6, pp. 493–497 (2003)

[17] Simon Laughlin, Rob de Ruyter van Steveninck, and John Anderson, The metabolic cost of neural information, Nature Neuroscience, Vol. 1, No. 1, pp. 36–41 (1998)

[18] Jared Kaplan, Sam McCandlish, Tom Henighan, et al., Scaling laws for neural language models, arXiv preprint arXiv:2001.08361 (2020)

[19] Claude E. Shannon, Prediction and entropy of printed English, Bell System Technical Journal, Vol. 30, No. 1, pp. 50–64 (1951)

[20] Melvin M. Vopson and Serban Lepadatu, Second law of information dynamics, AIP Advances, Vol. 12, No. 7, 075310 (2022)

[21] Steve B. Furber, Francesco Galluppi, Steve Temple, and Luis A. Plana, The SpiNNaker project, Proceedings of the IEEE, Vol. 102, No. 5, pp. 652–665 (2014)

[22] Jeremy A. Owen, Artemy Kolchinsky, and David H. Wolpert, The fundamental thermodynamic costs of communication, arXiv preprint arXiv:2302.04320 (2023)

Published

2026-03-20

How to Cite

Kriger, B. (2026). Local Entropy Inversion in Large-Scale AI Systems: Landauer Bounds on Algorithmic Compression. IPI Letters, 4(2), 13–20. https://doi.org/10.59973/ipil.335

Issue

Vol. 4, No. 2 (2026)

Section

Articles