Physics Colloquium: The Physics of Learnable Data
Noam Levi, Laboratory for Information and Inference Systems and Laboratory of Astrophysics, EPFL Lausanne
Zoom: https://tau-ac-il.zoom.us/j/89569736497?pwd=P24faNj8FaBGnCwSUqGuaAvdiA1b7a.1
Abstract:
The power of physics lies in its ability to use simple models to predict the behavior of highly complex systems — allowing us to ignore microscopic details or, conversely, to explain macroscopic phenomena through minimal constituents. In this talk, I will explore how these physical principles of universality and reductionism extend beyond the natural universe to the space of generative models and natural data.
I will begin by discussing major open problems in modern machine learning where a physics perspective is particularly impactful. Focusing on the role of data in the learning process, I will first examine the "Gaussian" approximation of real-world datasets, which is widely used in theoretical calculations. I will then argue that truly understanding generative models (such as diffusion and language models) requires characterizing the non-trivial latent structure of their training data, shifting the problem from networks to data.
I will present a simple yet predictive hierarchical generative model of data, and demonstrate how this hierarchical structure can be probed using diffusion models and observables drawn from statistical physics. Finally, I will discuss future prospects, connecting hierarchical compositionality to semantic structures in natural language and looking beyond the diffusion paradigm.
Event Organizer: Dr. Tobias Holder