The Art of Transformer Programming

The Esoteric Programming Language at the Heart of Modern AI

Yaniv Leviathan

Current Edition: 2022.10.24 (PREPRINT)
Last Updated: 2024.07.28

How can modern large language models perform simple computations, like sorting a list, searching for a sequence, or adding two numbers in decimal representation?

Several years and thousands of papers after the Transformer was invented, it is still hard to find a satisfying answer to this seemingly simple question.

The Transformer is a highly efficient differentiable computer. When equipped with the right set of weights, obtained through a lengthy optimization process on supercomputers processing massive amounts of data, Transformer models are able to recall vast amounts of memorized knowledge and perform complex computations. Unlike human-designed programs on traditional computers, the inner workings of such models after training are not well understood.

In this work we put aside the ability of a Transformer to be efficiently optimized and instead focus on the Transformer purely as a programmable computer. We will choose a set of basic programs, including sorting, searching, and addition, and implement all of them by hand on a Transformer computer. That is, we will manually set the weights of a non-simplified, production-grade, decoder-only Transformer, similar to those powering modern LLMs, so that it provably performs exactly the desired computations, with no training procedure and no datasets. The book includes dozens of fun puzzles about this esoteric programming language at the heart of modern AI.
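To give a flavor of what programming by hand-setting weights means, here is a minimal, hypothetical numpy sketch (not a construction from the book): a single causal attention head whose query, key, and value matrices are chosen by hand so that every position attends to the largest token seen so far, a selection primitive in the spirit of the sorting and searching programs mentioned above. The vocabulary size, sharpness constant, and example sequence are illustrative assumptions.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical toy setup (not the book's actual construction):
# tokens are integers 0..V-1, embedded as one-hot vectors.
V = 10                      # vocabulary size (assumed for illustration)
tokens = np.array([3, 7, 2, 9, 1, 5])
X = np.eye(V)[tokens]       # (seq_len, V) one-hot embeddings

beta = 50.0                 # sharpness: large beta makes softmax ~ hard argmax
W_Q = np.ones((V, 1))       # every query is the constant 1
W_K = beta * np.arange(V, dtype=float).reshape(V, 1)  # key_j = beta * token_j
W_V = np.eye(V)             # value_j = one-hot of token_j

Q, K, Vals = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T            # (seq_len, seq_len): score_ij = beta * token_j
mask = np.tril(np.ones_like(scores, dtype=bool))      # causal mask
scores = np.where(mask, scores, -np.inf)
out = softmax(scores, axis=-1) @ Vals                 # (seq_len, V)

# Each row of `out` is (approximately) the one-hot of the largest token
# seen so far -- a hand-programmed "running max" attention head.
print(out.argmax(axis=-1))  # -> [3 7 7 9 9 9]

The point of the sketch is only that the familiar attention machinery, with weights written down by hand rather than learned, can be read as a small program; the book's actual constructions are carried out on a full decoder-only Transformer.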

Citation

@misc{Leviathan2022taotp,
  title={The Art of Transformer Programming},
  url={https://yanivle.github.io/taotp.html},
  journal={Yaniv Leviathan’s Blog},
  author={Leviathan, Yaniv},
  year={2022}
}