Yaniv Leviathan

Hacking until the machine learns, or I do

I am a Google Fellow at Google, where I lead an AI research lab focusing on foundations, new model architectures, efficiency, generative UI, new data sources, and image, video, and game generation. My work has made Google's LLMs faster and cheaper.

Google Duplex

I started and led the Google Duplex project, the world's first AI system that could conduct human-like voice conversations in domains like making reservations. Today, Duplex calls businesses around the world and is a major contributor to Google's updated business data, such as opening hours. It operates at high scale: in May 2023, we announced that business data from Duplex's phone calls had been shown to users over 1 trillion times.

Google Duplex Announcement (May 2018)
Duplex booking a meal for me (2018)

Speculative Decoding

I developed Speculative Decoding, a technique that speeds up generation from autoregressive models by computing several tokens in parallel without a quality trade-off; in fact, the method guarantees an identical output distribution. Producing results faster with the same hardware means fewer machines and less energy are needed.
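The core accept/reject rule can be sketched in a few lines (a toy illustration with made-up distributions, not the production implementation): a cheap draft model proposes a token, the target model accepts it with probability min(1, p/q), and on rejection a token is resampled from the normalized residual, which is exactly what preserves the target distribution.

```python
# Toy sketch of the speculative-decoding accept/reject rule for one drafted token.
import numpy as np

rng = np.random.default_rng(0)

p = np.array([0.5, 0.3, 0.2])   # target (large) model's next-token distribution over a toy vocabulary
q = np.array([0.3, 0.3, 0.4])   # draft (small) model's distribution for the same step

def speculative_sample(p, q, rng):
    x = rng.choice(len(q), p=q)               # the draft model proposes a token
    if rng.random() < min(1.0, p[x] / q[x]):  # accept with probability min(1, p[x]/q[x])
        return x
    residual = np.maximum(p - q, 0.0)         # on rejection, resample from the normalized (p - q)+
    return rng.choice(len(p), p=residual / residual.sum())

# Over many trials, the outputs follow the target distribution p exactly.
samples = [speculative_sample(p, q, rng) for _ in range(100_000)]
print(np.bincount(samples, minlength=len(p)) / len(samples))   # approximately [0.5, 0.3, 0.2]
```

In the full method several tokens are drafted at once and verified by the target model in a single parallel pass, so every accepted token is generation time saved.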

Speculative decoding became a standard across the industry. We introduced it in "Fast Inference from Transformers via Speculative Decoding" (2022), I gave a talk about it at ICML in 2023, and we published a blog post in 2024.

Image, Video, and Game Gen

I worked on early methods for image (UniTune, 2022) and video (Dreamix, 2023) editing, both based on the idea that a generator can be converted into an editor by fine-tuning it on the specific input. I also worked on Face0 (2023), which allows personalized image generation without optimization.
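The gist, in very rough strokes (a hypothetical sketch with placeholder names such as `Generator.loss_on` and `Generator.generate`, not the actual UniTune or Dreamix code): briefly fine-tune the pretrained generator to reconstruct the one input you want to edit, then sample from the fine-tuned model with the edit prompt.

```python
# Hypothetical sketch of the "fine-tune on the specific input" recipe; `Generator`,
# `loss_on`, and `generate` are placeholder names, not real APIs.
import torch
from torch import nn

class Generator(nn.Module):
    """Stand-in for a pretrained text-conditioned image/video generator."""
    def loss_on(self, target: torch.Tensor) -> torch.Tensor: ...  # e.g. a denoising loss on `target`
    def generate(self, prompt: str) -> torch.Tensor: ...          # sample an output conditioned on `prompt`

def edit(generator: Generator, input_media: torch.Tensor, edit_prompt: str,
         steps: int = 200, lr: float = 1e-5) -> torch.Tensor:
    # 1) Personalize: briefly fine-tune the generator to reproduce the given input.
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        loss = generator.loss_on(input_media)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # 2) Edit: sample from the now-personalized generator with the edit prompt.
    return generator.generate(edit_prompt)
```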

More recently, I ran DOOM on a neural model in real time (GameNGen, 2024).

The Art of Transformer Programming

I programmed a set of basic programs, like searching, sorting, and addition, on a transformer by hand, manually setting the weights of a GPT-2-like transformer decoder.
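To give a taste of the flavor (a toy single-head illustration made up here, not an excerpt from the book): with hand-picked weights, a large query-key scale turns softmax into an approximate argmax, so one attention head can compute the maximum of its input, a basic searching-style primitive.

```python
# One attention head with hand-set weights whose softmax acts as an approximate argmax,
# so the output at the last position is (nearly exactly) the maximum of the input.
import numpy as np

def hand_wired_max_head(values, scale=50.0):
    x = np.array(values, dtype=float).reshape(-1, 1)  # each "token" embeds its value as a 1-d vector
    W_q, b_q = np.array([[0.0]]), np.array([scale])   # constant, large queries sharpen the softmax
    W_k = np.array([[1.0]])                           # keys expose each token's value
    W_v = np.array([[1.0]])                           # values pass each token's value through
    q, k, v = x @ W_q + b_q, x @ W_k, x @ W_v
    scores = q @ k.T                                  # scores[i, j] = scale * value_j
    scores -= scores.max(axis=-1, keepdims=True)      # for numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax concentrates on the largest value
    return (attn @ v)[-1, 0]                          # read the result at the last position

print(hand_wired_max_head([3.0, 7.0, 1.0, 5.0]))      # ~7.0
```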

If you've ever wondered how your favorite LLM might be adding numbers or sorting a list, you might like my book where I documented my solutions: The Art of Transformer Programming (2022). Based on observations from the book, I developed a simple modification of the attention module that improves transformers (Selective Attention, 2024).

Previous Life

I have been doing AI research for a long while. In 1998, as a high school student, I joined a startup building video understanding software, where I programmed neural networks in plain C. I also founded and served as CEO of Tailor, where we built a system that automatically modifies websites based on usage patterns.

A couple of decades ago I spent several intensive years working on cryptography and cybersecurity. I also spent a few years working on gaming and computer graphics. I studied pure math at Tel-Aviv University.

I have a blog that features my collection of math and hacking puzzles, as well as write-ups of weekend projects. Try puzzles like Rabbit Season, Really Equal? Naturally!, Expanding Map, or (×2, +1) Equivalence.