Yaniv Leviathan 🐳

Google Duplex · Speculative Decoding · Google Search Answers · GameNGen · The Art of Transformer Programming · Selective Attention

“What I cannot create, I do not understand.”
- Richard Feynman

I am a Distinguished Engineer at Google, where I lead an AI research lab focusing on foundations, new model architectures, new data sources, and image, video, and game generation. My work made Google's LLMs substantially faster and cheaper.

Some highlights from my published work:

  • I led the creation of the world's first AI system that carries out human-like voice conversations at high scale (Google Duplex, 2018).
  • I invented a technique for decoding more efficiently from LLMs that is now standard across the industry (Speculative Decoding, 2022).
  • I ran DOOM on a neural model in real-time (GameNGen, 2024).
  • I manually set the weights of a transformer to perform basic algorithms like searching, sorting, and addition (The Art of Transformer Programming book, 2022).
  • I developed a simple modification of the attention module that improves transformers (Selective Attention, 2024).
  • I developed techniques for neural image and video editing (UniTune, 2022; Dreamix, 2023; Face0, 2023).
  • I worked on Google Search.
  • I maintain this site, with a collection of my favorite puzzles, and some blog posts.

Google Duplex

I started and led the Google Duplex project, the world's first AI system that could conduct human-like voice conversations in domains like making reservations. Today, Duplex calls businesses around the world and is a major contributor to keeping Google's business data, like opening hours, up to date. It operates at high scale: in May 2023, we announced that business data from Duplex's phone calls had been shown to users over 1 trillion times.

Google Duplex Announcement (May 2018):
Duplex booking a meal for me in 2018:

Speculative Decoding

I developed Speculative Decoding, a technique that speeds up generation from autoregressive models by computing several tokens in parallel, without a quality trade-off; in fact, the method guarantees an identical output distribution. Producing results faster with the same hardware also means that fewer machines and less energy are needed for serving the same amount of traffic. Speculative decoding is widely used across the industry. We introduced speculative decoding in "Fast Inference from Transformers via Speculative Decoding" in 2022, I gave a talk about it at ICML in 2023, and we published a blog post about it in 2024.
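To make this concrete, here is a minimal NumPy sketch of one speculative decoding step (a toy illustration, not a production implementation): p and q are placeholder callables returning the target and draft models' next-token distributions over the vocabulary, and gamma is the number of drafted tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

def speculative_decode_step(p, q, prefix, gamma=4):
    """One speculative decoding step.

    p(seq) / q(seq): next-token distributions (NumPy arrays over the
    vocabulary) of the large target model and the small draft model.
    Returns between 1 and gamma + 1 new tokens.
    """
    # 1) The cheap draft model proposes gamma tokens autoregressively.
    seq, drafted, q_dists = list(prefix), [], []
    for _ in range(gamma):
        q_i = q(seq)
        x_i = int(rng.choice(len(q_i), p=q_i))
        drafted.append(x_i); q_dists.append(q_i); seq.append(x_i)

    # 2) The target model scores all gamma + 1 positions. In a real
    #    system this is a single parallel forward pass, which is the
    #    source of the speedup.
    p_dists = [p(list(prefix) + drafted[:i]) for i in range(gamma + 1)]

    # 3) Accept or reject each drafted token. This rule is what makes
    #    the overall output distribution exactly the target model's.
    out = []
    for i, x_i in enumerate(drafted):
        p_i, q_i = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p_i[x_i] / q_i[x_i]):
            out.append(x_i)  # accepted: keep the draft token
            continue
        # Rejected: resample from the normalized residual and stop.
        residual = np.maximum(p_i - q_i, 0.0)
        return out + [int(rng.choice(len(residual), p=residual / residual.sum()))]

    # 4) All drafts accepted: one extra token from the target for free.
    return out + [int(rng.choice(len(p_dists[-1]), p=p_dists[-1]))]
```

Even in the worst case, each step yields at least one token from the target distribution, and when the draft model guesses well, most steps yield several.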

Image, Video, and Game Gen

I worked on early methods for image editing (UniTune, 2022) and video editing (Dreamix, 2023), both based on the simple idea that a generator can be converted into an editor by fine-tuning it on the image or video to be edited. I also worked on one of the first techniques for generating personalized images (e.g., an astronaut that looks like you) without any per-subject optimization (such as fine-tuning), which makes it very fast (Face0, 2023). More recently, I ran DOOM on a neural model in real-time (GameNGen, 2024).
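As a rough sketch of that fine-tune-then-edit idea (not the exact recipe from UniTune or Dreamix, which differ in important details), here is what it might look like with a latent diffusion model via the diffusers library; the model id, file names, prompts, step count, and learning rate below are all placeholder assumptions:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from diffusers import StableDiffusionPipeline

# Placeholder model and image; any latent text-to-image diffusion
# model with the same structure would do.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
to_tensor = transforms.Compose([
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # map pixels to [-1, 1]
])
image = to_tensor(Image.open("image.png").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    # Encode the single image we want to edit into latent space.
    latents = pipe.vae.encode(image).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor
    ids = pipe.tokenizer("a photo", return_tensors="pt", truncation=True,
                         padding="max_length",
                         max_length=pipe.tokenizer.model_max_length).input_ids
    text_emb = pipe.text_encoder(ids)[0]

# Fine-tune the generator on that one image. Afterwards the model is
# heavily biased toward reconstructing it, so sampling with an edit
# prompt tends to yield an edited version of *this* image.
opt = torch.optim.AdamW(pipe.unet.parameters(), lr=1e-5)
for _ in range(100):  # placeholder step count
    noise = torch.randn_like(latents)
    t = torch.randint(0, pipe.scheduler.config.num_train_timesteps, (1,))
    noisy = pipe.scheduler.add_noise(latents, noise, t)
    pred = pipe.unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred, noise)  # standard epsilon-prediction loss
    loss.backward(); opt.step(); opt.zero_grad()

# The fine-tuned generator now acts as an editor.
edited = pipe(prompt="a photo, in watercolor style").images[0]
edited.save("edited.png")
```

In the papers, the sampling stage is also adapted (e.g., starting from a noised version of the original) to balance fidelity to the input against editability.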

The Art of Transformer Programming (TAOTP)

I programmed a set of basic algorithms, like searching, sorting, and addition, on a transformer by hand - i.e., I manually set the weights of a GPT-2-like transformer decoder to perform each of them. If you've ever wondered how your favorite LLM might be adding numbers together, sorting a list, etc., you might like my book, The Art of Transformer Programming (2022), where I documented my solutions. Based on some observations and exercises from the book, I developed a simple modification of the attention module that improves transformers (Selective Attention, 2024).
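To give a flavor of what programming a transformer by hand means (a toy NumPy illustration, far simpler than the book's solutions), here is a single attention head whose hand-set weights perform a search: the head attends to the earlier position holding the query token and reports that position. The one-hot token-plus-position embedding and the scale constant are assumptions of this demo:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

V_SIZE, MAX_POS = 5, 8
D = V_SIZE + MAX_POS  # hidden state = [one-hot token | one-hot position]

def embed(tokens):
    h = np.zeros((len(tokens), D))
    for pos, tok in enumerate(tokens):
        h[pos, tok] = 1.0             # token part
        h[pos, V_SIZE + pos] = 1.0    # position part
    return h

# Hand-set weights: queries and keys read the token part, so the
# attention score is SCALE exactly where two tokens match; values
# read the position part, so the head outputs "where it looked".
SCALE = 50.0  # large enough that the softmax is effectively hard
W_Q = np.zeros((D, V_SIZE)); W_Q[:V_SIZE] = np.eye(V_SIZE) * SCALE
W_K = np.zeros((D, V_SIZE)); W_K[:V_SIZE] = np.eye(V_SIZE)
W_V = np.zeros((D, MAX_POS)); W_V[V_SIZE:] = np.eye(MAX_POS)

def search(tokens, query_token):
    """Return the position of query_token in tokens, via attention."""
    h = embed(tokens + [query_token])
    q, k, v = h @ W_Q, h @ W_K, h @ W_V
    scores = q @ k.T
    np.fill_diagonal(scores, -1e9)  # keep the query from matching itself
    out = softmax(scores) @ v
    return int(out[-1].argmax())    # position the last token attended to

print(search([3, 1, 4, 2], query_token=4))  # -> 2
```

The exercises in the book, like multi-digit addition and sorting, are of course more involved, composing multiple hand-designed heads and layers, but the principle is the same.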

Google Search Answers

I started and led the team that built the AI system that reads and understands the open web and automatically populates Google's Knowledge Graph. Thanks to this system, Google Search shows answers, cards, and lists for a large fraction of user queries, like [homer simpson], [world war 2], and [nba teams]. I also led the team that built the backend of Google Trends.

Other

I have been doing AI research for a long while. In 1998, as a high school student, I joined a startup building video understanding software, where I programmed some neural networks (in plain C!). I also founded and was the CEO of a company called Tailor, where we built a system that automatically modifies websites based on users' usage patterns. A couple of decades ago I spent several intensive years working on cryptography and cybersecurity; I met some of the smartest people I know back then, many of whom are still among my best friends. I also spent a few years working on gaming and computer graphics, which remain among my favorite fields to this day. I studied pure math (for the soul) at Tel Aviv University. Oh, and I have a blog that features my collection of math and hacking puzzles, as well as write-ups of some of my favorite weekend projects. Try a fun puzzle like Rabbit Season, Really Equal? Naturally!, Expanding Map, or (×2, +1) Equivalence.