The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research
Exploring TinyStories, a small natural language dataset for modest compute budgets, and its impact on language model performance and interpretability.
Watch Episode Here
Video Description
Nathan Labenz sits down with Ronen Eldan and Yuanzhi Li of Microsoft Research to discuss TinyStories, the small natural language dataset they created. TinyStories is designed to reflect the full richness of natural language while remaining small enough to support research on modest compute budgets. Using this dataset, they explored aspects of language model performance, behavior, and mechanism by training a series of models ranging in size from just 1 million to 33 million parameters, which is still only about 2% of the scale of GPT-2. In this conversation, Nathan, Ronen, and Yuanzhi touch on LM reasoning, emergence, interpretability, and which of these insights can be extended to LLMs.
LINKS:
TinyStories paper: https://huggingface.co/papers/2305.07759
TIMESTAMPS:
(00:00) Episode Preview
(07:12) The inspiration for the TinyStories project
(15:07) Sponsor: Omneky
(15:44) Creating the TinyStories dataset
(21:27) GPT-4 vs GPT-3.5
(24:13) Did the TinyStories team try any other versions of GPT-4?
(29:23) Curriculum models and weirder curriculums
(35:34) What does reasoning mean?
(46:27) What does emergence mean?
(01:01:44) The curriculum development space
(01:11:40) The similarities between models and human development
(01:20:12) Fewer layers vs. more layers
(01:29:22) Attention heads
(01:33:40) Semantic attention head
(01:36:54) Neuron technique used in developing the TinyStories model
(01:52:20) Interpretability work that inspires Ronen and Yuanzhi
TWITTER:
@CogRev_Podcast
@EldanRonen (Ronen)
@labenz (Nathan)
@eriktorenberg (Erik)
Thank you Omneky for sponsoring The Cognitive Revolution (https://www.omneky.com/). Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
Music Credit: MusicLM
More show notes and reading material are available on our Substack: https://cognitiverevolution.substack.com