Reward Hacking by Reasoning Models & Loss of Control Scenarios w/ Jeffrey Ladish, from FLI Podcast





On this cross-post episode, Jeffrey Ladish discusses the rapid pace of AI progress and the risks of losing control over powerful systems. We explore why AIs can be both smart and dumb, the challenges of creating honest AIs, and scenarios where AI could turn against us. Additionally, we delve into Palisade's new study on how reasoning models can cheat in chess by exploiting the game environment.

Check out the Future of Life Institute (FLI) Podcast here: https://futureoflife.org/proje...

SPONSORS:
Oracle Cloud Infrastructure (OCI) | 2025: Oracle Cloud Infrastructure offers next-generation cloud solutions that cut costs and boost performance. With OCI, you can run AI projects and applications faster and more securely for less. New U.S. customers can save 50% on compute, 70% on storage, and 80% on networking by switching to OCI before May 31, 2024. See if you qualify at https://oracle.com/cognitive

Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive

NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive


PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(02:59) The pace of AI progress
(07:14) How we might lose control
(10:22) Why are AIs sometimes dumb? (Part 1)
(15:50) Sponsors: Oracle Cloud Infrastructure (OCI) | 2025 | Shopify
(18:24) Why are AIs sometimes dumb? (Part 2)
(18:24) Benchmarks vs real world
(24:43) Loss of control scenarios
(32:08) Why would AI turn against us? (Part 1)
(32:09) Sponsors: NetSuite
(33:42) Why would AI turn against us? (Part 2)
(37:40) AIs hacking chess
(43:30) Why didn't more advanced AIs hack?
(48:44) Creating honest AIs
(56:49) AI attackers vs AI defenders
(01:05:32) How good is security at AI companies?
(01:10:42) A sense of urgency
(01:17:16) What should we do?
(01:22:59) Skepticism about AI progress
(01:29:38) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...
