The AI Multimodal Revolution with Junnan Li and Dongxu Li of BLIP & BLIP2

As recently as January 2021, the challenge of "interpreting what is going on in a photograph" was considered "nowhere near solved." Today's guests Junnan Li and Dongxu Li changed that with their publication and open-sourcing of BLIP, which delivered state-of-the-art performance on image captioning and other vision-language tasks.

BLIP became the #18 most-cited AI paper of 2022, and now Junnan and Dongxu are back with BLIP-2, this time showing how small models can harness the power of existing foundation models to do multi-modal tasks.
We talked to Junnan and Dongxu about their research and how they see the trend toward connector models shaping the future.

(00:00) Preview
(01:17) Sponsor
(01:35) Intro
(05:50) Convergence of AI techniques
(07:33) Evolution of BLIP to BLIP-2
(08:12) How BLIP-2 unlocked multimodal functionality
(12:43) The size, training dynamics, and optimization function of BLIP
(20:15) Practical/Business applications of BLIP
(29:43) Efficiency of BLIP-2 compared to other models
(41:52) Two-stage pre-training
(47:11) Architecture of BLIP-2’s connector model
(58:52) Language models as the executive function of the brain
(01:07:32) Vision for an ultimate multimodal system and democratized pre-training for models
(01:12:59) Useful AI tools in these researchers’ day-to-day
(01:14:56) Upcoming projects

Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.

Twitter:
@CogRev_Podcast
@LiJunnan0409 (Junnan Li)
@DongxuLi_ (Dongxu Li)
@labenz (Nathan)
@eriktorenberg (Erik)

Join thousands of subscribers on our Substack: https://cognitiverevolution.substack.com/

Websites:
cognitiverevolution.ai



Show Notes:
- Original BLIP demo
huggingface.co/spaces/Salesforce/BLIP

- BLIP-2 demo
huggingface.co/spaces/Salesforce/BLIP2
https://twitter.com/LiJunnan0409/status/1621649677543440384

- BLIP is the #18 most-cited AI paper of 2022
https://mobile.twitter.com/LiJunnan0409/status/1631854807505076224

- Image captioning comparison tool
https://huggingface.co/spaces/nielsr/comparing-captioning-models
