In this episode of The Cognitive Revolution, Nathan welcomes back Zvi Mowshowitz for an in-depth discussion on the latest developments in AI over the past six months. They explore Ilya's new superintelligence-focused startup, analyze OpenAI's O1 model, and debate the impact of Claude's computer use capabilities. The conversation covers emerging partnerships in big tech, regulatory changes, and the recent OpenAI profit-sharing drama. Zvi offers unique insights on AI safety, politics, and strategic analysis that you won't find elsewhere. Join us for this thought-provoking episode that challenges our understanding of the rapidly evolving AI landscape.
Check out "Don't Worry About the Vase" Blog: https://thezvi.substack.com
Be notified early when Turpentine drops new publications: https://www.turpentine.co/excl...
SPONSORS:
Shopify: Shopify is the world's leading e-commerce platform, offering a market-leading checkout system and exclusive AI apps like Quikly. Nobody does selling better than Shopify. Get a $1 per month trial at https://shopify.com/cognitive
Notion: Notion offers powerful workflow and automation templates, perfect for streamlining processes and laying the groundwork for AI-driven automation. With Notion AI, you can search across thousands of documents from various platforms, generating highly relevant analysis and content tailored just for you - try it for free at https://notion.com/cognitivere...
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance, with compute costs 50% lower and outbound networking costs 80% lower than other cloud providers. OCI powers industry leaders with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before December 31, 2024 at https://oracle.com/cognitive
SelectQuote: Finding the right life insurance shouldn't be another task you put off. SelectQuote compares top-rated policies to get you the best coverage at the right price. Even in our AI-driven world, protecting your family's future remains essential. Get your personalized quote at https://selectquote.com/cognit...
RECOMMENDED PODCAST:
Unpack Pricing - Dive into the dark arts of SaaS pricing with Metronome CEO Scott Woody and tech leaders. Learn how strategic pricing drives explosive revenue growth in today's biggest companies like Snowflake, Cockroach Labs, Dropbox and more.
Apple: https://podcasts.apple.com/us/...
Spotify: https://open.spotify.com/show/...
CHAPTERS:
(00:00:00) Teaser
(00:01:03) About the Episode
(00:02:57) Catching Up
(00:04:00) Ilya's New Company
(00:06:10) GPT-4 and Scaling
(00:11:49) User Report: GPT-4 (Part 1)
(00:18:11) Sponsors: Shopify | Notion
(00:21:06) User Report: GPT-4 (Part 2)
(00:24:25) Magic: The Gathering (Part 1)
(00:32:34) Sponsors: Oracle Cloud Infrastructure (OCI) | SelectQuote
(00:34:58) Magic: The Gathering (Part 2)
(00:35:59) Humanity's Last Exam
(00:41:29) Computer Use
(00:47:42) Industry Landscape
(00:55:42) Why is Gemini Third?
(01:04:32) Voice Mode
(01:09:41) Alliances and Coupling
(01:16:31) Regulation
(01:24:58) Machines of Loving Grace
(01:33:23) Taiwan and Chips
(01:41:13) SB 1047 Veto
(02:00:07) Arc AGI Prize
(02:02:23) Deepfakes and UBI
(02:09:06) Trump and AI
(02:26:31) AI Manhattan Project
(02:32:05) Virtue Ethics
(02:38:40) Closing Thoughts
(02:40:37) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://www.linkedin.com/in/na...
Youtube: https://www.youtube.com/@Cogni...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...
TRANSCRIPT:
Nathan: Hello, and welcome back to The Cognitive Revolution!
Today, Zvi Mowshowitz returns for another wide-ranging conversation, as we look back on the major developments in AI – technical, strategic, and political – that have transpired in the six months since his last appearance.
We discuss Ilya's new startup, which aims to go straight to superintelligence, the utility (or lack thereof) of OpenAI's o1 for things like coding and Magic: The Gathering, the likely impact of Claude's new computer use capabilities, and the recent discourse around the "end of scaling".
We also analyze the complex web of partnerships emerging between Big Tech companies, the regulatory outlook in the wake of the veto of SB1047, and the latest OpenAI drama, this time centering around the question of who should and will get what as they convert to a for-profit company.
Toward the end, we consider the potential impact of Donald Trump's return to the presidency on AI trajectories, and I pick Zvi's brain on how the AI safety community's understanding of individual virtue should evolve in this new political reality.
On all these topics and more, Zvi delivers a mix of strategic analysis & insight that you can't get anywhere else – except of course for his blog, Don't Worry About the Vase, online at thezvi.substack.com – and I personally find these exchanges super helpful, both for maintaining general situational awareness and in some cases for challenging my own mental models about how to most effectively work toward beneficial AI outcomes.
As always, if you're finding value in the show, we'd appreciate a review on Apple Podcasts or Spotify, or just share it with a friend who's trying to make sense of the rapidly evolving AI landscape. And we welcome your feedback, either via our website, CognitiveRevolution.ai, or by DM'ing me on your favorite social network.
Now, I hope you enjoy this latest discussion with the great Zvi Mowshowitz.
Nathan: Welcome back to the Cognitive Revolution.
Zvi: Thanks. I always enjoy it here.
Nathan: So it's been a minute and the pace hasn't slowed down. I went back and looked at the timestamp of our last conversation and it was right after OpenAI and Google had sort of previewed their respective voice modes. And then I think literally as we were recording, there were like, several resignations from OpenAI safety teams. Then we did an immediate part two to talk about that. And certainly quite a bit has happened in the intervening months. So what I thought we would do is just run it down, get your take on a bunch of stuff. We're both in the business of keeping up with this on a weekly, daily and hourly basis, but not everybody can do that. So this is kind of the zoomed out few months recap and analysis. How's that sound? I like it.
Zvi: Yeah, let's do it.
Nathan: Cool. Looking through your blog and just revisiting the news myself, I thought that one story that kind of came and went, but which has at least some chance of proving to be maybe the most important story of all over the last few months, was Ilya's announcement that he is starting a new company, which is going direct to superintelligence, safely, of course. And so we've nothing to worry about there, but just... Basically, just want to let you know, we're going to go build superintelligence and we'll let you know when we've got it. No products, and unclear how many updates we'll get in the meantime. It'll definitely be safe. We know that, right? It's right there in the title. So, I would say their ability to do this is obviously still questionable, but to borrow an infamous phrase, strikingly plausible. And, you know, that in and of itself is pretty crazy. Certainly you couldn't have made such a plan at all credible up until pretty recently. What do you make of it? And how often do you wake up at night wishing or dreaming that you might've gotten an unexpected update on their progress?
Zvi: We have a little bit of an update just this week, actually. Ilya said that this is no longer the age of scaling. The 2010s were the age of scaling, and now we're back in the age of wonder and discovery. Once again, everyone is looking for the next thing. Scaling the right thing matters more now than ever. So basically, the OpenAI attitude of "we'll just add a zero to everything and see what happens" is not even what OpenAI is doing anymore, right? They moved on to o1 because their previous strategy wasn't working. And obviously he's not going to spill the secrets, so we don't know how well it's going, but presumably they're not finding it to be that easy. But yeah, it's deliberately opaque. It's deliberately hard for us to tell what's going on inside. I do know that I am skeptical that they can achieve safe superintelligence if they achieve superintelligence. I'm also reasonably skeptical they'll get to superintelligence by this direct method, but it's definitely not impossible. I can't rule it out.
Nathan: So my next topic was o1, and you're touching on that, and then also this sort of meta debate as to what's working and what's not working. I find this quite confusing, because on the one hand, we have Altman and Dario, and I would say also the DeepMind founders, although not quite as recently, all pretty clearly on record saying: we know what we need to do, we don't really see any fundamental barriers, and we think we're going to get there in a two to three year timeframe. And then we also have this sort of end-of-scaling narrative that has popped up. It makes some intuitive sense to me when I'm like, okay, 15 trillion tokens or whatever is already a lot. And yeah, there's a lot more on the internet, but a lot of it is crap. Data quality matters more than quantity has certainly been true in my experience, albeit at much smaller scale than what these guys are doing. We've heard these reports also that o1 is, in addition to, of course, introducing a new inference-time scaling law, being used to create training data for the next generation of model, the reported Orion model. They're still raising more and more billions, and they're still planning to build bigger and bigger data centers. And the number of H100s that people are training on is going up. Elon's got his Tennessee Valley Authority contract, with like record time to deploy and network 100,000 H100s. Meta has announced that they are training Llama 4, or will be, I'm not sure if it's already or soon will be, on 100,000-plus H100s. So how do you sort through that? In your mind, is the era of scaling over, or is that like throwing people off the scent, or what?
Zvi: Right. Is deep learning hitting a wall, in air quotes? So yeah, we have these two camps. We have the people who are like, well, look at the lack of amazing products coming out. It's all hype. You've hit a wall. It doesn't get much better than this, or it only gets better very, very slowly. And then everyone at the major AI companies is like, yep, AGI soon. End of the world as we know it in '27 or thereabouts, and then who knows what happens. And then they have this strange forecast of: we will get AGI, which should be transformational, which can radically accelerate AI progress and do all these other things. And then people like Altman are like, well, then nothing much will change. Your life will continue the same way, and it will basically be better economic growth, so you'll pay lower taxes or inflation will go down. But that story makes no sense, right? Like, you might not get AGI, in which case we get the story where your life hasn't changed much, but things get a little bit better because AI is cool. But if we get AGI, obviously life is not going to seem the same for very long. It might get way better. It might get worse. It might end. We don't know. But the story where it stays the same? That certainly wouldn't be an automatic default outcome. It would be: we deliberately engineered that it stayed the same. We chose under the hood to keep it the same because we couldn't do better. And there are some arguments for that being a reasonable choice if you don't confidently know how to do better, but it's not a natural outcome at all. And their rhetoric rings hollow, right? It makes you wonder: do they really believe that the AGI that's coming is AGI? Or are they trying to hype you up, slash deal with the Microsoft contract, slash just make themselves feel like they're doing the thing, even if what's coming is weak, you know, the definition is not that strong, that kind of thing? So look a little bit farther ahead in all of this again. So my reconciliation is something like: the strategy of just pushing forward by adding more zeros to everything, the same strategy that went from GPT-1 to 2 to 3 to 4, where there's some technique involved but, as Jack Clark's write-up put it, expertise is everywhere, it's just about compute and data, that's coming to an end. The marginal returns are declining, even in scaling law terms, and you're not going to get very far from here doing that. And that's why o1. That's why these other approaches. This is my way of understanding it: by using what the models have to offer to refine the training process, by using inference-time compute in various ways, by using things like the o series of models, these labs believe they can overcome this barrier. We can pivot to a different scaling law, to a different method, and we can get where we need to go, to AGI, for at least some definition of AGI, reasonably soon. Slash, they have this statement: straight lines on graphs don't stop, right? It's like Moore's Law. Something will happen. We'll figure out something else. Even if what we were doing before stops working, don't worry about it, we'll figure it out. And there's a lot of talk about things being stronger than we know, right? Roon talking about how everyone's under-updating on o1, but we don't see the o1 he sees. We see o1-preview, right? And I gotta say, o1-preview, super unimpressive to me.
Having sat with it for a while, I basically don't use it. I find that every time I ask o1-preview a question, it thinks for a while, talks about all of these relevant details, gives me lots and lots of data points. And none of it addresses the actual thing I was trying to do. It doesn't follow the conceptual logic I was trying to get it to follow, to the thing that's useful to me. Like, cases where I should actually use it almost never happen.
Nathan: That's quite interesting. I would say I have a different user report on o1. Certainly the downsides are real. The wait time: my goldfish attention span is such now that even a 45-second delay to first token is enough for me to tab over to something else and forget what I was doing and then come back like eight minutes later. And of course the answer has been sitting there for seven minutes at that point. So from a user experience standpoint, it is... And I actually do this outside of ChatGPT, so that may be a difference as well, because I've found that what I want to use o1 for, or where I have at least found utility for it so far, is mostly coding. I've been doing a fair amount of coding lately, partly because I have a little app idea I want to build, partly because I wanted to catch up with Cursor, which has obviously had a moment, and I hadn't been coding for a little bit. So I was just trying to get back into it. I found that RAG in general is a real pain in my butt. I don't really like any of the RAG solutions that I've ever used, almost without exception. That does extend to Claude Projects and to GPTs. I feel like I can stuff a lot of stuff in there, but they're always a little too economical in terms of what they bring into context. And it's often opaque too. So I'm not sure what they're actually looking at from whatever I gave them access to, but that's another source of confusion. Claude, on the main consumer-facing product, does allow you to just paste in long documents. So I like that. But ChatGPT often doesn't allow me to just paste in the long document. It'll just straight up say that's too long of an input, sorry. So then I'll end up normally not using it there. The experience is better, I would say, with o1 in ChatGPT, because it gives you these reasoning summaries to entertain you while you're waiting. If you go over to the platform console type version, you can there paste in however much content you have, obviously up to that context window limit. But it doesn't give you this sort of real-time summary of reasoning. You just have to wait. Anyway, with all that said, I still do find it useful to take the full code base of the little app that I'm developing and paste the whole thing in there. And I have been doing head-to-heads with the latest Sonnet, which is amazing, of course, too, and o1, asking them the same question. I basically will typically say: here is the full code base of an app I'm developing. It's to help people create gold standard examples for use in AI automation. So I just give it like a one-sentence summary of, this is the purpose of this app. And then I'll ask for whatever I want help with. And if it is a local code change, Sonnet is, I would say, hard to distinguish in many cases, maybe even better, a little bit more precise with the outputs. It has the function calling and stuff like that. So it does integrate into Cursor better. I don't use o1 through the Cursor interface so much because it doesn't reliably respect their format, and in turn, then the whole functionality will break. But there is a certain way in which o1 does feel more like a high-level collaborator to me. If I say, what would be the highest priorities for refactoring this application? Or, I want to implement a new feature and I'm not sure how best to do it. I do see something qualitatively different there, where Claude will give you an answer and it might even be the best answer often enough.
But o1 does give me something extra, where it'll sometimes say: here are three different ways you could do it. This way is the most direct but hacky way you could do it, but isn't really recommended. Here's another way that's going to be a big refactor and maybe involve a couple new libraries, and this might be the enterprise way, but it doesn't look like you're really coding that way. And here's kind of the Goldilocks way that is hopefully the best of both worlds and what I would recommend to you, where it'll be a lighter lift, but not a terrible solution. And I do feel like it guides me toward better strategies in a way that I have not seen other things do.
Zvi: So I had finally gotten into actually using AI for editing code. I found Cursor to be a godsend versus not using Cursor. I can't imagine o1 being good enough that, if it doesn't integrate into Cursor, I would want to give up Cursor's integration. It would just have to be so far ahead. Because we jumped from Claude Sonnet without Cursor to Claude Sonnet, the same exact model, with Cursor, and that was a 10x. So it's like, well, I'm giving up that 10x? That seems insane. And the thing is, I've gotten Cursor to do the thing you're describing. If you actually just ask it, what are the different ways you could do this, it will tell you. You just have to use semi-reasonable prompting. And right inside Cursor it will spit out possibilities, I'll discuss it with it, we'll select an option. I also discovered that I really want to focus on using one model. I realized this because when there was an outage and it switched me over to some OpenAI model instead of Claude for a few hours, that just couldn't work. Because every time I tried to ask it anything, it would constantly have all these different intuitions and try to change all the code. So even if that model had been as good as Claude, which I didn't think it was from what I could tell, well, I don't want to have to debug all of the new code that's replacing all of the existing debugged code. That seems awful. No, I'll just switch back when Claude comes back. So I decided to just wait it out; there are other things I can do for a few hours, and I figured it would come back. But yeah, I haven't used o1 for coding a substantial amount, for exactly this reason: it's not affordable slash integrated enough to use as your primary in Cursor, and nobody does. But it's definitely possible that it has specific uses where it makes a lot of sense, specific places where thinking for a long time at this level is really, really helpful for executing what you want to do. And it comes down to, what do I think of o1-preview in general? And it's, yeah, it's this thing where you have the not-so-bright student who spends a lot of time on every assignment and tries really hard, really doing their best to turn in as much work as possible and get it right. But it's not that smart. It's not any smarter under the hood than 4o, basically. So it's inherently limited by that. So for my use cases, it turns out that it's just not useful to me. And in fact, when I use ChatGPT, which I still do, I'm normally using it for web search, which is just incompatible with o1 for now, so it's one or the other.
Nathan: o1-preview is definitely yet to be unhobbled in quite a few different ways.
Zvi: Right. And they would say that o1 itself is much better than o1-preview, right? You've seen various claims to that effect. And then o2 will obviously be much better than o1. And they claim a scaling law applies, so it'll get rapidly better. And it's possible that it will, but I have to express skepticism, because the first product, the MVP, just didn't seem like something that I almost ever wanted to use. And also, my model of what it's doing has an inherent restriction in it. You're thinking really hard, but you're not that smart. Okay, what if we thought even harder? What if we thought ten times as much, a hundred times as much, a thousand times as much as that, but we're still not that smart? And my experience of humans in this spot, and my experience of what I got from o1, is: it wouldn't help, right? You're going to have to find a way out of this box, and I'm not sure you can. But obviously, I've been surprised before. I could be very wrong. And we do have a very hobbled version right now. Like, we can't feed it context in the proper ways. We can't have it search the web. We can't have it do a variety of things. Who knows what happens from here?
Nathan: The new Claude 3.5 Sonnet, to give it its due, is also definitely amazing. And it's a good point that it may be... My experience may be less about... You know, what constitutes a fair test or a fair comparison, a fair head-to-head, is not always obvious. Giving them the same question when they have different default behavior may not be the right way to go about it. Maybe to do a fairer comparison, I should prompt them each differently, or maybe the same, but in a way that sort of tries to get them to do a similar behavior.
Zvi: I would say you should prompt them the way you would prompt each one of them. If that happens to be the same, which it actually is, then the same prompt is fine. I would also add that the time delay just doesn't bother me very much. That's not the issue here. In fact, in Cursor, not only do I regularly run Sonnet, I run it on fast commands, though I'm not convinced I necessarily want to keep paying for that when the wait from the regular ones is very small anyway. But also, it takes so long to write out the queries and follow the code, even with fast commands from Sonnet, that the wait is not that minimal relative to o1. I have gotten used to the idea of waiting for a substantial amount of code to come back from my question. The moment I see it's actually answering my question, I'm tabbing away. I'm not only tabbing away, I'm switching screens. I have a separate set of desktop screens for coding, and I have a desktop screen for writing, and I always just tab back to writing. Like, an entirely different screen, a complete mode shift, and then come back. Because I hear a lot of programmers, like most programmers, they want to be in state. They want to focus and have all this stuff in their head, and they don't want to be distracted. And it turns out that I'm the opposite, where if I have to just constantly fight with these little details for too long in a row, I will want to scream. And it's much better for me to do it in bite-sized chunks. I don't understand why my brain works that way, but I'm able to context shift that way. Like, I don't lose state. Great. I can just do this.
Nathan: Yeah, I think I'm probably more like you than... certainly, it's hard to imagine. If the models go out, I do stop coding, that's for sure, because it's just so painful to sit and type. You can do this tomorrow, right? There's no rush. Time is fungible. You have other things you can do. Have you tried anything else, making Magic: The Gathering decks or anything like that, with o1 that would be a test of its ability to grind through? Because I feel like they're showing some pretty, you know, the claim that there's a new scaling law and that this inference-time compute gives you better and better results, albeit with a logarithmic curve, but nevertheless in a not obviously bounded way. It would seem like some of these tasks that are logical, creative, combinatoric would be really well suited for that. So I wonder if you've tried anything along the lines of Magic deck creation.
Zvi: I find that example so bizarre, just because, I mean, first of all, it's something that the LLMs, as far as I can tell, mostly can't do, right? In any interesting way at all. And second of all, it is one of these tasks that's very core-skill gated. If you take a bad player and you give them a month of actively playing games, changing cards, thinking about stuff, but you don't give them the information of what's in other people's decks, you don't let them just copy, right? They won't get anywhere. They will max out at a very, very low level. It'll be unusably bad. Whereas if you give it to an expert deck builder, they can build something almost immediately, with no feedback, with no iterations, that will be pretty good. And a lot of my best creations and the best creations of other people that I know of emerged on the first day, or the first time they had the idea, very close to the final result. It's all very heuristic-based, theory-crafted. And it's not inference. It's not inference-time compute, right? Not inherently, for humans. And I would very much not expect o1 to be capable of anything of that type. You can't logically bridge out these things. They get too complex too fast. There's too much randomness. There are too many possibilities. There's too much unknown information. You're very much doing the pre-training style thing, not the inference-time style thing, right? So to the extent that they can play, it would be Opus-style models that would be good at it, I would think. But also, we see that they basically can't play. Someone used Claude's computer use to actually hook it up to Magic Arena to see if it could play some Magic. And the answer is, it can try to play, but very not well. And deck building is so much harder than playing. Most of the time in Magic, playing is reasonable: just do reasonable things. And then, obviously, over time, if you're not strategic and you're not sophisticated, you will run into problems with people who notice the ways in which you're not being strategic and you're not being reasonable. But yeah, they couldn't do standard things that well. I'm sure you could vastly improve it if you cared enough on that front. But the task that I'm always looking for is: go into a new set with nothing but a spoiler list that's not in your training data, and then let's see you draft. Let's see you decide what's good, assemble a strategy, and play it. A spoiler list is like: these are the cards that are in this set. Here are the 200 or 300 cards that we're going to be drafting from. Here's how rare each one is. Here's what each one does. Right. And a human can do that and then have a reasonable sense of what things are good and bad and what to go after. Slash, I do this thing, I did this actually last week, I played my first draft of the set yesterday, where I just don't know what's in the set. I just look at the cards and read them as I go. And I'm learning as I go based on what's there. What is the card trying to tell me? What are the strategies supposed to be? What could be good? What's inherently just naturally powerful? Just work your way through it.
Nathan: Absolutely. To be honest, there might be an interesting cognitive style difference here. I feel like I played a little Magic in my youth; it's been a long time, and I was never great at it or particularly serious about it. But I feel like it is, for me, more like coding. Like, in the app development process, there's some sort of intuition moment, or often an experiential moment, where I'm like, oh, I want to do this right now in this app and the function doesn't exist, okay, I need to create it. And then, to quote Edison, that's typically less than 1% of the effort, that moment of inspiration. And then comes obviously a lot more perspiration of actually, if I'm doing it by hand, which I don't do anymore, figuring out what files to change and how to change them and making an implementation plan and typing it all out. And that definitely is a grind, and I don't have any shortcuts from that moment of inspiration to a solution that don't involve just gradually considering all the different things that this touches and all the files that it exists in and so on. For a Magic deck, I feel like I also recall having some moments of, oh, this and this could be cool, but then really building out the deck, aren't you kind of working your way through, like, what if I added this one to it? Yeah, that could be good, but not so good. What if I added this one instead? Aren't you going through that sort of... is there no brute force to it? Your decks just spring forth from the head of Zvi fully formed?
Zvi: You have a very good sense over time of what the principal decks have to look like. And you've already done this kind of research, where you look at what's possible, what's in the space, what types of possibilities there are. And by building these other decks and pursuing other strategies and knowing what's out there, you get a sense of: what does the deck have to do? What do decks typically need to do? So you say, okay, I'm going to play this combination of three cards because it seems like a powerful thing. I know what colors that has to be. I know about how many lands this is going to have. I know about where the mana distribution is going to have to be. I know about how many creatures it can afford to play. All these numbers fall into place. You start doing reasonable things. You try several configurations. There are always so many good cards that sort of do vaguely the thing you want. I'm thinking like, I want a five-drop that flies and gets in in this way, or, you know, an enchantment that does this thing, or whatever, and maybe you go searching for that. But it's a pretty fast search, because you either remember or you know what types of cards exist, and you have an idea. Yes, occasionally you will do the literal thing, especially when you're starting out in a new set, a new format, where you just look through all the cards and pan through them. You're just like, okay, do any of these seem like they might work in this strategy? I'll just try them all out. You're auto-rejecting a lot of them as just: this isn't very powerful, I wouldn't know what I was doing with it. And then what you're describing is something I would occasionally do, especially in formats that require you to dig deep, where you're looking for a solution to a specific problem. We used to call it a crack search, where you just go through the entire spoiler list, card by card. Does this solve my problem? Does this solve my problem? Does this solve my problem? And almost all of the answers are no, obviously not, what am I even talking about? And like, one example: okay, I have this problem of a metagame that's full of removal. What am I going to do about it? I'm playing white-blue. I know what my solution needs to do. And now I'm going to go through this, card by card, every single card. And I see Pure Reflection, which is like a terrible card, but it happens to solve this exact problem. Nobody has ever put it in a deck. And I'm like, oh, I got it. I got the solution. I don't know whether I'm going to win, but I think I might have it. Because in this particular scenario, this will solve this specific problem. And we try it, and it works, and my sideboard wins me the quarterfinal match, and who knows how it goes in the final. But that's not normally how it goes. You're searching for patterns. You're trying stuff out. Often this involves a short period of just playing a handful of games. And one thing you'll notice is that a very good player who has a lot of experience can play, like, two games and say: I know what this matchup is about. I know what's going on with this deck. It's not going to work. I need to retool. Because of how they feel, how they're looking at different lines, how they're looking at different possibilities, how they can think about counterfactuals, they effectively get so much more out of that input. And they can just think 10, 100 times as fast.
Whereas a bad player needs 10, 20, 30 games, and they're hammering their head: what's going on here? They can't pick up the pattern.
Nathan: Are you aware of any projects to try to get an AI system to play Magic at a high level? Obviously, we've seen poker beaten, effectively beaten, and Diplomacy.
Zvi: Yeah. Not any serious efforts. Magic is a vastly harder game for the AI. I think Magic is very close to AI-complete. Show me your system can do the test where it drafts a format of new cards it has never seen before, not just slightly different numbers on cards it has seen before, maybe even new mechanics, and it can play at the level of strong humans in the resulting games with the deck that it created. Yeah, you could try to hack it together. But if you can do that without just hard-coding a lot of weird stuff and doing something really hacky, then yeah, the sky's the limit for whatever you just did. It's just so general, what you're doing. I can invent new mechanics that just work in different ways, and it figures out what they mean in real time. Interesting.
Nathan: Yeah, interesting. So I guess maybe a final point on o1. My sort of headline summary, basically since GPT-4, for where AI is today, what it can do, what it can't do, has been that the best AIs are closing in on human expert performance on routine tasks. Obviously all those words are fairly carefully chosen and carrying weight, especially routine and task. o1 is the first one that shows a score higher than the human expert level on MMLU, for example, where they put that pretty much at 90%, and everything before o1 was getting close, but in the 88, 89 range, one or two points behind the human expert. And then o1 gets like 92%. So on the one hand it's like a couple of points better; on the other hand, maybe it's crossed some threshold. I mean, those tests are fairly noisy. Do you think that has much meaning, or would you infer anything from that when it comes to whether there's like a top-out at human expert level?
Zvi: It sounds a lot like the thing where Nate Silver's forecast goes from 49% to 51% and everybody's suddenly, whoa. There's this level that's considered human level, but it's arbitrary, right? Because which human, right? An actual human expert in practice is going to score a lot less than 90% on that, I think. The best human is going to score more than 90%, I assume. So it's not like any particular possible human. So 90% is an arbitrarily chosen, let's-see-how-we're-basically-doing number, but there's no reason there'd be a wall there particularly, not a hard wall. And if everything is asymptotic for a while, and then something jumps past the asymptote, then it's interesting whether or not that was a human-level asymptote. But as far as the MMLU, yeah, are they missing the same questions over and over? Is the MMLU such that 90% of the questions are doable, and the rest of the questions either have the wrong answer coded in the answer key, or are ambiguous, or are just incredibly harder, and they just can't be done? And therefore you hit this wall on MMLU, which may or may not represent a real wall. I don't know, I haven't looked at these things, and different evals have different quirks to them. But in general, once you hit 90% on an eval, I have to assume it's not really an eval for you anymore. You should move on to the next eval.
Nathan: Humanity's Last Exam coming soon. Contribute your extremely difficult questions to Dan Hendrycks' squad. I mean, I like the name, Humanity's Last Exam.
Zvi: The understanding being that if it has passed this exam, then it's no longer worth our time to be writing exam questions for it.
Nathan: Maybe you could formulate a magic question for Humanity's Last Exam. You'd be the perfect person to do it.
Zvi: I mean, we could, but, you know, we got this handled.
Nathan: I do find it sobering, honestly, how, you know, I've flattered myself as a reasonably smart person. I'm like, what can, what questions could I contribute to humanity's last exam? Do I know anything that's actually hard enough to merit inclusion in that test set? I don't know. I think you would actually have a good one with magic if you wanted to do it. I'm looking around my life and I'm like, not sure I'm doing anything that's actually hard enough to get into humanity's last exam.
Zvi: I think a lot of what I do is the kind of thing that doesn't go on this exam in the sense that it's not that I am doing this very hard specific task. That's just not like where the difficulty in my job comes from.
Nathan: It's just the volume of information and the connection-drawing across wide spaces.
Zvi: Knowing which questions to ask and emphasize, being able to iterate off of how to explain things and how to draw things together. Just a lot of, like, a way of building a map and understanding things and being able to hold a lot of this together and think carefully about all these different questions and how they relate to each other. But it's not like there's one specific hard question where you have to go really deep. I think I did more of that earlier in my career, especially when I was trading or playing Magic, when you would really focus very narrowly on getting the exact right answer to a very valuable question, where you were having effectively a competition to find the best possible answer, right? You were up against someone else where you were trying to be the best. And now I'm doing something that no one else is doing. To some extent, I'm just very unique. And there, it's very much not about getting that last 4% of efficiency, right? It's about the big conceptual questions. And it's about being able to efficiently iterate and change things towards where you want to go. And a lot of the things I make are very much not designed to be precise or maximally efficient; they're designed to take this in places that are interesting and explore things that I enjoy that are compatible with providing what I need to provide. And, like, slowly working my way towards a thing, but also being deliberately non-optimized, because if something is being deliberately optimized towards a goal, it's inherently untrustworthy. Not just that it's perceived as untrustworthy, but it actually is untrustworthy. Because if you're maximizing for any one goal, then you stop actually caring about other things and they just fall away. And everything else people rely upon goes away. And that's an AI alignment problem too, but it's also a very practical problem, right? If you tell a potentially ruthless person to make sure the shipment gets in by Monday afternoon, you'd better hope you don't care about how they do it. Right? They're going to find a way. And if you ask exactly how they did it, you might not like it, but you said, just get it here. So they're going to do it.
Nathan: Okay. So let's pause for a moment. You mentioned Claude's computer use on a Magic application. That also is a big development. It's the first time a foundation model developer has come out with something that is basically able to take open-ended action in the broad digital world, at least; it's kind of like o1, very hobbled. I try not to lie to my AI assistants more than I have to. I occasionally have to say that I already have a doctor appointment scheduled and this is just preparation for that, or similar things like that, when it doesn't want to take on liability. In this case, similarly, it was like, you know, I can't create an account. I was like, okay, I logged into my account, you can use my account. I can't use your account if you're logged in. Finally, I was like, okay, I logged out, now you're using the free no-account version. And if it had checked, it would have found that I had lied to it about that, and it in fact was still logged into my account. But it didn't check. It was gullible enough to believe my little white lie. And in my case, it was able to use Waymark, our video product, pretty effectively. Not super fast; it wouldn't have been able to play real-time video games. It was probably better because our product is an AI product, so people are prompting the product for what kind of video they want, typically for their small business. It was actually better at prompting, clearly, than our typical user, a little less facile with various parts of the UI. But all those things seem destined to be resolved. How big of a deal do you think this will be for data purposes? Is this sort of go-out-and-do-stuff-in-the-world a good way to get over the data wall? And what do you think of their decision to put this out, and their rationale, which is very OpenAI-like, of: we'll put it out while it's weak so we can start to adjust, better that than dropping it on you when it's strong? I feel like we've heard that before, and who's imitating whom at this point is not entirely clear.
Zvi: Yeah, I had to think for a while about what I thought about the decision to release that feature. Ultimately, I think my philosophy is something like: iterative deployment is better than non-iterative, but it's still deployment. Sometimes deployment is bad. But if you're already determined that you're going to deploy version 2.0, then deploying version 1.0 before that will probably be good. The exception is if you don't want to raise the alarm, slash excitement level, for everybody and tell them how exciting this great new product is and how they should all be piling in to invest in competitors and build complements and supercharge the entire ecosystem. Like when they released ChatGPT, this led to a wave of investment in artificial intelligence, a wave of excitement in artificial intelligence. That very much was accelerationist in this situation. And if you think that accelerating that development is bad, then you should be sad about it. However, if you think that releasing this early version of computer use is not going to do that, and that computer use was coming anyway within a year from someone else regardless, then, yeah, it lets Anthropic work the kinks out in a relatively safe environment. It helps Anthropic's relative position. It lets us diagnose various issues. It lets policymakers understand. It's already a very good illustration for policymakers of the future to come. I think it is very, very good as an aid to explaining to, say, someone in the Trump administration, if you're having that conversation: oh, they can just control your computer continuously as an agent, if you just type three lines into this Claude product. And it's not good enough right now that anybody would dare use it, but that's just a matter of a few months, right? It's very different from "a year from now, people are going to be doing this, but nobody's doing it yet." Now you have a demo. You can see someone order their DoorDash. You can see someone operate Magic Arena. You can see somebody do these tasks, very slowly. And if it can do that, what else can it do? So I think it's the right decision in this context, given that this was obviously pretty close to happening anyway. And on iterative deployment, Roon had the statement, I think it was a week or two ago, of a total OpenAI cultural victory: everybody's doing iterative deployment. And my response to that was, well, yeah, of course, because you're doing it. They don't have a choice, right? It's like, well, total American cultural victory, everybody has nuclear weapons. Yeah, because you have them, you asshole. What are they going to do, not build them? The cost of iterative deployment is that everybody is aware of the situation and aware of what can be done. The benefit is also the same thing. But the cost is the supercharging of everything, the pushing forward. Once you've paid that cost, not releasing iteratively doesn't buy you anything, right? But having someone else iteratively deploying while you're not iteratively deploying doesn't really buy you anything either. And by the way, this brings me back to, you know, SSI, right? Ilya is like, I don't have to worry about a commercial product, let me focus. But if Ilya were to release foundation models along the way, that wouldn't, I think, be that likely to accelerate other AI progress unless he was substantially beyond the frontier, right? We got Grok. So what? It didn't do anything.
Getting another open source model that's similar to other open source models, getting another commercial foundation model that's close to other commercial foundation models, these things don't change the pace of other developments much at this point. The ship has already sailed. Now, releasing something that's a substantial leap above the competition, the way that GPT-4 was a substantial leap above the competition when it was released, that could be different, right? But not the iterative stuff. I mean, Sonnet 3.5 is an interesting possible exception just because it's helping people code faster.
Nathan: I look back on the whole GPT-4 thing and I still am somewhat confused. I mean, at the time, the analysis as I understood it, and the reason for such intensive secrecy around it, was probably somewhat competitive, but also somewhat: we don't want to alert the world and have people piling in, and it's better that one or a few responsible actors have the lead. And now I feel like the live players list hasn't really changed. Yeah, I would put xAI in there still, just because if you can wire up a hundred thousand H100s in record time, I think you gotta be on the list. And having functionally zero cost of capital, as a Musk business always seems to have, is definitely a notable advantage in this game. But we haven't really seen any new live players added, aside from them, to the watch list since GPT-4. We've seen a bunch of VC money lit on fire with sort of second-tier foundation model developers, which are now largely getting, like, creatively acquired back into big tech, whether that's Adept or Character or... Pi, whatever the company's name officially was with Pi. Yes. Thank you. And those were all, like, I don't know about Adept as much, but Pi had a great model and, I think, broadly speaking, a very competitive product. Character has maybe the most adoption, for better and definitely at times for worse, of any product. Those seem to be on the path to being viable businesses, if you didn't think that you just needed unbelievable amounts of capital or access to some data reservoir that you just can't go get on your own. So I do wonder to what degree the whole thing was baked in even then, and you were always going to have five to seven mega companies or alliances or whatever live players, and everybody else was just never going to be able to get into the game. Do you see that differently, or do you see any path for anybody new to get into the game at this point, outside of those? Although five to seven is so different from one or two, right? Like, zero is very different from one, obviously. One is very different from two. Two is very different from three. And then three and four are not that different, but they still are substantially different. And then five, six, and seven are very similar numbers, right, from the perspective of the situation. Is that the way to think about it?
Zvi: And I would say that, like, OpenAI was trying to retain their lead, retain their position for as long as possible, among other motivations. I think they were legitimately concerned with the safety of the product, to their credit. There are a lot of benefits, at least from their perspective. But to the extent that they had a commercial reason to try and stay in the lead, they're really trying to be the number one. They're really trying to get a large lead, or at least keep it to two, at most two or three, so that agreements can be reasonably reached, so that dynamics of "I'm going to do this, so you do that, and then I can do this in response to that, then you can do that in response to this," and so on, hold. Whereas with a group of five to seven, all the game theory says, once you get to seven, trying to get cooperation in stag hunts with seven players in the long run is a very, very hopeless situation most of the time. It can be done, but it's very difficult. Whereas getting stag hunts to work with two is remarkably easy. So the number of players matters quite a lot. And I do think it's basically the structure of the necessary capital requirements and data requirements and the way that capitalism works right now that there isn't room for that many players. So we're not going to get 20, 30, 50 players that are making meaningfully advancing frontier models. People will either be training cheap models, or using someone else's model, or cheaply adapting someone else's model, or otherwise not sweating it too much. Because again, why are you duplicating all the expensive work? It's a waste of time. You're not going to get a lead. You should work with what is available to you for free or damn cheap, because the marginal cost of providing it to you is zero or damn near zero. Of course that makes sense. But we're looking at a future where there are going to be a handful of companies that are out front of the rest. And assuming that spending more money and building these bigger data centers and running these bigger things is in fact a way to stay at an advantage, which we can't assume, but is definitely the way that they're betting and seems likely, then, yeah, we're probably going to be at three to five, or maybe even one to five, players. We've seen scenario planning, like Daniel Kokotajlo's scenario planning, when he put out What 2026 Looks Like on LessWrong and stuff like that, where in his models OpenAI has this nine-month lead. And as nine months becomes a longer and longer period as things speed up, every other lab on the planet becomes irrelevant. DeepMind just folds in those scenarios at some point, because there's no point in trying to be nine months behind. Because so what, if you catch up and you're six months behind a year from now, that is now three cycles. And then a year from then, it's going to be infinite cycles, right, in that kind of scenario. So what's the point? So if you can get ahead with a lead at the right time, when things start to go vertical in terms of pace, then you have a decisive advantage over everybody else. And right now, yeah, I think what we're realizing is that there really isn't room for more than a handful of companies for whom it's worth competing in this space. And the only reason Meta is able to compete in this space is because of Meta's business interests in Facebook and Instagram creating weird incentives for them to then do other things.
And also giving them this giant pool of money they can just set on fire whenever they feel like it.
Nathan: Yeah, having tens of billions of cash on hand with no better way to spend it is handy for playing this game, as it turns out.
Zvi: But also, they're in this situation where even a slightly better way to serve people addictive feeds or better advertising just makes them billions and billions of dollars, right? So they need very little effective improvement from doing the work themselves, or even just the insurance against their access being cut off by other people, to make it worth pursuing this very aggressively. So they have a lot of reasons, even if other people in this situation wouldn't. And they have these unique data sets because of this association. So they're in a decent position to be competitive. And then Elon, of course, is Elon and does whatever he wants to do. So he gets to redirect all this capital. He gets to use his leverage to recruit a bunch of people and redirect a bunch of stuff and compete. But it's not obvious to me that xAI is relevant. They might be relevant. We'll find out. But so far, they haven't actually done anything that changed anything. But yeah, that's my guess. And then it still probably largely comes down to the big three plus one, right? And then we'll see if they can all stay relevant. Increasingly, DeepMind and Google don't seem that relevant in an important sense. Like, they're not that far behind, but why do we care that they're there? It's almost starting to feel that way.
Nathan: Yeah. It's funny. I just had a couple of Googlers on the show when they did an update where they added Google Search grounding as a feature to the Gemini API. And I asked them this question, basically saying, why is Gemini number three? Because, I mean, you might feel like the answer is obvious. I don't feel like the answer is super obvious. When I look at benchmark performance, when I look at cost, when I look at just APIs and rate limits and all the things, it's pretty competitive. And yet it was the third provider that I integrated into the app that I'm building. And it's definitely the third window that I open. Typically Claude is first, then ChatGPT, and then Gemini. If I want to do a contract review, I'll do all three for completeness. But it is always the third. Is that a branding problem? Why do you think it is the third? Because I can't quite put my finger on it.
Zvi: So I am using Gemini for my Chrome extension, the thing that I'm building, as the model in the codebase. Part of that is because Claude just kept being annoying and giving me a 401, and Claude was in fact the model I was using to fix the 401, and it couldn't fix the problem. And I was just like, well, if you can't fix your own problem, well, Gemini is a lot cheaper and it's probably fine, why not just use Gemini? And Gemini has been working okay. But it's been annoying. It will follow instructions, but it has issues. My lived experience is that it's worse, and it doesn't have very useful places where it's better. So when I type into the Gemini Advanced window, I don't get much out of it. And then I just stop doing it, I'm just not interested in this anymore. And there was a period where the long context window was unique, but now it's not unique anymore. There was a period where they jumped briefly ahead of other people in various ways, and I was excited to use Gemini, but it just hasn't happened for a while. And I think part of it's a marketing problem. I think Google has been insanely bad at marketing Gemini. I hate to keep dumping on Gemini, but also the implementation sucks. I bought a $1,700 Fold, partly because even a marginal improvement in efficiency is worth a lot of money, so why not try out the best, and partly because I was told there'd be these cool Gemini AI features. I didn't use those features. Some of them I couldn't even get to work. I'm not circling to search on a regular basis. It has never come up. I've never wanted to search anything that way. Not once. Never been done. It's possible that it just didn't occur to me and I had a great opportunity to circle to search, but, like, I wanted my calls to have automated logging. They don't. I wanted to be able to talk to the AI and have it usefully use my phone. It didn't work. Every time I tried to get the AI to do something useful for me, I just gave up. And I'm debating getting an iPhone. There's a Verizon special where you can supposedly turn in any phone in any condition and get a free iPhone from them. I haven't used my upgrade in a while. I was like, why don't I get an iPhone, too? I've got a drawer full of old phones here I could potentially cash in. Yeah, I can only do it once, but I've got at least one non-working phone that I can cash in. Or I could even just buy one at, like, $10 a month, I'm sure. It's not hard. But it's just, they haven't made it useful. They keep telling me, Gemini, now available on this app that I'm using. I use a lot of Google products, and they're great. I love Google. But I never use it. The few times I've tried to use it, like, I remember they put Gemini in Google Docs, and I just went to the end of the document and was like, Gemini, please summarize this document. Didn't do it. Can't help me.
Nathan: Yeah, the integrations have been disappointing for sure, to the point where I did a project to go back, using the APIs that they have. It's funny, because they have such incredible APIs, and then they don't seem to use them in their own product development, and that's quite puzzling. Google is this amalgamation of different departments that basically don't talk to each other and basically don't copy each other's work. They have to redo, reintegrate, and re-implement everything, and they don't cooperate well, and they're split across efforts all the time. And as a result, you can readily get good things in places, but often you just don't.
Zvi: And it's a serious problem. So I don't know what's happening with that. Like NotebookLM: for a while I was trying to use it. I just fed it all my articles and tried to ask it questions. It's just not good enough, right? It's failing at its basic job too often. I don't feel, emotionally, like I want to use this or even want to try using this anymore, so I stopped. The one cool feature they implemented that no one else has right now is the AI podcast. And some people love them; I find them terrible to listen to. I try to listen, and after a minute or two I'm just like, I can't take this anymore.
Nathan: I like them, actually, although I do think it's a step up relative to any other automatically generated audio I've heard. It's an amazing technical achievement. What I want, and I'm sure they'll have it before too long, or at least I expect they will, and I'll certainly be asking questions about this, is a much more literal version. I was trying it with biology papers, and talk about a world of just infinite complexity that I've only started to acclimate myself to. But I don't want the very poppy explanation of, oh, the way this antibody attaches to this thing is like having the perfect key for the perfect lock that only fits that one lock. That's not how I want to have things explained to me. I want it to be much more literal and much more grounded, and I haven't been able to prompt my way out of the very superficial, analogy-driven approach that it defaults to. But I do like the two voices. I do like how they keep it engaging that way. And I definitely prefer audio; I'm very much an audio person relative to reading. I know we're not the same on that. I really like the two voices.
Zvi: I like both of the voices. I think they're very well done. I like the way they dynamically play against each other, but the information density and the amount of banality just drives me insane. And in general, there's no flexibility in the podcast. The great advantage of AI is that it tunes to you. One of the things I've gotten to experiment with recently, now that I've had the time, is some forms of AI storytelling, including various NSFW versions of it. I've just been toying with them. And it is amazing how much something that is objectively pretty bad, objectively just awful, terrible writing, is so much better when you have this ability to steer, this ability to influence what's happening selectively when you want to, while simultaneously mostly letting it write its own story or giving it a little bit of direction. That turns something that would be unreadable, a complete drag if you weren't directing it, into something that's exciting, right? Because you feel like you have agency, and you can emphasize the things that you find cool and exciting and skip the things you find bad. And a podcast is the opposite of that. I want the podcast to be a thing where I'm constantly giving it direction, right? Instead it pre-generates one podcast and just presents the information in this incredibly undense format. What I love about AI for learning things is: give me the answer. Okay, now I'll drill down into the places where I'm actually curious or don't understand, or where your explanation wasn't what I needed. And I can correct you when I need to. And this is just not that.
Nathan: Yeah, it's interesting. As somebody who has spent time developing AI-powered products, I think that balance is a tricky one to strike, especially if you're trying to serve a broad base of users who are not AI obsessives and don't have great intuitions at this point for how to steer, how to prompt, whatever. I think there is something to be said for the end-to-end moonshot. We've definitely seen that at Waymark relative to other AI video products. There are many products out there where you can generate an image to put into this scene, or write me a bit of copy to drop in. But very few take the approach of: what do you want? We're going to make the entire thing for you, present it to you, and then let you go from there. And they are currently missing the go-from-there piece. But I do think too many products, and especially too many Google products, have taken the other side, where they keep it super modular, super bite-sized, user fully in control. And that also leaves quite a bit of magic on the table. So I do like that this is a different approach, and I do think they'll circle back. They were on the Latent Space podcast not long ago, which included a little discussion of adding real-time audio as one item on their roadmap. And that would be a qualitatively different experience, obviously.
Zvi: I mean, the world in which you can be a third person on that podcast and they'll adjust to what you're saying, that product is so much better. The world in which it can just go far faster and be more dense, and not feel like it has to set up all this background and talk to us so slowly. Yeah, you can go 1.5x or 2x or whatever, but that doesn't fundamentally change the problem.
Nathan: What has your experience been with real-time voice, or advanced voice mode, I guess, is the proper term from OpenAI? You know, we were racing to Her however many months ago. Now Her is here, and I'm just not that into her.
Zvi: That's literally my actual response, right? Like, she's nice, and I appreciate her for trying, but yeah, I'm just not into her yet. She needs some more seasoning. She needs to get better. Part of it is you can't prompt properly. You can't get it to do the thing you want. So it has to be good enough that it can intuit, from you just saying random words, responses that are good enough. And my experience was just: no, I meant this other thing. No, no, no. I give up. That's not what I'm asking you to do. That's not the information I want. There are occasionally magical moments where you feel like you're having a conversation and it just flows, which is really cool. But for the most part, that wasn't my experience at all. It was very, very bad at intuiting what I actually cared about or wanted it to do. And I don't really blame it, because again, my prompting was terrible, right? It was just me speaking words in no particular order that had to be transcribed, without any kind of plan. People who want to be talking instead of typing to a computer in general, I'm just so confused. I can't talk as fast as I type. I can't talk as precisely as I type. I can't talk as deliberately as I type, and I can't correct it. What am I doing? Sure, if I'm walking around, there's a certain modality and a certain different aesthetic to it, and it could be an advantage. But you just stop talking to your phone for anything but the most specific things, right? I have an Alexa, and it's very good for a very small number of very specific things where I know exactly what to say to get exactly that result. But say you hooked it up to ChatGPT, which is something that's obviously coming in some form, whether or not it's exactly that. Now, if you give it the specific command it works, but if you give it any other command, it will intelligently interpret your statement. My assumption is this will greatly expand the set of things that will work. I'll be able to find other things and get it to do the things I want it to do, but I still won't really want to make open-ended requests that are complex or interesting until the technology gets a lot better. All right, and then maybe GPT-5 can handle that.
Nathan: I'll take the other side of that a little bit, inasmuch as I do have a lot of excitement for a future where I can learn while taking walks. I'm just plain too sedentary, and a big part of that has been that I'm unable to learn effectively on the go. I do think voice mode will work for that. The thing that was so disappointing was that I couldn't paste in the contents of a paper that I wanted to learn about. For whatever reason, you can't attach a PDF to it or even paste in a decent amount of content. And I was like, why? I just want to talk to you about this. I know you can do that. But for whatever reason, that is not supported. That feels like the kind of thing that is an obvious candidate for unhobbling, and I still have a lot of hope for it.
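[Editor's note: the workaround Nathan is describing already exists over the ordinary text APIs, even though voice mode won't take a PDF. Here is a minimal, purely illustrative sketch, not something from the conversation: the file name, prompt, and model name are hypothetical, and it assumes the `pypdf` and `openai` Python packages are installed with an OPENAI_API_KEY set in the environment.]

```python
# Sketch: extract a paper's text yourself and discuss it over the text API,
# since (as of this conversation) voice mode won't accept a PDF attachment.
from pypdf import PdfReader
from openai import OpenAI


def paper_text(path: str, max_chars: int = 100_000) -> str:
    """Concatenate the extractable text of a PDF, truncated to a crude budget."""
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return text[:max_chars]


def discuss(path: str, question: str) -> str:
    """Ask one question about the paper's text via a chat completion."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "Answer questions about the provided paper literally and concretely, without pop-science analogies."},
            {"role": "user",
             "content": f"PAPER TEXT:\n{paper_text(path)}\n\nQUESTION: {question}"},
        ],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    # "paper.pdf" is a hypothetical local file.
    print(discuss("paper.pdf", "Summarize the main result, step by step."))
```

[The pieces exist today; the gap Nathan is pointing at is that none of this is wired into the voice experience itself.]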
Zvi: I'm not saying that you can't get there. All of these things are just: you need a certain specific set of hooks or features, and they just aren't there yet, even though they're necessary for the things you want to do. So, say I'm listening to a podcast on some app and I have Gemini always listening to what's happening, and I can interrupt the podcast to ask Gemini about what I've been listening to, right? And it responds when I say, wait, wait, what did that mean? What is he referring to there? Basic questions. That sounds really helpful, right? And then I can interrupt, like, well, what if we didn't do that? But it has to just work, right? If it doesn't just work, I'm not going to do it. I'd much rather just wait till I get home, get in front of the computer, and paste the transcript into Claude if I have to. That's just my thing. But if you're going to do the simple, stripped-down, true Her version, it just has to be good enough, and I just don't sense that we're there.
Nathan: Yeah, I would get a lot out of it if they just allowed me to paste in a paper or two. But I'm not sure why that hasn't happened yet. Let's zoom out again a little bit from the products and the live players. I wanted to get your take on the increasingly tangled web of alliances between these companies. If you had told me even a year ago that OpenAI would manage to add an Apple partnership to their existing Microsoft partnership, that would have blown my mind. If you told me that, I wouldn't have been so surprised that Microsoft would turn around and have multiple providers in GitHub Copilot, which they now do. That one of them is Google is a pretty remarkable fact unto itself. In general, there's this interdependence, coupling if you will, going on between big tech providers. Last time we talked a little bit about whether the models are converging or diverging, and whether that's good or bad. My general sense is that this coupling is probably good. It seems to make the overall competition a little less intense, and that they're all frenemies in some way is better than if they're all siloed, standalone platforms that don't talk to each other, from the standpoint of straight-up consumer surplus now, but also when you ask how many players can work together nicely in a game-theoretic sense; this would seem to support that, or at least be better than many alternatives. I haven't really thought about it too much, but my natural instinct is something like: perfect competition is the worst-case scenario in terms of giving people flexibility and slack to do the right thing when the right thing is not competitive, and in terms of giving them choices.
Zvi: So to the extent that what's going on is that every deployer is willing to work with every supplier, every developer — Apple is willing to work with everyone — then the developers are competing against each other in a way where everyone who's using the software can swap Gemini out and Claude in, or vice versa, in five minutes, maybe in 30 seconds, whenever they feel like it, or just click a button, maybe even per call. That places tremendous pressure: whoever is marginally better for each given task gets the business. Whereas if, in some alternate world, Amazon is committed to one lab and Apple is committed to OpenAI and Microsoft is committed to OpenAI, but Google is committed to Google, and so on, then being slightly better doesn't matter very much. Like Apple Intelligence, right? It's going to succeed or fail based on the UI and what features they decide to implement and whether it basically works, just works, in the way Apple products try to just work. It's not going to succeed or fail based on whether its core AI is 10% better, and whether it's slightly ahead of or behind what Google is doing with its assistant in terms of the actual model doesn't matter very much. But if all the models are competing to be the Google Assistant, all the models are competing to be Apple Intelligence, now the tiny difference is everything. And now you just have to go full speed ahead, trying to push that narrow advantage into everything as fast as possible.
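[Editor's note: to make the "swap the model in 30 seconds" point concrete, here is a minimal, hypothetical deployer-side router where the provider is a one-word config change. This is a sketch, not anything from the conversation; it assumes the official `openai`, `anthropic`, and `google-generativeai` Python SDKs are installed, API keys are in the environment, and the model-name strings are illustrative.]

```python
# Sketch: one prompt, routed to whichever provider a config flag names.
import os


def ask(provider: str, prompt: str) -> str:
    """Send a single prompt to the named provider and return the text reply."""
    if provider == "openai":
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY
        resp = client.chat.completions.create(
            model="gpt-4o",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    if provider == "anthropic":
        import anthropic
        client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
        msg = client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative model name
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
    if provider == "google":
        import google.generativeai as genai
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-pro")  # illustrative
        return model.generate_content(prompt).text
    raise ValueError(f"unknown provider: {provider}")


if __name__ == "__main__":
    # Swapping labs is a one-word environment change, which is the source of
    # the competitive pressure Zvi describes above.
    print(ask(os.environ.get("LLM_PROVIDER", "openai"), "Say hello in one sentence."))
```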
Nathan: I feel like we flipped, because last time I was saying that about the model developers, and you'll have different friends for different things. But now, if I understand you correctly, you're taking the other side: you're saying the ready availability of an alternate model in any given product effectively makes the competition more intense at the model level.
Zvi: What we've seen a lot over time is basically this idea that a new model will come in and people will say, well, we're swapping in that new model for these use cases. And a natural result is that a lot of people tune their systems to a specific model. They know exactly how to make that model do what they want, and they're used to what that model can do when they put it in. It's not just that one model has features and ways it's strong and other models have ways they're strong. The more similar they are, the more you keep swapping them back and forth, right? The more distinct they are, the more you have lock-in. And the lock-in is probably good, right? I do want these silos to a large extent, because I think that actually puts a lot less pressure on the labs to push forward, in an important way, if a small edge doesn't immediately capture another lab's business. There are big switching costs in that world.
Nathan: So you're also then happy to see the apparent divergence in model capabilities. Like, OpenAI has reasoning, and Claude has computer use and is the best at coding, and Gemini has long context and Google grounding. And you want to see more Cambrian divergence and less copying. Yeah, I mean, different features is great.
Zvi: I mean, I'm actually just learning about the Google integrations of Gemini, right? Like, to what extent is it integrated? What does that mean? How is it different from the web search that can be done in ChatGPT? But yeah, the fact that when I want to search the web for a piece of information right now, I mostly use Google; for a certain type of information I don't expect Google to just pop out, I'll search in ChatGPT, because I expect its search to suss it out better. And then there's Perplexity for a third type of thing, where I want a set-of-sources type of explanation. If I want coding or other types of natural chat or analysis of a paper or something, I'm going to use Claude. If I want a very specific set of things, I might use o1, and so on. I think that's good, right? Now, on the margins, they're trying to carve out their own space, and just pushing the model's core capabilities is less important, because I don't care which model is 10% better under the hood. I care about which one is more adapted to the use I'm putting it to. When I'm in Cursor, okay, I will take a substantially worse coding model that's integrated in Cursor over a better one that isn't integrated, by a lot. My coding speed more than doubled when I switched from using the Claude chat interface to using the same exact model in Cursor.
Nathan: Yeah, Cursor's awesome. I definitely really love it. I think GitHub Copilot has also caught up in a substantial way with their last round of updates. But as far as I understand, Cursor is still the best. Those are the two that I've used so far, and it's night and day. I think it still is the best; my understanding is it still is the best. But yeah, the waterline is always rising. Okay, so before we move on to some other big-picture things — regulation, what the government is or isn't going to do — handicap AGI for us real quick. Metaculus says 2027 for the weak AGI median forecast. The mode, the top of the probability curve, is late 2026. I have to say, I buy it. It doesn't feel like there are that many more turns needed, with computer use. I look at SWE-bench and, geez, we went from 5% to 50% on SWE-bench in a year. So for everyone's benefit, what is the definition of weak AGI they're using? Let me look it up real quick. It's four things. The first one is tricky, because I dug into it and my guess is we have a definition problem here, but it is: able to reliably pass a Turing test of the type that would win the Loebner Silver Prize. Next is 90% or more on a robust version of the Winograd Schema Challenge. That's basically pronoun and referent disambiguation; I would say we're well clear of that already. Next is being able to score in the 75th percentile on the math sections of the SAT, given just the images of the exam pages as input. I think we're probably safely past that as well. When I tested this, models weren't yet multimodal, but even early ones were getting better than 75%, so the image input would be the limiting factor on this one, if anything.
Zvi: We can move on.
Nathan: Yeah. The final one is being able to learn the classic game Montezuma's Revenge and explore all 24 rooms based on the equivalent of less than a hundred hours of real-time play. And that is intended to be a unified system. I haven't tried Claude on Montezuma's Revenge, but I have to imagine general computer use, with a Haiku version that's fast enough to respond, would probably do that. Although I don't know how much strange or counterintuitive stuff you'd have to do. Right.
Zvi: If anything, given this definition, I would bet the under. But I would not consider it to constitute AGI in the sense that I care about when asking whether AGI shows up.
Nathan: Yeah. The reason I think this question is ultimately going to age poorly is the Turing test silver prize. That's the real question, because we'll get the Montezuma's Revenge part in 2025, probably, is my guess. The Turing test is formulated in a way that basically assumes expert judges, and I think the real problem is that the AIs are not trying to pass the Turing test. If you wanted to create an AI that would pass the Turing test, you would do what OpenAI warns against in their fine-tuning documentation, where they show an example of, look what happened when we trained a model on 100,000 Slack messages: we asked the AI to do something and it said, sure, I'll do that tomorrow. And that has become their sort of funny anti-pattern. You don't want to train on real data in many cases, because you want a different behavior from the AI than you actually observe in the wild.
Zvi: My sense is that the way to tell in the Turing test right now is the ways in which the AI is just better. Humans do stupid things that you don't want the AI to do, and the AI doesn't do them.
Nathan: But if you wanted to, if you just made it say "I don't know" a lot, it could pass as human a lot more easily. There are various things you could do to make the AI pass.
Zvi: And if you really wanted it to pass, if you offered me a $100 million prize for passing everything listed here by the end of 2025, I'd consider myself a huge favorite to succeed. It's just not a very hard task.
Nathan: So maybe, okay, scrap that. It's probably just a flawed definition. And the reality is, barring a prize, and I don't think there's any cash attached to this for anyone, there's not really a reason to try to pass the Turing test, other than general scams, maybe. No major developer is going to do it.
Zvi: So what we're saying is that this question comes down to when they choose to pass, not when they can. Right? It comes down to the grading. So it's not a really interesting question.
Nathan: So let's look over then to the Dario definition, because this is another thing I wanted to talk about, and I am of multiple minds on the Machines of Loving Grace vision. The first 50%, I give very high marks to. This is the part where he defines — and he doesn't like the term AGI; just like we're all EA-adjacent, nobody likes the term AGI anymore, we're all using substitutes, and in his case I think powerful AI is the phrase — but he lays out a pretty high level of capability that is basically everything you would want short of a robotic embodiment. It's smarter than a human Nobel Prize winner across basically all the disciplines, has the same affordances, has multimodality, has the ability to use computers effectively, and can potentially take offline actions as well. Although it doesn't have a robot body, there are a lot of APIs, a lot of ways to place orders or hire people to do stuff online or whatever. So unlimited computer use. And this then translates to a pretty exciting vision, if you buy it: we're going to have potentially a century's worth of biological discoveries and medical advances condensed into, say, a five-to-ten-year timeframe. I do find his rationale there pretty compelling. Basically, he says that if you look at the history of biomedicine, it's a relatively small number of really important discoveries that drive most of the progress. And the form of those discoveries is some programmable or semi-programmable tool that becomes a platform technology you can then do tons of stuff on. CRISPR is the canonical one: hey, now we can edit DNA with a precision we couldn't before, now we can do all this other stuff, and it drives all kinds of downstream discoveries.
Zvi: This all reminded me a lot of the Star Trek situation, where you have this holodeck, right? Technically speaking, you could do literally anything with it if you wanted to. And then you ask yourself, what would be the consequence if people used it in exactly these ways and no other ways? And then you have really interesting stories. But of course, there's a giant plot hole: there is nothing stopping them from using it in other ways, and they often accidentally use it in those other ways, and then they just pull back or don't realize they can keep going. And Machines of Loving Grace is essentially portraying a world in which you have AGI that should become ASI very quickly, and instead we leave society mostly alone. We don't develop an ASI. We only apply it, one at a time, to these specific areas in an isolated, directed way, and then we see what is possible doing that. And I think he does a very good job of exploring what that would look like. But that scenario is fucking bizarre, right? It's the question of, okay, what medical advances could we get if we had infinite advanced PhD students running around at many times human speed, available on the cheap, to do anything you want? Well, yeah, of course you're going to get a lot of advances. But are you burying the lede about what happens in this world when that kind of thing is doable? What else is going on? And on so many levels, including: how did we decide not to develop better AIs with this tool you just created, which obviously could do that very quickly? You're not getting only five years for 100 years of biology-style advances; you're getting the same thing for AI advances, only more so, because AI isn't bottlenecked by all the physical experiments you have to run, and biology is. So there's no real explanation there. It's very weird sci-fi. But it is a good counterweight to people who say it wouldn't matter, that you wouldn't get cool things. Yeah, no, you get these cool things. But it's a bizarre scenario. And that's my view, again, of where things stood up until the point where you know you have a problem, right? Which is the part where he calls for this democratic-values thing, this alliance of the good guys against the bad guys that we also see in other people's rhetoric.
Nathan: Yeah, that's in the second half, and that's the part that I don't like so much. Before we go to that: it's a well-caveated essay, but he seems to think it's more likely than not that we get to this general level of capability in a 2026-27 timeframe. I've heard other credible people say, yeah, that seems reasonable; maybe add a delay, make it 2028, everything doesn't go quite as smoothly as he thinks, but basically it's right.
Zvi: This morning I was listening to the podcast he did with Lex. I haven't finished it yet; it's very long. And he emphasizes that he's saying that's how fast it would be if we don't hit one of the following roadblocks. So it's a conditional prediction for him specifically, and I understand that has then become the median-slash-modal prediction. But there are only ways that can go wrong, not ways that can go more right, as conditions on that. So his actual prediction is somewhat less aggressive than that in effect. But he is putting a large probability weight on 2027-style numbers. And yeah, that's the message coming out of all the major labs. That is what these people actually believe.
Nathan: So do you basically adopt their statements as your beliefs? Or, at the top, it sounded like you were more skeptical than that.
Zvi: That's why I said that's their belief, right? Not that it's what's going to happen. I have a lot of model uncertainty here. I take their perspective seriously, and they have some unique information, but they also have some weird incentives and groupthink, information-cascade type things potentially going on. And there are other groups that have very different estimates. So I would say 2026 or 2027 involving the relevant kind of AGI is very possible, right? There's a real chance of that happening, for sure, at this point, given what we know. But if I had to bet abstract utils at 50-50 odds, I would bet against 2027. I think the odds are under 50%. But that's because I don't see what they see, I don't know what they know, and I'm not familiar with their environment, and I can't just take their word for it and discount everybody else's perspective. And also, I do think they're making those predictions on the assumption that a variety of things don't happen. Dario was more explicit about this than most, but I think they're all making that assumption, essentially. And so, what happens if China invades Taiwan in '26? Yeah, right.
Nathan: Yeah, which is a perfect segue to the second half of the essay, which notably did not come up in the Lex Fridman discussion at all. And Dario, if you're listening, I'd love to hash this out in a little greater depth. Let me try to summarize it, at least roughly. The short version is that everybody seems to be — and by everybody I mean Altman and Dario, apparently, and the national security establishment blob — on this path where we're going to, notably under a you-know-who second term as it turns out, keep building the AI, jam, accelerate, build the best stuff we can, keep China behind by cutting them off from chips and maybe other measures, but certainly the chip ban is the big thing so far. Then we'll achieve, maybe in a 2026 timeframe, decisive military-strategic advantage. And then we'll create the new American empire, same as the old American empire but with a hundred million times more AI, where we get to go to everybody and say, we're the good guys, join with us. You don't get to have your own AI, but you get to make the API calls or whatever. And China's isolated, and Russia too, and eventually we get around to them: you guys are the last ones out, but we'll invite you in; you just have to play by our rules, and then you can have all the great AI benefits that we're already enjoying. Sounds crazy to me. What do you think?
Zvi: Well, one thing to note is that a lot of people are talking about building AI with democratic values, right? We're treating anything that's a democracy as the good guys and anything that's not a democracy as the bad guys. And we need to understand that democracy here is a semantic stop sign. Democracy is a word that says stop thinking, right? We're vibing it as good, and therefore if it's democratic, it's good, and you can stop worrying about what happens after that. You don't have to figure out what that concretely means or how you would implement it. In a world with superintelligent entities that can be copied, do they vote? Do they not vote? How do they vote? What does democracy mean here? Is the Global South voting? Do they get equal votes? Have you considered what they actually want if you let them vote? Are the voters going to understand the implications of their vote when it comes to AI policies? None of this gets said, right? Nobody has thought this through. They're just invoking the word democracy to mean good. And similarly, we're talking about an alliance of democracies because democracy must be good. I love democracy, but we have to understand what democracy is for, why it's here, and why it is better than the alternative systems we have, and not just use it as a stop sign, an excuse not to figure out what we actually want the future to look like, how to preserve human values, and what we find worth caring about. We can't just promise to think about it later and assume it'll all be fine. It's not that simple. That's not operational. That's one problem. The other problem is, yeah, if you're very loud about the fact that you're doing this, what is the only reasonable strategic reaction by the other side? And why would you expect this to work? But you have to keep in mind that all the options are bad. The alternative is saying we're not going to try to cut off China's or anyone else's ability to develop AI because that would antagonize them, so we should cooperate with them and let them be equally advanced or more advanced than us, and then of course everything will somehow just work out. That doesn't work. You can't just ask them to become voluntary secondary partners in an alliance where we have permanent superiority, because it seems unlikely they'll buy it. You can't really offer to give up your advantage and go in as equals, because the U.S. government isn't capable of doing that even if it were somehow wise, et cetera, et cetera. All the options are bad, right? You can't try to get everybody to shut down AI until you have some argument that can carry the day, at least within the U.S. government and then everywhere else that matters, such that you could make that agreement. All our options are bad. And some form of "get the people who can get on board, and then work from there" could easily be the least bad alternative at our disposal for now. But the part that I hate so much is the whole: well, you just treat China, and also Russia, and also anybody else you don't like, as an intractable enemy who you can't talk to, can't negotiate with, can't reach agreements with, and can't work with to do this safely. And so you end up just having to smash the accelerator and pray that physics is kind to you, in ways that I don't think it will be. It's definitely far from certain. That is awful.
But yeah, you have to pick up the phone and you have to talk to these people. We've seen evidence time after time that China cares a lot about AI safety, in at least some sense, right? And they don't just want to rush ahead as fast as possible and hope that everything works out. They are very, very committed to the idea of maintaining control, in every sense, over China. And that should be something to work with. And they're humans. We're all humans. We all want good things. This saddens me and it scares me that everyone's pushing in this way. It's a huge, obvious failure mode. And you're being loud about it before you can actually enforce it, right? Like, why is TSMC still there? In an important sense, China doesn't believe any of this, right? If they believed what you believe when you say these things, would it still be standing? Or would they have bombed the hell out of it, even without taking Taiwan?
Nathan: Yeah, that seems... I don't want to bet on that, because that just seems somehow icky, at a minimum.
Zvi: The answer is that nobody at the higher levels believes it in their gut, confidently enough to do something that would wreck the global economy and threaten the stability of their regime. And we should be thankful that that particular move is off the table for all practical purposes. But there are other moves that are not. And as things go further down this path, things are going to look less and less unthinkable, and they're going to start to be put on the table by various actors, especially if we are loud enough about pursuing a strategy like this, which aims at world domination, permanent strategic advantage. I'm not saying you shouldn't seek it, but understand what you are doing, and also consider the alternatives. Again, all of the choices are bad, but yeah, it kind of sucks.
Nathan: Yeah, it does not seem like we're headed in a good direction there. Obviously we have, and I can't, This has all been filtered for me through the news. I don't speak Chinese, so there's plenty of lossy translation. But my understanding is that we have Xi saying 2027 is the deadline for the army to be ready to do something about Taiwan, whether that's a full-on invasion or whatever. It seems almost impossible to defend Taiwan. TSMC fabs from a Chinese drone swarm if they're, like, really determined to knock fabs offline.
Zvi: I mean, quite obviously, if China wants to knock TSMC out of existence, it can. Presumably, if China wanted to knock American chip production out through sabotage, it could. It's just very hard to play defense in these situations if the other side cares enough. But luckily, we don't live in those worlds yet. Obviously, if I were the U.S. government, I would either deliberately have a goal of U.S. chip independence by 2027, or I would be pretty damn committed to defending Taiwan.
Nathan: Now you're hearing, too, that Taiwan is pushing back. I was wondering about this; I was like, why are they so happy to send all the chips here? This seems like a very... and why are they shutting down the nuclear power plants? I mean, the Taiwanese are not necessarily acting strategically or rationally. They have said more recently — again, I'm filtering all this through media — but my understanding now is that there are new statements coming out like, we prohibit the export of the two-nanometer process, and there is going to be some most-advanced technology that will still be here and will not be going to the United States. I don't have a super clear sense of how much that really matters. There's so much obsession over chips and five, four, three, two nanometers, whatever. Does that exactly matter? Or is it just having the ability to make a lot of them, whether it's three or two? I honestly don't know; I'd defer to someone who does. It seems more likely to me that raw volume will be more important than one process versus the next generation of process. I would take ten 3-nanometer fabs over one 2-nanometer fab, for sure.
Zvi: It's a technical question about how they impact the use cases you care about, and to what extent better efficiency of individual chips scales better than adding more chips. Questions I just don't know the answer to.
Nathan: This is another question I don't think anybody has the answer to, but the possibility of distributed training schemes popping up is another area that I'm trying to watch as closely as I can, and I feel it's one of the best candidates for a big shake of the snow globe. If you have to not just build but defend a trillion-dollar data center, you have a real problem. Whereas if you can do ASI at home, the way we used to have SETI@home — and maybe still do; I haven't checked in on SETI@home in a while, I'm sure it's still running — that's a very different picture. There have been some early reports recently of distributed training starting to work, and those have been only partial disclosures. Nous Research had one where they were sort of coy about it and said, we've got this working; we don't really even quite know how it's working yet or why. But that space definitely could change a lot. If it gets to the point where, in more of a Bitcoin sort of way, everybody's GPUs could just be distributed, networked, and contributing, that creates a very different dynamic.
Zvi: Yeah, look, I have been skeptical of the quality and efficiency of the distributed computing proposals for training, and I think that skepticism has held up so far from what I can tell, but obviously that could change. And if that happens, it puts a kind of clock on everything, in various important senses. And again, the more things get released that can't be taken back, and the more these types of things happen, the more all of your options for not dying become impossible choices, where all of your options royally suck, and if you propose anything people will cry bloody murder, but nothing that doesn't involve someone crying bloody murder in some form is possibly going to work. What do you do, right? But we cross that bridge when we come to it, and we hope that physics is kind to us. The more this looks like a thing, the more we just have a lot worse options. It's unfortunate. That's how it is.
Nathan: Do you update your p(doom) on a regular basis? Do you have a ready number for p(doom)?
Zvi: I have been using 0.6 for a while. If I were being fully rigorous, that number would be changing, obviously, on a day-to-day, week-to-week, month-to-month basis. Certainly, for example, it would have updated when SB 1047 was vetoed. It would have updated when Trump was elected. You can argue about which direction it should go, but it definitely should update; otherwise you're just being very silly. It should update when we find out various things, up and down, et cetera, et cetera. Fundamentally, though, my attitude is that it's not useful to try to track that false precision, every little delta. It's like the old joke: how old is that dinosaur? It's 75 million and three years old, because three years ago I was told it was 75 million. You want to keep the delta accurate for consistency, but you also kind of don't, right? You don't want to be quoting it out to, like, 10%, 9%. If you have enough model uncertainty and enough ways things can go wrong, but also enough general ways things could plausibly go right, then the estimates you make don't change very much, and I don't find it that useful. And I also find there are a lot of very weird and imprecise things that I would evaluate differently on different days, which would get me to different numbers if I actually went through a more precise calculation in my head. I don't think we should spend too much time on the details of the question. It's basically: are we talking about 1%, 2%, 5%, 10%, more than 10 but not 98, or really high? That's mostly what you care about. Certainly within that range. Yeah, I'm with you on that.
Nathan: So let's go through some of these other updates, because I think we're in the home stretch and there's a handful of items, most of which you just mentioned, that I wanted to get your take on. SB 1047 being vetoed: we both supported it. I certainly went through a moment where I wasn't so sure, when it seemed like the Frontier Model Forum was going to be potentially overly empowered; I did not like the sound of that for a minute, but then the final version seemed quite good to me. I don't know if you ever wavered, but I know you also supported the final version, at least. How big of a deal do you think it was that it was vetoed? Or rather, how big of a deal would it have been for it to exist?
Zvi: I think we're going to find out a lot more by February 1, by the time Trump has had his day one and he tells us what he's going to do with the executive order, and what other immediate executive orders of his own he issues. But I think we're seeing that this is a giant disaster. This was a huge clusterfuck. As a result of SB 1047 being vetoed, the bill is not being used as the model for other legislation in other places right now, even though it is obviously the best model legislation that we have available. Instead, people are considering bills like the one in Texas, which Dean Ball has written up very well, which uses a use-case-targeting, EU AI Act-style approach to regulation of AI. That approach has all of the costs you can imagine and none of the benefits for existential risk. That approach has been a disaster in the EU. And without a better alternative that is currently viable to talk about, without any sense that action is being taken, that approach is going to become increasingly hard to stop unless we quickly do a turnaround. And it doesn't seem like people like Dean, or others who understand how awful this is, are willing to get behind an SB 1047-style revival and propose a compromise. They seem to continue to argue, as far as I can tell, that everything everyone proposes is always bad. And we can all line up and say the Texas bill is terrible, right? But I don't see how you stop more and worse states from adopting the worst possible kind of AI regulation unless you give them something better. SB 1047 would have been something better, out of California. It would have carried weight. It would have been copied extensively in other jurisdictions. It could have been complemented by narrow task bills on things like deepfakes that aren't covered by SB 1047. We could have had a net-positive, reasonable regulatory regime from the states, pending Congress's decision, and a much better chance that Congress adopted a sensible policy. Instead, we live in a world in which even a deeply red Texas is looking to implement a bill that you would think they would recognize as EU-style, complete regulatory, suicidal nonsense. And yet here we are. Texas should be the last place that would pass this. Texas should know better. You'd think they'd ask, why is Texas messing with this? It's against their entire shtick. But they don't think that, and here we are. Sure, Connecticut passing it makes some sense; I get that. But Texas? And yet, that's the thing. No regulation was never on the table, right? Not for very long. There's this idea that we'll just keep yelling about every bill as if it's a disaster, no matter how well constructed, no matter how light-touch, no matter what the benefits are. We had this debate between, let's be honest, libertarians and nutcase extreme libertarians as to whether or not this bill was a good enough bill to pass. And now the libertarians and the nutcase libertarians are lining up against the people who want to regulate everything into the ground. Those of us who still want to regulate AI sensibly are trying to find this narrow path back to something reasonable, and the people who don't want anything are refusing to accept that their position is not viable. They want our help, frankly. Here's how it feels. It feels like I held the gate against the barbarians at the door, and now you want my help, but you're not offering anything in return.
You just want me to stop the barbarians because otherwise they're going to kill all of us? And I will. But, you know, come on. This is just completely insane, right? You have to understand that. Then it comes down to what Congress and the executive branch do. If Trump repeals the executive order and doesn't replace it, such that the reporting requirements in the executive order go away — or even worse, if he also tries to take down NIST, and AISI in particular, alongside it, which would destroy the voluntary commitments — if these things go away and we don't have SB 1047 to replace them and we don't quickly pass something to replace them, we are in a horrendously worse position. If we can retain the executive order, if we can retain NIST, Trump can simply revise the safety frameworks so that they don't have the woke stuff that he hates but still do the things that actually pertain to existential safety. We can reach a compromise. I think that's an entirely viable world. There's nothing stopping the new administration from understanding that the things we care about together, we still care about, while getting rid of the things they hate, that they legitimately hate — and I'm not going to argue with you about hating them, because I'm not going to convince you. And also, frankly, some of those previous regulations did, in fact, go way too far, with requirements that are impossible to implement. You can't use an implicit bias test, you can't use a disparate impact test, on an AI algorithm. It will never pass. Even if you are fully committed to the DEI philosophy, it is simply impossible. You cannot do it. So those rules were never going to work if they tried to make them enforceable. But that is the future by default if we have nothing else to work with, right? So I would implore the Trump administration, the same way I implore California, to get its act together and give us something that fills that void. If you don't issue guidelines that people can follow that exclude the things you don't like, they'll simply adopt the guidelines you hate, including the guidelines already issued by the Biden administration. You need to replace them with something else. You need to be in this game to win it. And then there's the national security apparatus, and whether or not they will take action to stop people from doing the things they really shouldn't be doing, and impose visibility requirements that way. So there are various different ways this can go. But yeah, the answer is: I think SB 1047 really mattered, and it was killed, and this is really bad. And maybe SB 1047 part deux, a remarkably similar bill modified around the edges or whatever, gets passed next year — although with Trump in the White House it's a completely different political environment, AI is a year older, and you have to deal with the fact that it got vetoed and what everybody takes that to mean. Maybe. But also, the call now is for use-based regulation of AI, the worst possible thing you can do. If California goes that way, so goes the nation, so goes the world. The loss of SB 1047 means somewhere between a substantial safety loss and the death of the AI industry. I didn't want to say that at the time, because I would have been considered hyperbolic and crazy.
But I think it was a very real scenario, and it's now shaping up maybe worse than I feared: a series of states passes these terrible bills, and the AI industry is either crippled or forced to concentrate among a few big players. SB 1047 would have stopped that from happening.
Nathan: And the mechanism there is basically that the burdens of demonstrating neutrality and whatever else on all these different dimensions are just impossible for anybody but a big tech company to comply with.
Zvi: Imagine a NEPA-style regime where anybody can sue you over any of a wide variety of things, by standards that are technically impossible for AI to comply with, coming from one of several different states with several different regulations, and you have to re-verify every time you make any change to your AI, or something like that. Imagine a complete nightmare. That is not that unlikely to happen; it's probably a double-digit chance that this kind of scenario happens.
Nathan: If you're a real doomer, though, do you want that? I mean, not you, Zvi, personally, but there's some argument for sabotaging the industry there, right? I don't want that myself.
Zvi: I'm a virtue ethicist and a functional decision theorist. So even if I thought that was good for the world, I would not want it for that reason. But no, it's just that the people who say we can't lose to China are often kind of warmongering and jingoistic, dangerous people, quite frankly. But if we cripple our AI industry, we deny ourselves the benefits, we hurt America quite a lot. That's terrible. It's exactly what would let China catch up, exactly what makes them into the threat that I don't think they are right now, particularly. That's how we lose. That's how we get the nightmare scenario. No, I don't want that. I don't think that ends well. We still gotta win.
Nathan: I want to come back to virtue ethics in a second and get your take on virtue ethics in the time of Trump, because that's something I've been thinking about quite a bit in the last week or so. Before that, though, a couple more roundup items. You mentioned OpenAI drama; of course, there's always more. We just did an episode with Dean and Daniel where they talked about their transparency proposals, and in that we got a recounting of Daniel's personal story of leaving, not signing the thing, then posting online about it, and that gradually coming to light. And then I guess those policies mostly are reversed. He did say that now he can criticize the company — still bound, of course, by not disclosing trade secrets — and his equity is restored. So that turned out reasonably well, I guess. They have also taken another $6 billion or so in funding, with a notable clause attached that they must turn the thing into a for-profit or else the investors can get their money back. Presumably all that money will have been spent by then on GPUs and lots of electricity; it doesn't seem like there's any way that money gets kicked back. It seems like basically the train is now rolling on this conversion to a normal-ish for-profit entity.
Zvi: $6 billion is not that much money for a company that's worth $150 billion. It's a very small raise, probably because they would have had to take a much worse price to get a much bigger raise. But yeah, they're throwing their hat over the wall. They're saying, we have to convert; if we don't convert, then we're in a lot of trouble, because who's going to give us the money to pay back the earlier investors at any reasonable price? They're at risk of being sold off for parts or acqui-hired, or various other disaster scenarios, in two years if they can't make it through. Look, they're trying to pull off the theft of at least the millennium, and quite possibly all of human history. They're trying to steal most of OpenAI from the nonprofit, which currently has most of the economic value of OpenAI plus the control premium, and they're trying to do it for a small fraction of what that's worth. And the question is, will they be allowed to pull this off in broad daylight or not? Time will tell. I read articles about this and my position has not changed. It seems like they're going to get away with it. What do you think? Well, I think the president is listening a lot to Elon Musk, and Elon Musk is coming after OpenAI. And it is, again, the biggest theft I've ever seen — short of some historical examples you could argue for, depending on your perspective, or cases you could call just business. The point is that this is a giant, illegal misappropriation of funds, from my perspective, just flat out, in broad daylight. There are various actors who can do something about it, and many of them have reason to dislike or distrust OpenAI in this situation. And also, the board has to sign off on it, and the board's members could have legitimate worries about their fiduciary duties, and potentially about the corporate veil, if they sign off on a number like 25%. So I don't think it is at all obvious they'll be allowed to get away with this at anything like the current number. But yeah, by default, when this kind of thing happens, the way it works is that power just forces you into a corner, where they threaten to blow everything up unless you give in, and then people give in one by one, through some combination of threats and bribes and leverage and mindfuckery. So probably they get away with, not necessarily exactly the deal they're proposing, but quite a lot. That's what I'm expecting to happen, in my gut. But if you put a contract on it, I might buy some, right? If you give me a cheap enough price.
Nathan: The point about Musk being in the president's ear, and the president potentially having a proclivity to use the Justice Department in, let's say, idiosyncratic ways, definitely does change the analysis quite a bit. In the absence of that, it would have felt overwhelmingly likely to me that they get away with it; now there is a lightning-bolt-from-a-distance possibility that could change it.
Zvi: I mean, you can also look at it as Microsoft attempting to take quite a lot of money from a nonprofit, and Republicans control the Congress, and they might not like that. Big tech is not popular, and Microsoft is very centrally part of big tech.
Nathan: I just think they'll talk them into it with the whole China thing. I mean, right now it seems like a couple of things. One is, I think there's always a creative argument, and there is an argument that, look, it's not worth that much as a nonprofit, because if nobody invests the next however many billion to do the next round of scale, then we lose and you're just out of the game. You can't really compete in this way. So you're entitled to some, but it goes to zero if this deal doesn't go through, in a very literal sense. And I think they could argue in somewhat good faith that this is a naturally occurring dynamic, right? The Altman narrative, which I think is reasonably credible, is: we didn't know when we started this thing that it was going to take $50 billion to get there. It turns out it does.
Zvi: I get all that, right. Look, you still have to give fair value for the assets, and that includes the control premium. There is no world in which this is a fair conversion. You can argue, I think, for 49%. I think that's the aggressive but not obviously, blatantly unfair version: they have to lose control, because otherwise this company is not viable, so we have to give 51% to other people so that the company is no longer under the nonprofit's control. So it keeps 49, which gets further diluted by investments going forward, and it owns progressively less over time, or something like that. But again, I just don't see any attempt to wrestle with the reality of the situation. I don't see any attempt to say: this is what OpenAI is worth, this is the distribution of future profits, this is the control premium, this is why this is a fair valuation. And if you want to make the argument that it's a fair valuation because "if you don't give it to me, your company will collapse, and therefore you shouldn't get very much" — all right, that's your case, let's have that discussion. Is that legal? I'm inclined to say no.
Nathan: Yeah, the other thing that's pretty interesting about it is the intangible assets, goodwill argument. I don't know what the law is governing this sort of stuff. I think it's probably not very well developed, because we're only recently moving into this world where so much of corporate value is the intangible, goodwill type of thing — this ain't the railroad era anymore, obviously. So I don't think we've really figured that out as a society. But the fact that they were all going to kind of shift over to Microsoft, or that it's still not that many people, right? They could reconstitute themselves: unless you're going to totally change non-compete law, or create new laws to prevent us from doing this, we could always just move across the street, bring our know-how with us, and take a bit of a hit. And they were willing to do it once, you know, and it seems like that...
Zvi: You can try. You can use those arguments. But I think we know pretty well that if they had pulled the trigger on that, it would not have involved the vast majority moving to Microsoft. It would have involved a large share moving to Microsoft, probably, but also a huge diaspora, and a huge loss to OpenAI. And Microsoft in particular did not want that scenario, by all reports. Microsoft was doing that because it was less bad than a full diaspora, right? It was a credible threat. They made it strategically because they had to, but they were not thrilled at the prospect of actually reconstituting OpenAI inside Microsoft. So I don't know how credible that threat is in this context. It's a credible backup plan, but it's a lot of hassle to go through, for sure. In the limit, you can do that. The bottom line is that they have room to cut the nonprofit in for quite a lot of the future profit flows of OpenAI while converting it to a B Corp. That's what they want to do. And if they manage to browbeat the nonprofit board into accepting substantially less than its share is worth, then that is a theft, as far as I can tell, flat out. And perhaps people get away with thefts. Silicon Valley does this all the time — they just decide that someone doesn't deserve what they're legally entitled to, and they just take it. Not that weird. So yeah, I expect them to pull off some form of it. And I would urge the various attorneys general and other authorities to keep a very close eye on them and not let them do anything that's not allowed.
Nathan: The ARC AGI prize — I don't know if you have any particular thoughts on this; people are generally familiar with it. Progress, I would say, was pretty good. We went from like 30-ish percent to 60-ish percent as the state of the art. 85% was where you were going to win the million bucks. So nobody's won the million bucks, but the gap has been closed by roughly half.
Zvi: And I mean, everyone should assume that million bucks gets claimed next time around, right? This was probably unlikely to last another year.
Nathan: Yeah, it seems like it can't last that much longer. Test-time training emerged as an interesting theme. They had some pretty tight compute requirements and a totally unseen test set, so people started fine-tuning small models at test time on the new tasks themselves, to learn as much as possible from that new environment. That seemed to be the big thing, and I feel like it's likely to be a trend that gets exported elsewhere.
Zvi: I found, in discussing this with people involved with the prize, that the approach isn't eligible to win the prize — it requires things the prize doesn't allow. But yeah, it showed that in principle you can do a lot better on these types of tests by training at test time. That makes sense, because these test questions all share a general logic that is deliberately different from the general logic of the data the models were trained on. So it's a very special case where a little bit of test-time adaptation is going to be worth a lot, and it makes sense to me that it paid off pretty well. It's an exciting potential alternative to o1's approach of spending extra compute at inference time, and we don't really know how effective it's going to be in general. I think this is kind of a best-case-scenario place to start, so try it on some other stuff and let's see how it goes. I'm excited. But it makes sense that if you're doing a series of related questions, you should do some form of temporary tuning of your model while you do that. There are various things in this general vein that fall under the things I would definitely be trying if I were directing research at a major lab, or otherwise had a budget of money and programming time to try to figure stuff out and improve parts of models.
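An aside for readers who want the mechanics of test-time training made concrete: below is a minimal toy sketch in PyTorch of the general idea — for each new task, briefly fine-tune a copy of your model on that task's own demonstration pairs before predicting its test input. This is not the prize entries' actual code; the tiny per-cell model, the grid serialization, and the example task are invented purely for illustration, and real entries relied on data augmentation and much stronger models.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_COLORS = 10  # ARC grids use cell values 0-9


def serialize_grid(grid):
    """Flatten a list-of-lists grid of ints into a 1-D LongTensor of cells."""
    return torch.tensor([c for row in grid for c in row], dtype=torch.long)


class TinyCellModel(nn.Module):
    """Toy stand-in for a real model: predicts each output cell from the
    corresponding input cell. Real entries used far richer architectures."""

    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.Embedding(NUM_COLORS, dim)
        self.head = nn.Linear(dim, NUM_COLORS)

    def forward(self, cells):                 # cells: (n_cells,)
        return self.head(self.embed(cells))   # logits: (n_cells, NUM_COLORS)


def test_time_finetune(base_model, train_pairs, test_input, steps=100, lr=1e-3):
    """Fine-tune a copy of the model on this one task's demonstration pairs,
    then predict the task's held-out test input."""
    model = copy.deepcopy(base_model)   # never mutate the shared base model
    model.train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    examples = [(serialize_grid(x), serialize_grid(y)) for x, y in train_pairs]

    for _ in range(steps):
        for inp, out in examples:
            loss = F.cross_entropy(model(inp), out)
            opt.zero_grad()
            loss.backward()
            opt.step()

    model.eval()
    with torch.no_grad():
        return model(serialize_grid(test_input)).argmax(dim=-1)


# Toy task whose hidden rule is "swap colors 1 and 2" on same-shaped grids.
demos = [([[1, 2], [2, 1]], [[2, 1], [1, 2]]),
         ([[1, 1], [2, 2]], [[2, 2], [1, 1]])]
prediction = test_time_finetune(TinyCellModel(), demos, [[2, 1], [1, 1]])
print(prediction.reshape(2, 2))   # expect something close to [[1, 2], [2, 2]]
```

The key design choice in this sketch is fine-tuning a per-task copy, so adaptation to one task never leaks into another, and the extra compute stays bounded by the handful of demonstration pairs each task provides.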
Nathan: Two things that maybe didn't happen. One that I would say pretty safely didn't happen: deepfake election chaos. Everybody was on the lookout for that, of course, and it didn't really happen. Did it not happen because people were just too savvy, or the technology wasn't quite there, or nobody really tried? What's your read?
Zvi: It happened a little — there were a few incidents. There was the fake accusation against Tim Walz. There was a deepfake of Kamala that was shared by Elon Musk. There were a small number of them, but I think it's a combination of people being too tech-savvy and not tech-savvy enough, in various different ways. I think the main impact was people suspecting real things of being fake, not suspecting fake things of being real, and that's a common theme for now. My basic conclusion is that fake news is demand side, not supply side. People of all political orientations demand fake news. They demand echo chambers, they demand stories that aren't true, and conspiracies and tall tales and so on. So we're in a world in which they're already making up complete and utter nonsense with basically no evidence behind the stories — and they were, this time around; everyone saw it. And you don't want to know what would have happened if this election had turned out differently — been closer, gone in either direction. Whatever you think of the outcome otherwise, conditional on a Trump win we were very fortunate that he won the popular vote and enough of the electoral college by enough of a margin that everybody agrees he won, and there's no dispute about whether he won. Everyone agrees it's over. The nightmare scenarios people worried about for this election were not the true nightmare. The biggest nightmare scenarios were the ones in which both candidates thought they had won in a real way, and we didn't get that. But again, in 2020 we had the big lie, right? And it wasn't driven by deepfakes. It was driven by flat-out lies with basically no real evidence behind them — just making stuff up. So the issue is that adding deepfakes just doesn't change anything. It doesn't really help you. In fact, if anything, it hurts you, because now people can point to the deepfaked picture and, with enough analysis, demonstrate that it's fake. As the technology improves, there will be more and more problems. But I just don't think it's central to the source of the problem of misinformation — the problem of people having echo chambers and feeding each other tall tales from all directions and all sides. And AI analysis of information is going to become our ally: it can help determine whether or not something is accurate, and it will be very helpful. So I've been optimistic about deepfakes for a while now. It did not surprise me that there was very little impact. Obviously this was on the extreme low end of possible impact, where there's almost no impact, but it didn't surprise me very much. I expected this to go fine. I do expect other impacts: 2028 is going to be very AI-influenced in a lot of different ways, unless I'm very, very surprised. But the people who thought 2024 was going to be the AI election — I was always pretty skeptical of that. I thought it was too soon.
Nathan: Something that I demand fake news on is the effectiveness of UBI. Another topic you've covered recently is the recently wrapped and pretty thoroughly analyzed Altman-backed UBI experiment. My general sense — and I haven't gone super deep into the literature — was that most people concluded this doesn't seem to have worked super well. But I want you to tell me the story where it works great and I should expect that we end up with UBI. So please do.
Zvi: Well, Nathan, we all want this. But as you mentioned, I'm a virtue ethicist, so I'm not just going to give you what you want here. Unfortunately, the UBI experiment in this context was basically a failure. Not a 100% failure — the money didn't just disappear with no benefits — but it was clear that, as implemented, this wasn't the way. I do think a future world where there is mass technological unemployment and a near-universal need for universal basic income is just a very different scenario, and this doesn't study that very well. So the good news, the good news that you want, is that we don't know that this wouldn't be a solution in that different world, or at least much better than not doing it. But basically, no, you can't just give some people in the first world, at random, a bunch of money and expect them to durably improve their lives the way you would hope.
Nathan: So how would you summarize it? My sense is people worked less — not so much less that they ended up with the same money — but I saw some headline claiming people worked enough less that they ended up with less money. That seemed very strange to me; I basically discounted it on priors.
Zvi: I don't know the exact details, but essentially, at the end of the two-year period people were generally not substantially better off. Beyond the additional consumption and decreased work during that period, the money was mostly just exhausted without net savings going up, without people acquiring permanent capital, or otherwise making their lives better — which is just not what we were trying to fund, right? We were trying to fund them getting into a permanently better situation: improving their financial situation, their social capital, their skills, their education, et cetera. We at least didn't see very much of that. There is no way this is an efficient intervention on that basis, in this context. But again, these things can be very context dependent. The fact that it went to only a small number of people could matter a lot — because if you get a bunch of money in a relatively poor community but the people around you do not, you are effectively in a situation where that money is being taxed very aggressively by your communal bonds and obligations. You have a very hard time justifying why you have that money when other people don't, and you have no opportunity to move up along with the people around you. It's very difficult. So I can imagine a thesis that says giving it to everybody in Chicago would have been very different from giving it to a very small portion of the people in Chicago and studying them. And we just don't know. That's, to me, a very valid point.
Nathan: Yeah. Threshold effects can be a big deal in a lot of these things, it feels like. Was there any silver lining in what you reviewed around just happiness, well-being, time spent with family, kids going to school?
Zvi: Non-zero effects, but again, nothing that would justify it on that basis.
Nathan: Okay. Well, we're probably not going to get it in the next four years anyway, because we know who got elected. I'm interested in your take — you said you could argue which way we should update. Going into the election, I felt like: this does not seem like the guy we want to be president if there's a decent chance we're going to have AGI. He seems too unpredictable, unstable, inconsistent, not a rigorous thinker, not trustworthy to the people who might need to trust him if we find ourselves in sticky situations. I still think all that's true. But I've been at least tempted toward some optimism by the Samuel Hammonds of the world, who are like, look at all this super smart policy we might get out of somebody like an Elon Musk, shadow president, whatever. What was your feeling coming in? What's your feeling a week on?
Zvi: I mean, prove me wrong, kids, right? Prove me wrong. You want this to work — everybody should want this administration to succeed, on AI especially but also in general, and to make things as good as possible. And Trump is very high variance, right? There are worlds in which Trump is a huge disaster by everyone's evaluation, potentially including Trump's. There is real tail risk — really bad outcomes for the country's economy, or its budget, or democracy, or literal existential catastrophe. It's all on the table with some probability that I consider higher than under the alternative. But I also think there's more upside than there was under a potential Harris administration. In particular on AI, Harris had already shown what path she was going to go down: a very Biden-style path. There are advantages to that — it covers some of the basics — but the chance she would react to the magnitude of events as they developed seemed low, as far as we can tell. There would have been a large amount of emphasis on — I don't necessarily want to say woke or DEI, but on left-leaning mundane concerns that I don't think are the right focus here, and that were in fact reasonably ubiquitous across what her administration, collectively, was pushing. She was involved heavily in those efforts, so I think it's fair to read them as reflecting her personal views, unlike some other policies. And Trump and Vance could obviously choose some of the worst possible policies in this area. They could repeal the executive order without essentially replacing it. They could sabotage the AISI. They could actively encourage open source in ways that are dangerous rather than in the ways that are helpful — and to be clear, there is a time and a place for open models, and obviously for non-AI open source software, all the standard stuff. They could also potentially go as far as attacking or even breaking up some of the big tech companies, creating diasporas, creating a much more multipolar race in ways that would be very harmful. They could slow down legal immigration in ways that are very harmful for recruiting talent. They could impose tariffs, including on GPUs, in ways that are obviously terrible. There are a lot of big risks here, in addition to the standard ones — he might just destabilize things generally and cause massive inflation or a loss of democracy. There's a lot of really bad stuff on the table from his own proposals, but it might not happen. So how does the story go where it goes really well? Okay. All of the things you really fear — deporting massive numbers of immigrants, printing tons and tons of money, imposing gigantic tariffs that would wreck the global economy, et cetera — that's all big talk and negotiating positions, and it doesn't really happen. The Biden executive order gets replaced by essentially the same executive order without all of what they call the woke shit, regardless of what it actually is — basically strip out those priorities and put the rest mostly back. The safety standards issued by NIST and the AISI get modified in similar ways, but otherwise he kind of continues the same general policies.
He then brings down regulatory barriers, helps us build up our energy infrastructure generally, and, most importantly, is capable of changing his mind. He's at least adjacent to people who understand that AI is a big deal, and he's capable of recognizing super duper AI and the signs of danger. So when the time comes, he and Vance — who is actually, plausibly, at least somewhat connected to the communities we care about — could potentially be dialed in very quickly to the right people, realize what's going on, understand the problem, pivot rapidly, bring in the national security apparatus, and things happen. Generally, things get unstuck and move forward. We have a unified government, at least nominally, that can potentially pass things, potentially including on AI. It's possible an AI bill can be passed under a Republican administration, because Democrats will be willing to support it under a Republican administration, whereas under a Democratic administration the Republicans would have just uniformly opposed anything the Democratic caucus drafted. So I can tell stories, right? I can tell stories where this works out really well — including just the story where he makes the one big decision: when the time comes and the AGIs show up, he pulls the trigger on the necessary actions in ways that Harris might not have. And that just overrides everything else the entire administration does, every other difference with Harris. I think that's perfectly plausible. It's like the story you can tell about the first Trump administration: he did a bunch of stuff up until 2020 that was different from what Clinton would have done, and the impact of all those differences didn't matter very much. And then in 2020 you got Operation Warp Speed, and as a result of Operation Warp Speed the entire world economy and our lives were dramatically better. That was just so much more important than everything else that happened in the Trump administration — at least up until election day and everything that happened afterwards. Setting that aside, you could reasonably have been ecstatic that Trump won over Clinton on that basis alone, even if you preferred her other policies. It's a very, very reasonable position. And AI is that kind of issue. These people are capable of changing their minds, it's not obviously a partisan issue yet, and they can keep it separate. The basic problem on AI is if Trump and the Republicans associate any action to keep AI safe or to regulate AI with woke and DEI and therefore instinctively oppose all of it, and/or they buy the open source, open model arguments such that they get this weird thing in their heads that nothing open can ever be dangerous or bad — this Andreessen philosophy. The combination of those two factors, in some form, is the nightmare AI-specific scenario. The other side of this is that prediction markets are saying a Taiwan invasion is potentially more likely under Trump, and obviously an invasion of Taiwan is a giant wild card for the entire situation. But the specific problems in this specific area might not come to pass. So, you know, you have both.
And I don't know — I try not to follow politics too closely; I try not to pay too much attention. This is about as much as I've talked about politics in a while, and I wasn't willing to do it before the election. But the reaction to the election — I guess the right word for it would almost be "normal." So far it's been about a week, but it seems like a normal transition with a normal set of picks for running the administration, most of them out of the Senate and the House. It seems like everyone is just doing what we do. And if that's true, then that part isn't the thing to worry about, right? You can be happy, or you can be very, very sad, about Republicans having power, but that has little to do with the AI situation. The AI situation is its own animal. And again, they're going to make some decisions, and we're going to find out pretty fast.
Nathan: There are a couple of keywords there that I thought would be worth digging into a little bit. "It's not a partisan issue" — that was one phrase I have been thinking about quite a bit. The other one being that it would be bad if we got into a situation where they — Trump, Musk, Vance, whoever — see anything related to AI safety as sort of woke shit. Those points seem... Musk specifically is not going to be like that. He's smart enough to avoid that, I would think. Of course, how long he'll be involved, at what point their relationship hits a breaking point — that could obviously be any minute now. I'm one generally to think "long may he remain influential," Musk that is, but there could be a twist or a turn at any moment, it seems. I want to get your take on what a person like me should do — and for the purposes of this discussion, I'm definitely pro-technology, pro-progress, broadly libertarian, left-leaning on a lot of the issues that are more salient to most of the electorate, but very focused on AI. What is virtuous for me to do? How should I approach all of this strategically? I used to say that AI safety and AI ethics people should make much more common cause, because it's all about getting the AIs under control and getting good outcomes from them. Now I have to maybe reevaluate that if I'm thinking, geez, I don't want to show up at the White House with Timnit Gebru, if I'm saying her name correctly, because they're not going to like that. So do I need to reconsider my AI safety, AI ethics kumbaya attitude? How much should I... I definitely am pro more people coming to America. I'm pro even people that are here technically against the law; I feel like for a long time we've sort of winked and said it was okay, and for the government now to suddenly come down very harshly and say "well, you're illegal" — that, to me, feels like not a virtuous thing for the government to be doing. I am really struggling to think about where I draw the line. Deportations happen all the time; if they tick up a bit, do I go out into the streets? Probably not. But if police are coming door to door in my neighborhood looking for people, then there's got to be some line. And I'm a little bit lost as to how to think about how much I should focus, how much I should compartmentalize. Should I be making pre-commitments now? I realize I'm going to have to take some pitches, but I don't have a framework yet to organize which pitches I take, which ones are sort of not my lane but I still need to go out of my lane to address, and also who I should kick out of my lane if there are people in my lane that no longer have a place.
Zvi: Right. I think these are a bunch of distinct questions. So one of them is the classic: if the bad government comes — call it fascism, call it mass deportation, call it whatever you want to call it — something your conscience cannot abide, it's just that bad, at what point do you defy the law? At what point do you work against it? At what point do you take various levels of action against it? And to be clear, I think there's not that high a chance that you face these questions in earnest. I don't really expect the actions you're worried about to take place at a scale that would cause you to seriously consider acting. But yes, I think thinking now about how you would act in response to different actions you strongly oppose is wise, because in the moment you might think poorly in either direction — you might do something stupid, or you might be too afraid to act. Both are on the table. But I'm not going to get into exactly where to draw those lines or how to think about those questions. I'm going to try to stick to questions like: what are the consequences of various policies? Because I'm going to stay in the lane I've decided on. And, yeah, obviously mass deportations would be severely destructive to the economy — bad for America's productivity and competitiveness, regardless of what else you think about them from a political perspective. So I'm hoping they don't happen. I'm hoping they're not going to happen. Next, and specifically: if legal immigration gets massively restricted, that's very bad for the AI industry. Then, distinct from that, there's the question of the AI ethics people and how to deal with them. I think we're now going into — and this is my other thing — a series of regulatory battles in the states, where there's going to be bad legislation that the AI ethics people are going to support, and some AI-don't-kill-everyone people will go along with it, mostly on the logic of: these are people who think AI is a big deal, they have different ethics than I do, but it restricts AI somehow, so I don't care. I think that's wrong, and I don't do that. Some people, of course, will think that way, but I think the vast majority will not. The reason to be allied with the AI ethics people is if we have a common interest in passing combined bills, or narrow bills, that are good by both our lights — that drive home the points that cause good things to happen, that cause better outcomes. If they're pushing for the negative stuff, there's not really much to talk about anymore at that point. We should continue to point out that the things we want advance their causes too, and that they should be supporting what we want. But we shouldn't support what they want, when we don't think it's good, in the hope that they'll support us in return, because they won't reciprocate. We need to understand that there is nobody we're negotiating with here, right? And even if there were, they'd stab us in the back anyway. We don't have any leverage on this. Now that the SB 1047 window is closed, we don't necessarily get to have that much impact on what they do. They're going to be who they are, and they're going to gain strength.
They're going to massively gain strength, because these issues are going to become more salient. But they will also be strongly, strongly opposed by the current administration. So yeah, the idea that if you're trying to lobby Congress you should be tightly coupled to the AI ethics coalitions — that'd be a mistake. And in general, the entire AI-don't-kill-everyone movement has a cultural problem: so much of it thinks that blue values are obviously just standard and correct on most issues, that the orange man is horrible and bad, and that red values are terrible and obviously wrong — and so it's in a terrible position to cooperate with a red administration, let alone a literal Trump administration. That leaves only the few Sam Hammonds of the world who can actually navigate that situation and usefully contribute, and that's a shame. It's too bad. I'm not saying what I was rooting for on election night — people can guess — but we live in the world we live in, and we should deal with what we have.
Nathan: Yeah. I'm playing around with these phrases or mantras, and one that I've been toying with is: it's better to be a live player than to play Cassandra. And that's even within the AI space. I feel like we are headed for a sort of general accelerationist vibe — and of course even that could change; as we've said, Trump is high variance, he could get freaked out, who knows — but it seems like we're more than likely headed for something more like an AI Manhattan Project than a pause. Or no, no, the Manhattan Project is Doge, right? So the AI Manhattan Project, a subsidiary of Doge. But I kind of reflexively recoil at the notion of an AI Manhattan Project. I'm like, isn't there something we should have taken away from the actual, original Manhattan Project that would cause us to not want to rush into an AI Manhattan Project? Better security, for sure. But also the sense that a lot of them regretted what they had done, and that a little more thinking ahead would have been a good thing. And just rushing to copy this dynamic of "we've got to get there before China" seems really bad. But if it's going to happen, how do I calibrate where I want to focus my arguments, where I want to try to have an influence? It seems like we're headed for an AI Manhattan Project; I probably can't stop that train. Maybe I retreat to some sort of China dovishness, or can we at least make the AI Manhattan Project something positive? Can we race on how many cancers we can cure, or some other metric besides strategic dominance? I don't know. Do you have a thought? Can you coach me?
Zvi: This is a decision theory problem, right? If you don't impact it in any way — if your decisions and your perspectives are not correlated in any way with the decision to do the Manhattan Project — then once it already exists, you just have to decide whether you would prefer to live in a world with a more advanced and more successful Manhattan Project or not. And I would conclude that, conditional on the Manhattan Project coming into existence, I would expect your and my contributions to that project to be net positive, because we could push it in a safety-oriented direction on the margin. And also, if you are in fact racing to the finish line no matter what, it would be better if you won — and better if you won as safely as possible. So concretely, it seems reasonable to join the project in that case. In general, if the project, as Leopold calls it, does happen and there's no stopping it, then just based on consequentialism from there, I think you help. However, there is the problem that the decision being made now about how to proceed partly depends on how people would react to the project being done. And you shouldn't necessarily roll over when these things happen, for that reason — because that leads to everyone rolling over, which leads to the wrong plan, the bad guys, et cetera, carrying the day, and the races happening, right? You don't want to let them say, "Well, we're racing now, so you'd better help us win," when they wouldn't have started the race if they didn't think you would help. That's a disaster. So you have to consider all of these possibilities — assuming you think the Manhattan Project is a mistake. Obviously, if the Manhattan Project is the right decision, you support it. And it's not the worst option, right? A unified Manhattan-style project by the United States seems clearly superior to me to three to five AI companies racing to the finish line as fast as possible — and that's kind of the default. You should only consider not helping the project, in that sense, if you think either that your participation actually makes the situation worse, because you would be accelerating the project without helping it be safe, or that the project is something you shouldn't be encouraging through your future willingness to help out with it — but that requires it to be an error versus the alternatives that were on the table. And if the alternatives on the table were worse, it's not a mistake. The project is not the worst outcome, right? It's not the best scenario, but it's far from the worst. And the worst is kind of the default. So, yikes.
Nathan: So it sounds like you're net favorable, or tentatively favorable, toward a nationalize-the-labs, have-one-project approach. We know we're not going to have a huge number of live players, for economic reasons, as previously discussed. So given that there's a small number, corral them all into one. I'm choosing my words carefully because of the way people have run the discourse these days, and because it's genuinely confusing and conditional — nothing is obvious, and we don't know what our scenario is going to be or what our options are going to be.
Zvi: But if we are clearly, rapidly progressing towards AGI and ASI, and the situation is essentially a race between multiple companies that are under pressure to proceed as fast as possible — which is reasonable under those conditions — then a unified project seems better than a non-unified project. And there are various ways to get a unified project; the merge-and-assist clause, for example, is another way to get there, right? The government being involved is almost never my first choice in these situations, but it's not the worst-case scenario either. So it comes down to: do you have a better option? The other thing is that once the project is inevitable, waiting longer to start it is potentially just worse. So you only want to oppose starting the project if you can actually stop the project from happening in at least some world after that. It's all very complicated, but I'd note that we're going to know more before this happens. We're not going to start the project in '24. We're probably not going to start it in '25. By '26 we'll know a lot more and, hopefully, be able to make better decisions. Any number of positions could emerge; we don't really know, and it's not like we can influence it much, unfortunately, though we should try. But, yeah, we should deal with the situation as it is, with the options that are still left on the table. One way to think about the project, to me, is that we should make decisions keeping in mind that we potentially could start a project, and we don't want to make it impossible to start a project if we need to start one — but that we would prefer to live in, and steer towards, worlds that don't require the project, rather than worlds where the project is necessary. Then that raises the question of, well, what if the project is going to happen in worlds where it wasn't necessary? What do we do about that? We could talk forever about these hypothetical scenarios, but my guess is that the majority of the time the project actually happens, it is an improvement over the next-best alternative on the table that the people in the room were talking about.
Nathan: Interesting. All right, last question. Where do you look for inspiration when it comes to virtue? The challenge, of course, with virtue ethics is that it's kind of tough to know what's virtuous, right? It's subject to debate. Maybe few-shot examples are the way to inform our behavior. So I'm interested in what sources of inspiration for virtue you personally look to, and whether there are any changes you're contemplating, on the margin, to your own work or approach in light of new circumstances.
Zvi: I was just asked on Twitter, in general, who is the person you're trying to model yourself after in some important sense? And my brain just shot back the answer: Feynman, to the extent there is such a person. He embodies, more than other people I can think of, who I'd want to be in these spots. But the truth is I don't really have a single role model per se here. I'm just trying to work my own way through these problems with a very unique mishmash of different influences and perspectives — I've got my Douglas Adams and my Robert Anton Wilson and my Aristotle and my everything else, and I just put it all together and figure it out. I look at different approaches and go, that is a virtue I want, and incorporate it. But I don't think virtue ethics depends on precision in the few-shot examples — I guess that's part of how it works.
Nathan: So "Surely You're Joking, Mr. Feynman" should be required reading? I've actually never read that book. Maybe I should.
Zvi: It's required reading, I'd say. I cannot really imagine an information diet where placing that book at the margin of what you're going to consume is a mistake. It just seems very hard for that to be wrong.
Nathan: Okay. Well, I'll put that on my list. Any strategic updates for yourself? Have you found yourself thinking about any changes in the very recent past?
Zvi: I'd say I'm definitely happy with the direction I'm taking in a broad sense. I pivoted into doing more coding, for now primarily towards accelerating my own work — self-improvement, recursive self-improvement, just making my writing process faster. So far I'm well in the hole in terms of time spent versus time saved.
Nathan: Me too. We should compare notes on that, because I'm in the same spot.
Zvi: But I am noticeably more efficient going forward than I was before, so I expect to win in the long run, and it also gives me experience I can draw upon in various ways. It should be good. Beyond that, there's one thing I feel I could have done better recently. From what I can tell, the Trump transition is kind of a scramble in many places, and I should have considered that the post-election period would be enough of a scramble, a clusterfuck, that I could actually, with preparation, have influence on decisions being made. My brain didn't consider that possibility until after I started getting reports from people that they were in contact with the administration and sharing stuff in various ways. If I had it to do over again, I would have gotten some material ready that other people could at least pass along, in better form — prioritized. But I also appreciated the break: for everyone's mental health, everyone was focused on the election, so we had a relatively quiet two weeks. It was nice in its own way, aside from the election part. I hate it, but no matter what you want to happen, whether or not you get what you want, the election process always sucks, right?
Nathan: It's interesting — one of the other things I've been wrestling with myself is how much to think about trying to influence central decision-making versus maybe just going off in a different direction and trying to do something good. It does seem that there are going to be fewer no's, you know, in society broadly, and there are potentially going to be more vacuums: if, for example, the Education Department is removed, or mandates are dramatically curtailed, there might just be a lot more freedom of action in spaces that people like me have shied away from because we just don't want to deal with the red tape and having to work through the sort of friction that was there. Yeah, the good scenario, right? Like, the good scenario involves maybe Doge is great, maybe they actually do cut a bunch of red tape and a bunch of regulations and do things again.
Zvi: Like, the Trump I've always dreamed of getting is the real estate developer, right? He'd be the ultimate YIMBY president. He'd build, baby, build — potentially with his name on it, potentially getting a cut, I don't care, just build, baby, build. And we didn't see that in his first term. But maybe we'll get lucky. Maybe his true nature will win out somehow, even though he didn't campaign on it in any way — because now that he doesn't have to run for re-election, he can let his true nature win, have fun, and, sure, make some money. But yeah, it's a different time, right? I think you should definitely keep an eye out for opportunities going forward. It's just that right now it's a narrow window — it's really early, the changes haven't been made, so you can't necessarily seize your opportunities yet, because you don't know which ones are going to be available. It's hard to prepare for them, though you can potentially influence what happens in that window. I don't know. And when I introspect, the mistake I've always made is not trying for the high-leverage move because it just seemed like it would never work — like, that's impossible, come on, that's ridiculous. But when you look back at the track record, no, it's not as ridiculous as it sounds, and you should try it more often. That's true of a lot of things in life. You don't take enough big shots. No one does. Almost no one does.
Nathan: Yeah, if we learn nothing else from Peter Thiel, we should learn that a good long-term conspiracy slash plan, depending on how you want to position it, sometimes can really pay off.
Zvi: Yeah, I took my one big shot, right, with Balsa. And it didn't really work as much as I'd hoped, but it still had some positive side effects, and it got me a lot of contacts and a lot of knowledge. And we still have some assets, so who knows what could happen — the Jones Act just might one day meet its end, you know? Look, Trump, you have this great opportunity. Just come out and kill this thing. Let's kill this thing. You need it. You want to balance that budget? That's how you do it.
Nathan: Anything else on your mind before we break?
Zvi: Actually, the pitch I want to make to the Republican side — the Trumpian pitch for repealing the Jones Act, and I want to get this out there so people can pass it along — is reshoring. In this philosophy, we want to produce things in America and sell them to America, right? But to do that, you have to be able to transport the goods needed to produce the thing, and then transport the goods that were produced to the person who wants to consume them. And if you have a situation where it costs more to ship things within the United States than to ship things from outside the United States, that's like an anti-tariff, right? It's a tariff on yourself — you've effectively put a tariff on America selling to America, and obviously that's insanely terrible. So we should remove it. And yes, technically speaking, we are then going to buy ships from other people, in the name of producing stuff in America — but we're not replacing American production, because America doesn't produce these ships; we're supplementing it. And then we have more ships, which makes us great. It should be a slam-dunk pitch. It should be intuitive, it should be obvious. You know what to do: just follow through — and then do the Dredge Act and, for that matter, the Passenger Act. I'll be happy.
Nathan: All right. I'm sure those influential administration members who are still with us almost three hours in will be acting quickly. I'm kind of hoping this gets forwarded along, but yes — any other closing thoughts before we break, or calls to action?
Zvi: I mean, yeah, it's three hours in and we've covered a lot of things. But decide what you think is the most valuable thing to do, and do that, right? I'm not going to try and tell you what it is.
Nathan: The rest is left as an exercise for the reader. Zvi Mowshowitz, thank you again for being part of the Cognitive Revolution.