a16z on AI Voices: Call Centers, Coaches, and Companions with Olivia Moore & Anish Acharya

In this episode of The Cognitive Revolution, host Nathan Labenz speaks with Andreessen Horowitz partners Olivia Moore and Anish Acharya about the rapid evolution of voice AI technology and its real-world applications. The conversation explores how multimodal models, reduced latency, and improved emotional intelligence are enabling more natural voice interactions across various platforms including Hume AI's Octave model, Google's NotebookLM, and Sesame. Nathan and his guests discuss compelling business use cases—from Happy Robot handling complex negotiations with truckers to SMBs deploying voice AI for after-hours support—while also addressing philosophical questions about labor displacement and the urgent need for responsible innovation to protect consumers from potential AI voice scams.

Please note that the content here is for informational purposes only; should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. For more details please see https://a16z.com/disclosures/.


SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive

Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive

NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive

PRODUCED BY:
https://aipodcast.ing

CHAPTERS:
(00:00) About the Episode
(03:39) Introduction and Welcome
(03:50) AI Scouting Methods
(08:25) Best Voice AI Experiences
(11:34) Voice AI for Seniors
(14:27) Sponsors: Oracle Cloud Infrastructure (OCI) | Shopify
(16:54) Voice Technology Challenges
(20:48) Human-Like Conversation Dynamics
(24:13) AI Voice Negotiations
(27:34) Apple's Siri Delays
(31:13) Voice AI Stack Evolution (Part 1)
(33:16) Sponsors: NetSuite
(34:49) Voice AI Stack Evolution (Part 2)
(37:57) Context Assembly Challenges
(40:48) Enterprise Voice Applications
(46:30) Labor Market Impact
(49:36) SMB Voice Solutions
(50:53) Creator Voice Tools
(52:17) AI for Children
(56:18) AI Companionship and Romance
(58:58) Ethical Guidelines Discussion
(01:02:46) Future of Voice Computing
(01:04:49) Outro

SOCIAL LINKS:
Website: https://www.cognitiverevolutio...
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathan...
Youtube: https://youtube.com/@Cognitive...
Apple: https://podcasts.apple.com/de/...
Spotify: https://open.spotify.com/show/...


TRANSCRIPT:

Nathan Labenz: Olivia Moore and Anish Acharya from a16z here to talk about the future of AI voice interactions. Welcome to The Cognitive Revolution.

Anish Acharya: Awesome. Thanks for having us.

Nathan Labenz: I'm excited. So, first thing I wanted to say was just you guys are part of a group that I affectionately refer to as AI scouts, people who are out there on the edges of what exists and exploring it in a lot of different ways. So, before we get into the actual object level stuff with what you found and what you think is coming in the realm of voice, I'd love to just get maybe meta lessons or specific alpha tips for how you do such a good job of this. Where do you go for information? What's the top of funnel for you? How do you know you're onto something? I think people could learn a lot from your example in that respect.

Olivia Moore: Awesome. In many ways, it's our job to be chronically online and track every new thing that happens, especially as consumer investors. It's something that we've tried to hone over the last few years in particular. It's so interesting 'cause there's a pretty big delta, I think, between where AI scouts or AI experts, early adopters, are spending their time and where they're talking, and where normal consumers are. So, we try to be in both places. I would say Twitter, of course, is where most founders, AI founders, are announcing new companies or new models, new breakthroughs. AI newsletters have been massive, meetups. But then in terms of where real people are sharing what they do with AI, it's mostly on places like Instagram, TikTok. YouTube is a shocking one. YouTube is actually the number one mobile app and the number two website in the world, and so we actually find that for many companies in the consumer/prosumer space, if you look at traffic, by far their number one referral source from social is YouTube. There's this whole separate economy of YouTube influencers or YouTube creators who are making how-to content about using different AI tools. So, I would say we try to track all those places. In terms of what's an early signal to us, often we'll see normal people, again, usually teenage girls, to be frank, trying to manipulate ChatGPT into doing something, like to be a therapist, to be a friend, to be a coach. And once we see something like that, it's like, okay, the consumer pull is strong enough that probably there can and will be a couple standalone more focused products here.

Anish Acharya: I think the other thing we try to do a lot of is just use the products. That sounds obvious, but it's surprising how few people seem to have actually tried Operator or Deep Research, DeepSeek, o1 Pro, Krea. These are not super obscure long-tail products, and consumers find their way to these products. But for all the folks who are insiders and are paid to be doing this work, still surprisingly few actually look at those. It's just a great way to build your intuition.

Nathan Labenz: Yeah. That's my number one advice always too: just get hands on. You can't really go wrong with that. So, the one interesting thing there was it sounds like you're looking as much or maybe even more for demand-side pull as you are at the supply side, 'cause people are coming forward with their technologies to offer all the time, but you're looking for people who are specifically trying to meet a need that maybe nobody's met yet, and figuring out what that implies.

Olivia Moore: Yeah. Consumer is so random and magical that we try to let the data tell us. You can have the most tenured, pedigreed team in the world building a consumer app, and if it doesn't hit, it doesn't hit, and that can be for a variety of different reasons. Maybe it's bad market timing. Maybe they got the product insight wrong with some specific feature, or the product insight right, but some specific feature wrong, and then no one completes the onboarding. So, I would say we try to let the data in terms of what people are actually using tell the story to us. And sometimes looking at the data of things like people pulling ChatGPT into these off-label use cases will give us a little warning signal or a heads up that this is a behavior that's working, and so we should keep an eye out for products that are targeting this behavior.

Anish Acharya: You know, a great example of this: there's this old joke from pre-AI that every social app would collapse into being a dating app, and in the same way, every large language model is being tortured into being a therapist. That's a funny thing to say at a dinner party, but it's also a leading indicator of what consumers want these models to do and some of the things that we wanna see in the future.

Nathan Labenz: Cool. Well, let's talk about voice. I would love to maybe just for starters get the tip of the top highlights. What products and experiences have you seen that are just the absolute best user experiences that are out there today? Hopefully, I've tried them, but we're about to find out.

Anish Acharya: Yeah. Well, maybe I can frame it, and then Olivia can talk about some of the specific products. I think the thing, zooming all the way out, is grounding ourselves in the fact that voice intermediates every human interaction and relationship, largely, right? Here we are, obviously having voice intermediate our relationship and our conversation and the way that we get to know each other, so it really is the original, most important form of human communication. But it's just been completely unaddressable by technology because we've never had the infrastructure. So, it's very interesting, because so many of the other substrates that we're applying AI to are areas where we've had a lot of historical technology exploration, whereas voice is just a complete blank piece of paper. And that's why I think we're as excited about the product implications as we are about the distribution implications of this sort of technology surface.

Olivia Moore: Totally. I would say there's been a couple of surprising things to us in terms of where voice is working now, at least on the startup side. A lot of the startups that are getting real traction in terms of net new companies and products are actually more B2B oriented, just because there are so many businesses that are now running off of call centers or paying for one or two or three people, even at small businesses, to answer the phone all day. And so once you're at a point where voice models can be anywhere in the realm of human performance there, it kind of makes all the sense in the world to at least have the voice agent be doing your after-hours calls or your calls that would go to voicemail. So, my guess would be that actually a lot of people have maybe interacted with an AI voice agent and not quite known it, because it's been a business calling them or it's been the receptionist when they've called to schedule an appointment or something like that. On the consumer side, it's been so far maybe a little different than we expected. I think most consumers have interacted with AI voice through something like a ChatGPT or a Grok, which are incredible voice experiences. More recently, something like Sesame was a massive breakthrough, and that is still just a web demo, an early version of what's to come there. And the Sesame team is open sourcing the model. So my guess is, when we see models like that spread and become more accessible for app builders to build on top of, we'll see maybe a corresponding explosion in consumer-focused voice-first tools.

Anish Acharya: I mean, a crazy thing that happened, and things are moving so quickly that I think it's easy to forget these things, is 1-800-ChatGPT. What was that? And whether it failed or succeeded, I think it pointed to an important insight, which is that maybe the first way most people in the world will actually experience AI is via voice, both as consumers and as consumers of business offerings.

Olivia Moore: Yeah.

Nathan Labenz: Yeah, I think of my dear mamaw all the time, who is now in her early 90s and lives alone and is sharp, but not an early adopter of new technologies. And for her, I think it's gonna be Alexa Plus that is going to be this big transition. She already sits there and asks it to play music for her. But will she engage it in conversation? How natural will she find it? I don't know, but clearly, for her, that form factor could unlock a whole new set of things.

Anish Acharya: And what's very funny is, ironically, mamaw perhaps is now exploring new technology but also isn't that familiar with old technology, which is maybe why she's calling you to get tech support and help using her existing products and devices. I think the potential applications for voice applied to seniors are super interesting. We've discussed it a lot. And it's not just access to the new things, it's access to the old things as well, things they just never developed the skills to interact with.

Nathan Labenz: Yeah. Funny enough, one of my first GPT-4 tests, going way back to the red team days, was tech support for seniors. And the prompt that I found to work really well is exactly what I say to her when this happens, when she calls me and says, "You know, my friend emailed me, and I can't find it." I always tell her, "Read everything on the screen from the top to the bottom." And she'll literally go like, "Okay, Verizon, the time," and then eventually, we get down to where the issue is. And that same thing basically worked out of the box with GPT-4. And of course, that was text only at that time, but I do see that as a huge unlock for all sorts of different things that she kind of struggles to access right now. If she can figure out the TV remote, then we'll really be in business.

Olivia Moore: Oh, exactly. Well, and I think most people don't have an infinitely patient person like you to walk them through how to do that. And we even saw recently, I think it was late last year in December, Google released Gemini models that could see what was on your screen and interact with you in real time.

Nathan Labenz: We're right on the brink of models like that. OpenAI has one as well, becoming API available and becoming ready for builders to actually capitalize on. And so, once we see something like that become actually usable, it's gonna be massive.

Anish Acharya: It's also so interesting because it points at something maybe Google and other search players should have done pre-AI, which is, how do you take everything on the internet and apply it? The most important context is the context that's around me in my physical space. So, the idea of being able to point your phone at the remote, in the case of Nana, and be able to debug the problem that way, instead of trying to translate what you're seeing in the physical world line by line to either Nathan or to Google. That interaction pattern doesn't make sense.

Nathan Labenz: So, one of the things I noticed in your presentation about this is that you said it's basically solved. And I'm kinda wondering what you see as remaining, if anything. The Sesame model might have even come out since that presentation, and it certainly takes another step forward in terms of the overall natural sound of the voice. I find the interruption mechanic still a little bit gnarly, especially if there's a multi-way conversation. Sometimes I try to demo advanced voice mode for people just to try to bring them up to speed on where AI is at now, if they haven't been paying attention. But then those demos often kind of go a little bit sideways on me because the interaction mechanic is still a little weird. If I'm doing it one-on-one, it's okay; I kinda know how to use it, and it seems to be optimized for that one-on-one. But in the group setting, it doesn't do super well in many cases. Anyway, that's just me kind of identifying my remaining pain points. But what do you think are the hardest things to get right, or the most important things that still need to get solved?

Olivia Moore: Yeah. I think the things that have gotten very close to solved over the past year would be things like basic latency and understandability, which is the difference between being able to have a conversation and not being able to have a conversation. And in most cases, I think the models are getting those right now. Latency is less than half a second on most of the models, which feels very human-like. I think the realization that a lot of people who have tried the Sesame demo had is that it goes beyond latency. There are so many speech pattern nuances that an actual human will produce which might actually sound like errors to a model: extra pauses, saying "um" or "hm," vocal inflections. When you add things like that in, which the Sesame team has done, it goes from being a voice that sounds better than an Alexa or Siri but is still kind of robotic in some ways, or still clearly AI, to something that could be mistaken for a human. In terms of the remaining things, for me there's still a lot to do around emotionality. I talked to a lot of founders building voice agents who want the models to be able to understand what they're saying and vary the tone and the inflection based on that. So if the AI voice agent is going to say something happy or exciting, the voice should reflect that. If it's going to say something sad, the voice should be lower in tone and pitch and a little bit slower. That's something that we still need to solve. And then interruptibility is huge. I kind of think of it as: humans have also not solved interruptibility in conversations. We still have the issue where two people start talking at once and you have to be like, "No, no, you go ahead." And so we need a clever way for voice AI to solve that in a way that humans maybe have not yet.

Anish Acharya: You know, I think we've gotten where we need to get to for voice models to work, but what Sesame showed us is that conversation models may be very different from voice models, or maybe an extension of voice models. Even how are the three of us coordinating on who's going to talk next? There's so much non-verbal communication happening, whether it's video or even in audio, and the models have been trained really well to do speech-to-text and text-to-speech. Interestingly, fewer of the companies we're seeing are using native voice-to-voice models. So that's one opportunity. And then just generally understanding, as Olivia noted, some of the nuances of conversation and having that be natively programmed in. For example, when I start talking, I'm not exactly sure what I'm gonna say. It sort of comes out as I go, and that's true for all of us, whereas the AI knows exactly what it's gonna say, which can sometimes be a little bit creepy. It does not quite get out of the uncanny valley.

Nathan Labenz: Yeah, there's a recent paper from Meta that you're bringing to mind, where they're doing brain reading and have established now, in actual signal understanding terms, the phrase-level formation that happens like two seconds before you actually...

Olivia Moore: Interesting.

Nathan Labenz: ...token, and then how... so it kinda gets down to the literal next syllable that's just a fraction of a second before you spit it out. And, yeah, there's something... It does feel like maybe some of these things might be better served in some ways by a diffusion structure, if you could kinda go from coarse to fine, as opposed to always being one token at a time at the end. But anyway, that's more speculation. You mentioned the stack, so-

Anish Acharya: Actually, Nathan, can I add one more thing?

Nathan Labenz: Plea-

Anish Acharya: The only other thing I'd add is that you actually do see varying levels of performance and sort of conversation quality, which is a very big predictor of driving business outcomes. So for example, we're investors in a company called Happy Robot, which is a voice AI for freight brokers. And if you just look at the quality of their text-to-speech and just how the conversation feels, it just feels way better than many of the off-the-shelf things. And this is because the team is more technical and has done more work under the hood. So yes, there's a bunch of other competitors who can provide a kind of fine commodity voice experience. But if you actually are a little bit more specialized in the technology, you can do something that feels more human, and that gives you permission to move into higher value conversations. You know, persuasion, negotiation, disagreement. Those are pretty nuanced conversations, and for the business to trust you with those conversations, you've got to have a voice model that not just says the right things, but says them in a way that feels compelling.

Nathan Labenz: Yeah, negotiation in particular is a lot to ask somebody to delegate to an AI. Is this company actually doing that for the freight companies, they're doing actual negotiations?

Anish Acharya: Negotiates, befriends, disagrees. We'll send you the demo. It's amazing. You should embed it. It's a real aha moment, I think, in terms of what's possible.

Olivia Moore: It's really interesting, 'cause I think, to the point that the LLM always knows exactly what it could or should say, there is a version of an AI voice agent that does negotiation that would just respond to the human and say, "No, this is my best price. This is what I'm gonna offer you," and just kind of say it over and over. But if you launch that kind of experience to a user, they are gonna try to circumvent it and talk to a human agent. It's not gonna feel like an actual negotiation to them, where they've given it their best and they've gotten a concession from the other side. And so what Happy Robot has done, which is really smart, is they actually introduce extra latency: the voice agent will say, "Okay, hold on, let me go talk to my supervisor," put them on hold for five seconds, and then come back with a slightly better price. And of course, the voice agent knows, "Here is my actual max. This is how much I can go up or down or move the price for the end customer." But they found that the acceptance rate of that kind of final offer is much higher in cases where the human feels like they've gone through an actual negotiation, because the voice agent has simulated a situation that feels satisfying to them.
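
To make the mechanic Olivia describes concrete, here is a minimal sketch of one bounded-concession round with a simulated "supervisor hold." The function name, the step size, and the five-second hold are illustrative assumptions, not Happy Robot's actual implementation; the point is only that the agent concedes toward the caller's ask, never past a pre-set ceiling.

```python
import time

def next_offer(current_offer: float, carrier_ask: float, max_rate: float, step: float = 25.0) -> float:
    """One negotiation round: pause as if checking with a supervisor, then concede a bounded amount."""
    time.sleep(5)  # simulated "hold on, let me talk to my supervisor" before the better price comes back
    # Move toward the carrier's ask by at most `step`, but never past the pre-set maximum rate.
    return min(current_offer + step, carrier_ask, max_rate)
```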

Nathan Labenz: Yeah, I don't know how to feel about that, to be honest. I mean, it's genius, and it's a little bit far out. I guess, staying on this for a second, does this... Do people know that they're talking to an AI when they're talking to this?

Anish Acharya: Yeah.

Nathan Labenz: It just... Does it say that upfront? Or how do they...

Anish Acharya: Yeah, it discloses it. And what's actually most surprising is people don't... And look, these are not... They're talking to truckers driving all over the country in their big rigs, so these aren't exactly Stanford technology enthusiasts, and they don't mind at all. 'Cause I think that our reptilian brain is so trained to react to these interactions in a specific way that once you get into the conversation, even if you intellectually know that it's an AI, you fall into the rhythm, the expectations, the patterns, the cadence of the human conversation very quickly.

Olivia Moore: Yeah. It reminds me of something actually Anish says a lot, which is, "In the best cases, AI can be more human than the humans." The Happy Robot example is a good one, where every time you call in and you reach the voice agent, you get the same person every time. They're friendly. They'll listen to you, talk about your day or what happened. They'll be very patient. They'll be sympathetic. They'll spend all the time in the world with you on the phone if you want to. And so it's actually in many cases, assuming the voice agent can answer your question, which they almost all can now, it's actually a better experience for the end consumer than if you get what can sometimes be a grumpy actual human being on the other end of the phone line.

Nathan Labenz: Yeah, superhuman patience is, it turns out, not that hard to achieve, and quite valuable. And yeah, low to no wait times is also a huge driving force for value. So I definitely see it. I've been really excited about voice for a long time, even though it's probably gonna put me as a podcaster out of work before too long as well. You mentioned Siri a minute ago, and obviously they've recently made headlines in a seemingly negative way by saying they're not gonna have an update until 2027, which just feels like possibly the other side of the singularity from where I'm sitting. So, I don't know if you can make any sense of that. I guess another way to come at that is, how reliable do these things need to be? In AI in general, there's this false comparison, a sort of imagined perfection that people compare an AI solution to. And I always try to remind people, and Ethan Mollick has a great phrase for this too, that the best available or the best hireable human for the job is really the comparison you should be making. So, is that what Apple is getting wrong here? How do you make sense of that development?

Anish Acharya: I have so many thoughts on this. I mean, for any consumer that interacts with AI products and also uses things like Siri, it's like a stick in the eye five times a day, 'cause Siri is still so bad at just the most basic things. And then, juxtaposed with all the advertising that Apple's doing about Apple Intelligence, it's not awesome. I think it really degrades consumer trust. And then look, I think that AI does best when it is exploring the surface of human interaction, which is a little messy, and these large incumbent corporations are designed to take the humanity out of every technology product, so there's a sort of almost irreconcilable tension between the two. And look, the more they try to neuter the AI, the more dissatisfied consumers are with it. Some of the Genmoji stuff, it's a valiant effort, but it just looks terrible. I don't know. Maybe some people think it looks good. I don't. So I just think it's gonna be a very difficult spiritual problem for these companies to resolve, because just between the committees and the lawyers and the whole posture that large incumbents have, it's gonna be hard for them to embrace the messiness of AI. I don't know. What do you think, Olivia?

Olivia Moore: I mean, I think we saw the reaction to the AI-generated texts and notification summaries on iPhones, and I would guess that kind of spooked Apple a little bit. Because for Apple to launch a new AI product, to Anish's point, it has to be production ready to land on hundreds of millions of iPhones, for people of all ages and all sorts of use cases, and it needs to feel both natural and also be correct. Whereas a startup has the luxury of not having to meet that bar, because the people who seek out and try a new startup product are kind of the natural early adopters; they know that it's beta, they know that it's AI, they know that it's a test product. I think what we've seen Google do well has been the new Google Labs experiments, and that's where a lot of the best, in my opinion at least, AI Google products have come out of, like NotebookLM and some of the video models like Veo 2. Essentially, they have taken the approach where, if you're an early technology tester, you sign up and get on the waitlist and beta test and use these products, and then ideally they get them production ready to launch to a slightly larger audience incrementally. But even as we've seen with NotebookLM, because it's Google, once it does make its way to the public, the pace of innovation there is a lot slower than it would be at a net new company that doesn't have tens of thousands of people to continue to employ.

Anish Acharya: I mean, it's like 10 VPs for every engineer working on NotebookLM right now. And look, another example of this is Deep Research. Deep Research actually was originally a Gemini product, and it's obviously a product that Google should be the best in the world at, and yet they never commercialized it in the right way for whatever reason, and now ChatGPT is known for their deep research capabilities. It's just one missed opportunity after another with incumbents.

Nathan Labenz: Let me circle back to that in a minute. I wanted to get a little bit deeper on the stack and the sort of balance between allowing the AI to handle things and putting your trust and confidence in its decision-making, versus trying to maximize accuracy, which typically is gonna mean more control measures and more stilted or less natural interactions. The stack, for those that aren't familiar with this, basically has been: take the audio in, transcribe it to text, feed that into a language model, feed the response into a text-to-speech model, and then send the speech back to the user, roughly speaking. You can complicate that if you want to, but that pipeline basically now does work fast enough in many cases to be viable. But then there's the fully multimodal model, voice-to-voice all through one set of weights. When you said that builders are mostly still building on these multipart, pipeline-style technology stacks, is that because they're getting better results, or is it because that's the only thing that is economical for their use case right now? What's driving that, and do you see that changing?

Anish Acharya: Yeah. I think the voice-to-voice models are definitely and unsurprisingly probably a little bit more expensive right now. But also, I think they're just earlier. I talk to a lot of voice agent founders who have probably tried all of the models available, and I think probably Gemini Flash was maybe the best of the bunch if you were gonna try to do full voice-to-voice. But in general, the interruptibility is not quite there yet on those models. But as we see pretty much every week, the models are the worst that they're ever gonna be right now. And so I'm sure similar to how last May when we first released our voice report, sub-one-second of latency was hard to imagine. And I'm sure by this time next year, everyone is gonna be using a model stack that's much sleeker and is hard for us to even imagine right now.
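
To make the two approaches being contrasted here concrete, below is a minimal sketch of the cascaded stack Nathan outlines: speech-to-text, then a language model, then text-to-speech. The three stage functions are placeholders rather than any specific vendor's API; a native speech-to-speech model would collapse all three stages, and their summed latency, into a single model call.

```python
def transcribe(audio_chunk: bytes) -> str:
    """Speech-to-text stage (placeholder for whatever STT model a builder wires in)."""
    raise NotImplementedError

def generate_reply(history: list[dict], user_text: str) -> str:
    """Language-model stage: decide what the agent says next (placeholder)."""
    raise NotImplementedError

def synthesize(text: str) -> bytes:
    """Text-to-speech stage: render the reply as audio (placeholder)."""
    raise NotImplementedError

def handle_turn(history: list[dict], audio_chunk: bytes) -> bytes:
    # Each conversational turn pays the latency of all three stages in sequence,
    # which is why sub-second round trips were hard until recently.
    user_text = transcribe(audio_chunk)
    history.append({"role": "user", "content": user_text})
    reply_text = generate_reply(history, user_text)
    history.append({"role": "assistant", "content": reply_text})
    return synthesize(reply_text)
```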

Olivia Moore: I also think that reasoning models are a new primitive, and they are maybe being underappreciated by many because they have the same interface as language models. But if you start to think about it, there are aspects of interactions that really benefit from the probabilistic nature of language model outputs, all the friendship, negotiation, et cetera, that you have even in a business conversation. And then there are things that really do not benefit from that, where you want high degrees of accuracy, like a negotiation where one price is definitively better than another for one of the parties and the counterparties. So in that case, you can orchestrate reasoning models and language models to handle the right parts of the conversation to get the desired output. Then you have fewer of the issues around aspects of the conversation that demand accuracy and the models not being well suited to that. I think the other broad philosophical point, and Marc has mentioned this a lot, is: is the bar to be as good as humans, or is the bar to be perfect? If the bar is to be perfect, all these technologies are not there and maybe never will be. If the bar is to outperform humans, I think we're actually there in many ways.

Nathan Labenz: Yeah. It's all happening very fast. How is the tool use? 'Cause that's one other thing that I could sort of see being a challenge. Although I could see it cutting either way, but depending on especially how idiosyncratic or esoteric your tool use context is. For example, if you're doing calls into some obscure freight management system, you might need to have potentially even the ability to fine-tune that model to get those calls to work quite right. So I guess, broadly speaking, how big of a deal is the sort of backend interaction of tool use, and what's working or not working there today from what you've seen?

Olivia Moore: It's a good question. I think what we're broadly seeing is that you've got to build a lot more product than just the voice capability. I think the voice capability alone is insufficient. And there may be more sort of room to explore voice only in areas like agents where there's different types of conversations, and there's different outputs that you want, and there's different considerations around price point and what APIs you wanna consume for what fidelity of output. But even then, there's an enormous amount of integrations and workflows that you probably have to deliver to build a traditional moat. So without commenting on a specific use case in something like freight, our general observation is that the capability gets you in the conversation but isn't sufficient to get you to the other side.

Anish Acharya: Yeah, I would agree. Especially when you think about how a lot of the companies that voice agents are selling into are very much traditional enterprises. They're not the Apples and Googles of the world. And so for them to build or launch in a more horizontal way, or even to try to build a voice agent themselves, in many ways it's a miracle if they can do it once, let alone keep it updated as the models get better and as there are new options that are gonna be a better experience for the customer. When the integration breaks with the backend system of record, what do they do? And so I think that is why we have seen so much excitement on the customer side for more vertically focused platforms that, to your point, are maybe both fine-tuned for the types of conversations these customers are having, but also have done the work to build out the long tail of integrations, and honestly also the long tail of conversation types, of how you manage piecing together different tools for the different types of tasks that you need to complete.

Nathan Labenz: Yeah. I always think about Tyler Cowen saying, "Context is that which is scarce." I think he said that before the AI moment, but it's never been more true than in the AI era. And so much of what I see standing in the way between businesses that want to use AI and actual successful use is literally just assembling the context, and sometimes getting the context out of their heads and into some documented form that the AI can process.

And so it's not too surprising that base models, even as they are quite powerful and extremely versatile, certainly relative to anything that came before, don't know the intricacies of not just the freight business in general, which they might, but the way you handle your freight business. That last mile trips a lot of people up. Do you have any observations or synthesis of what's going on there? I honestly still kind of struggle with it, and I do some amount of, not a lot, but enough hands-on consulting with businesses that I've seen this repeatedly, where it seems like people have a really hard time assembling the context. Maybe this is what the verticalized startups are gonna solve for us. But do you have a sense for what's going on there? Because it seems like it should be easier, or we should be making faster progress in AI adoption than I feel like we're actually seeing.

Anish Acharya: Yeah, it's funny. I think we've definitely seen an explosion of AI budgets within enterprises and within end customers. In some cases, and this was especially true six months ago, it's less true now, which I think is good, they're kind of looking for things to buy, because to your point, they're not spending all day on what's the latest in AI, and so they're not exactly sure how to use it in their day-to-day. I think we even saw this with ChatGPT, where it launched with a massive splash, the fastest product ever to 100 million users, but then people weren't really sure how to use it every day, and the usage was flat for basically a year. And only now in the past year, as there are more models and more obvious things to do with it, has the usage picked back up. So I think it's exactly to that point: on both the consumer side and the business side, if people don't know what to do with the product, and if it's more than a few steps to get up and running, there's going to be a decreasing funnel of people who make it through and actually become paying customers. And that's part of the reason I think we're seeing companies that are so vertically focused see the most success here.

Olivia Moore: Totally. And actually, I wouldn't underestimate the amount of growth these voice AI companies are seeing, especially on the agent side, but also on the scribe side. The businesses either don't know how to use it, or they do know how to use it and it's growing explosively within those businesses, because for the businesses that can embrace it, it's such a straightforward substitution for the humans that make the phone calls that there's no... And of course, they need to think then, "Okay, we're going to hold it to some sort of a CSAT score and we're going to measure outcomes on things like negotiation." So of course there are guardrails, there are integrations that need to be done. But in the cases where it's working, it's really, really working, and we're seeing some of the fastest growing B2B startups we've seen in 10 years.

Nathan Labenz: Perfect tee-up for a little lightning round on what's really working across different corners of the economy. I was going to start with enterprise, actually. So you mentioned scribes. It seems like that's a pretty well-established use case at this point. We're starting to see that bleed over into real-time coaching on calls, and then obviously full-on agent substitution. Is the real-time coaching working? What do we know about that? And then, is the substitution actually happening to the point where you think we'll see labor market statistic effects this year? Or do you think this will continue to be isolated examples that are the exception rather than the norm for a while still?

Olivia Moore: Yeah. These are great questions. I would say that coaching is definitely working, and it is an interesting transition point, in that there are some jobs, for example a call center job, where if you're an AI product that is selling a coach to call center workers, there's a massive amount of demand for that right now. But we might imagine that in, I don't know, five years, or probably less, the AI agent is going to replace a lot of those workers. I think where the coaching will really continue to exist is the jobs that have a heavy in-real-life component or a heavy personal component. We've seen quite a few AI real-time coaches for salespeople, for example, and for HVAC technicians. In many of these jobs, whether or not you get the $10,000 upsell comes down to the nuance of what you say or the question that you ask. So even if you're paying hundreds of dollars a month as an individual user for an AI coach there, it's absolutely worth it. In terms of the economic impact of the voice agents, to be honest, there are some cases, a basic call center, for example, where an AI taking...

Olivia Moore: ...calls will free up the human workers to do honestly much better and more rewarding jobs. These are massively high turnover, 300%...

Anish Acharya: Yeah.

Olivia Moore: ...per year, thankless jobs in many cases. And I think that there are better things for people to be doing. And then in other cases, recruiting is one example, we've seen quite a few voice agents that conduct initial screening calls for human recruiters, and that means that they can spend those 20 extra hours a week with the five candidates that they're really excited about, catering to them and really convincing them to engage in the process and take the job. So in many cases, I think we see it as amplifying the humans in their current roles versus replacing them.

Anish Acharya: Yeah, I think all of that is exactly right. You know, potentially more humans sort of move up the stack to do higher value work. I think the other thing is that you could take a look at what's happening and say, "At the limit, we have 20% less jobs." Or you could say, "At the limit, we all work four days a week." And we're paid to be optimists, but I believe that there's an opportunity for people to be more specialized and do more of the work that matters and less of the administrative overhead stuff that seems to consume most of our day-to-day.

Nathan Labenz: Yeah, I'm with that. I guess what I'm trying to really zero in on, though, is, 'cause I think what you're saying is true in some ways, but then there are other ways in which, you know, retraining and reskilling and "everybody was gonna become a programmer" really hasn't happened. And I think when we look at call centers specifically and the people that work there, we may free up resources and the company may grow and invest in other ways, but in many cases, the people that have the call center jobs are not going to be moved into other jobs at that same company. Instead, the AI is gonna just do that job, and they'll just have a much lower headcount in the call center operation. They may reinvest, there may be more R&D, there could be all sorts of great things.

Anish Acharya: Sure.

Nathan Labenz: And by the way, I also think people should work less, and a sort of new social contract that embraces that is high on my list of things people should be developing now. But kind of leaving aside the second order effects of what happens, do you think we are just at a point technologically where we could see, if enterprises wanted to do it, a 90% headcount reduction in call centers?

Anish Acharya: I mean, I don't think so. We'll see, right? So far, we're not seeing it because as Olivia said, nobody's job at the call center is to just do initial phone screens, right? People have recruiting jobs which involve initial phone screens that are annoying and can be overwhelming, as well as deeper interviews, as well as salary negotiations, as well as ensuring that employees are successful once they onboard. So yes, the AI is going after the initial phone screen, but we haven't seen a reduction in headcount because all of the other work is so important. And, you know, frankly, in many cases, they don't yet trust the AI to do the work or the AI can't do the work. You know, the AI can't take your employee to a baseball game a month after they've started and make sure they're having a fantastic experience. So, I totally understand the conceptual argument, it's just not something we've seen yet.

Olivia Moore: I think we'll also see the success of AI open up probably a new type of job that we haven't imagined before for humans to do. One great example: one of the fastest growing jobs right now is basically contributing training data and doing online tasks and other things that help the AI, which might be actually similarly paid, but a much better lifestyle than a call center job in many ways. And so, it'll be interesting to see what new types of opportunities open up for humans in the AI era.

Anish Acharya: Totally.

Nathan Labenz: So long as the AIs are less abusive than the human callers, I think we can safely assume that much.

Olivia Moore: Yes, for sure.

Nathan Labenz: To be clear on my perspective on this, I'm not anti-displacement or trying to stoke fear about that. I think we probably are gonna see it and should get ready for it. And ideally, it would be a good thing. I often ask people outside of the Silicon Valley bubble—and I live in Detroit, Michigan—"If you didn't have to work the job you have to make the money that you need for the rest of your life, would you still work the job?" And the overwhelming answer is no. I'm also very fortunate that I would probably continue to do what I'm doing even if I weren't getting paid for it or didn't need to get paid for it. But I think it is really important for the Silicon Valley set to keep in mind that most jobs are not jobs people are doing for the joy of the job. And if they could have their needs met in other ways, they would happily take that trade. So I'm not a job preserver, but I'm just trying to figure out at what point this wave of disruption is actually gonna hit. How much time do we have to get ready for it? And it seems like the call center thing, if I'm understanding your answer correctly, is probably at least another year before we would see a sort of...

Of course, these things are not binary either, but I put that 90% number out there just to say order of magnitude effect. Even if it doesn't go to zero humans in the call center, it sounds like you think that's at least a 2026-plus phenomenon.

Anish Acharya: Look, I think that it's cold to tell everybody, "Hey, just go learn programming." That's not what I'm saying at all. I just think it's very hard to understand what the labor impact of these technologies will be. And it's easy to hypothesize about a world in which all the jobs just go away, but it's not what we're seeing yet. So even if the technology is 18 months away, I don't know that the labor market will change in the way that we're perhaps imagining as a result of the technology. I think we'll have to see. A broad question, though, that you're speaking to is, what does it mean for our society when we have all this abundance, and is there a lack of purpose? I have a big theory that people need purpose, and if they don't have enough of it, they create it. And sometimes it gets pointed in bad directions. That's a lot of why I think Google ends up not working culturally. I mean, I think there's a lot of brilliant people there, but in a sense it's a low stakes environment from a purpose perspective because the business is almost too good, and I do get nervous about mirroring that in society. So, an almost more interesting question to me is, in a world where we actually just do have all of this abundance, how do we think about ensuring that people have purpose versus ensuring that people have jobs and income and all those other necessities?

Nathan Labenz: I'm a little more optimistic, maybe, on that dimension, but certainly I file that under a good problem to have. Okay, so this is supposed to be the lightning round, so we'll go through these next ones faster. So, SMBs: you've got the call answering, that seems pretty straightforward. Any highlights for SMB owners out there? Where should they go to get the best AI call answering today?

Olivia Moore: I would say something we've been very excited about for SMBs is that there are vertical solutions depending on what you're doing, even for SMBs. So if you're a restaurant operator, if you're a spa, if you're home services, we'll send our market map, we put these all in there, there is a solution catered to you, which is fantastic. This actually gets back to what we were talking about before, which is that SMBs typically have one or two people doing nothing but answering phone calls, which is incredibly expensive for a small business. And when we talk to the SMB customers, when they switch over to a voice agent, they are not laying off that human, who is usually a core part of their business, but that person is now able to spend their time doing things that are much better for the customer experience, or growing the business further, or other kinds of extensions of the business, which is really powerful and exciting.

Nathan Labenz: Okay. How about creators? This one's maybe a little less interactive, although maybe you're seeing interactive experiences that are sort of creator economy. One question I had, 'cause I might actually do an episode powered by AI voice in the not-too-distant future: who has the best voice design today? If I want to create somebody that's gonna give me that sort of film noir kind of read, where should I go for that sort of thing?

Olivia Moore: I mean, I think there's different answers to this question. There's platforms like Eleven where you can clone your voice. Eleven also has a great tool where you can describe a voice, describe a sound, have it created. The other end of the creator part of AI voice to me is these digital clones, which we're seeing more and more of, of platforms like Delphi where you can essentially launch a version of yourself that your audience can interact with in your voice or maybe via text or via other modalities, which is fascinating. I haven't seen AI replacing full podcast episodes for any podcasters yet. I think hypothetically we could get there, where maybe you just prompt the questions that you would ask, but we're probably still a couple years away from that.

Anish Acharya: It's worth playing with HeyGen, Captions, and a bunch of these other products to fine-tune a model of yourself, video and audio, and then giving it a script, because I think there is a world in which you do this entire podcast without ever owning a video camera or microphone.

Nathan Labenz: We did an episode with HeyGen actually, and for some reason Josh's audio wasn't great, so we then redid his side of the conversation with his avatar from HeyGen. It was pretty good. It was not quite as good as the original would have been, but it was fitting that it happened on that episode. How about for kids? I've been playing classic Nintendo games with my kids recently, and I put advanced voice mode on, often playing Mario 64, the old open world game, 'cause I don't know where to go. Where's the star? What do I have to do? So I'll ask advanced voice mode, "All right, this is the level I'm playing, where do I go?" And my kids are now to the point where they're like, "Daddy, ask AI," anytime either I'm slow to do something or don't know what to do. So that's cool. I would love to have something interactive and educational for my kids, but I'm also like, "Yikes, I don't necessarily want to trust anyone to implement AI effectively for my kids." So any winners, any early winners in that space?

Anish Acharya: We've talked a lot about this as a team, and an area of exploration we're fascinated by is all the stuff around the behavioral, social, and emotional side for kids. I think that is an area where AI is very naturally suited to deliver value, and there's just not a lot of technology there. A great example: my son loves to play Minecraft, but all of the other people he meets online are toxic teenagers. So why isn't there a companion that can play Minecraft with him and model positive social behavior? Another example is just observing the classroom. If your child goes to one of those great schools where they've got two teachers in every classroom, one doing all the academic components and one doing all of the social and emotional components, you already benefit from this. But a lot of kids don't go to schools like that, so imagine a vision model, a multimodal model, that can observe the children interacting and give feedback to parents and teachers. There's a ton here. Of course, there's assignment generation and quiz generation and helping kids learn in whatever way is best suited to them. That stuff will happen and it'll be super important, but I think pairing it with all of the emotional opportunities is where we get most excited.

Olivia Moore: We've seen companies like Synthesis, Ello, and Super Teacher exploring ideas like, "What if every kid had a reading tutor or math tutor sitting next to them all day, understanding how they learn best and catering to them?" On the other end of the spectrum, we've seen things like Kurio toys asking, "What if kids had a friend, mentor, or coach with them every day—someone who could track their progress, help them stay on the right path, or simply be a sympathetic listening ear?"

Anish Acharya: The companion space is fascinating. There are amazing companies like Character and others in the top 50 or 100 lists, but it feels like we're still in the early stages of exploring this area. There are so many contextual opportunities to innovate. One aspect of companionship might be completely sympathetic, while another could challenge and push you—perhaps even disagree with you. We joke about "East Coast mode," where a companion is more terse and direct. That kind of product doesn’t exist yet, but I think we'll see it in the next few years.

Nathan Labenz: That’s really interesting. In the interest of time, let’s skip over legal and medical applications for now. But since you mentioned companionship, let’s touch on the farthest edges of this space—moving beyond companionship into relationships and even NSFW use cases. I’m curious about what’s happening on the fringes of AI romance. Is that something you’re scouting or considering in investment terms?

Anish Acharya: It’s a good question. Initially, I assumed most companion use cases would cater to frisky young men, but that hasn’t been the case. A significant audience is women, and the experience feels more like interactive fiction than anything resembling pornography. There are a lot of mistaken assumptions about how these products are used and who uses them. Romance itself has many definitions. Some critique these products as substitutes for traditional relationships, but they might actually make people more capable. For example, an AI could help someone improve conversational skills or even flirting, or serve as an outlet for emotional frustrations that might otherwise strain in-person relationships. These are some of the surprising dynamics. What do you think, Olivia?

Olivia Moore: I agree. It’s funny—whenever we pull the top 50 or 100 AI apps and share them with our team, people always notice the abundance of companion platforms, many of which lean toward NSFW use cases.

Olivia Moore: It's interesting that, as Anish mentioned, it's often more about the "AI boyfriend" use case than the "AI girlfriend." It feels closer to interactive fan fiction than anything else.

Anish Acharya: Sexuality is a fundamental part of the human experience, and we can’t pretend it doesn’t exist. Ignoring it leads to situations like Apple, which struggles to release certain products for years. We need to embrace this aspect of human nature while finding ways to support positive use cases. Of course, there will always be fringe products that we won’t invest in and most people won’t use, but those are the least interesting to discuss because they’ve always existed.

Nathan Labenz: I've done two episodes with Eugenia from Replika. In the more recent one, we reviewed Stanford research showing that Replika significantly reduced suicidal ideation for users struggling with those issues. More often than not, it also encouraged people to engage with the real world. While some of the data is self-reported, users indicated that Replika wasn't holding them back but instead motivating them to step out into life.

Anish Acharya: That’s amazing.

Nathan Labenz: I'm a big fan. As with all technology, and especially AI, the specifics of what we build are critically important. There's no doubt that someone could create a predatory romantic AI that's addictive and exploitative. But we're seeing proof that positive, majority-beneficial versions are possible.

This brings me to a question about rules of the road. It’s early in this space, but one proposed rule I like for its simplicity is Yuval Noah Harari’s idea that AI must disclose itself as AI. Another concept I’ve been considering is a "Do Not Clone" registry—a modern version of "Do Not Call." People could register their likeness and voice to prevent unauthorized cloning by AI platforms. Do you have thoughts on emerging best practices or regulations to keep this space positive?

Olivia Moore: It's definitely early, but I've been surprised by the current dynamics. More people seem frustrated by large model companies restricting what they can do than are concerned about being deep-faked or cloned. For the average consumer, those issues aren't widespread yet. Major startups and model companies have been cautious about allowing anything related to public figures or personal images.

The idea of a registry or directory is intriguing, especially as it opens opportunities for licensing identities for approved use cases. Platforms like ElevenLabs already feature iconic voice collections from celebrities and voiceover artists, creating new opportunities for those who might not have found work in Hollywood. Similarly, influencers with smaller followings could use AI avatars to extend their reach and secure deals they couldn’t otherwise. I’m excited to see how people use AI tools to amplify themselves rather than worrying about deep-fakes for everyday individuals.

Anish Acharya: I totally agree. I think every time there's a new technology, the talking heads try to get overly paternalistic, and I just don't think that's a generous enough view of the average consumer, how smart they are, how media literate they are. Of course, every technology has the potential for misuse, so I'm not being glib about that. But I do think the paternalism is unwelcome and often unnecessary, because people have learned for 30, 40 years that just 'cause it's written in a book, it's not necessarily true. And just 'cause it's on the internet, it's not necessarily true. And just 'cause it's on social media, it's not necessarily true. There's no reason that this technology will be any different. And whatever we do here, I hope that we're generous in our assumption that consumers are smart and savvy and will know how to use these products and technologies with the appropriate level of caution.

Nathan Labenz: Maybe just give me your sort of medium or long-term vision for where this voice-enabled computing is gonna go. Is it gonna be Her? We're all walking around with the AI in our earpiece, and we're untethered from our devices. Maybe we've got glasses that pair with that. What's the tech optimist view of life in this voice-enabled computing future?

Anish Acharya: I think that we will see voice unlocked as a kind of modality feature on every product, in every interaction, on every device, so AirPods, glasses, your computer. As we've dug into voice, especially from a consumer use case, you find that there are a lot of situations where maybe you don't actually want to be having a two-way conversation, or you can't be having a two-way conversation. You want it to be transcribing what you say, or vice versa: you can't talk, and you want it to be talking back to you. And so I think right now we're in inning one of AI voice, where we have a set of really compelling and exciting products. But five years from now, they're gonna look incredibly limited based on what we have then, when you can interact with voice in any way at any time, for whatever is most useful and helpful to you.

Olivia Moore: Steve Jobs famously said that a computer is a bicycle for your mind, and that meant that a computer extended us intellectually in ways that were unimaginable. And that's what technology has done for us for 40 years. I think we're now gonna have the emotional version of that, sort of emotional bicycle where it extends us emotionally through products like companionship, but many more. And I think voice is gonna be the primary catalyst and interface to that. So, maybe a subject for our next conversation, but I think that's really the way it's gonna impact us, and it's been a bit underestimated.

Nathan Labenz: Cool. I love it. Olivia Moore and Anish Acharya from a16z, thank you both for being part of the cognitive revolution.

Olivia Moore: Thank you.
