Announcing Rime’s $5.5M Seed Funding

    Rime is bringing the full richness and authenticity of human speech to voice AI. Realism converts.

    Author: Lily Clifford, Rime co-founder & CEO

    In 1876, Alexander Graham Bell made the first telephone call. In 1952, Bell Laboratories developed the "Audrey" system, which recognized a single voice speaking digits aloud. In 2025, we’re facing yet another seismic shift in voice technology with AI, and our team at Rime is uniquely positioned to help companies navigate this next era. 

    At Rime, our mission is to bring the full richness and authenticity of real human speech to voice AI. We believe the future of voice technology lies in creating nuanced, curated voices that reflect the specific and unique ways that real people talk. We’re creating a platform that allows developers of voice applications to drive business outcomes by personalizing who hears what voice.

    Today, we’re announcing our $5.5M seed round, led by Unusual Ventures with participation from Founders You Should Know, Cadenza, and an incredible group of angel investors – Aaron King, Alex Levin, Rebecca Greene, Michael Akilian, Maran Nelson, Nick Arner, Molly Mielke, Arnaud Schenk, Coyne Lloyd, Sarah Veit Wallis, Mike Heller, Zhenya Loginov, and Monica Black.

    This seed round will help us continue building out our team and our technology to better serve our customers. 

    Enterprise-Grade Voice AI

    Rime was founded in 2022 after I dropped out of my PhD program in computational linguistics at Stanford and convinced two friends to join me as co-founders—Brooke Larson, who was working at Amazon as a language engineer for Alexa, and Ares Geovanos, who was at UC San Francisco working on brain-computer interfaces for people who had lost the ability to speak.

    We set out to tackle last-mile problems for businesses: building voice technology that sounded like real people, across a broad range of accents and demographics, including age, gender, ethnicity, and sexuality. What we knew innately from our respective experiences in academia and product development was that traditional text-to-speech solutions failed to capture the subtleties of accents, the pronunciation accuracy, and the speed that large businesses required. A sizeable gap existed in the market, and we set out to fill it. 

    We started by setting up a recording studio in San Francisco and collecting an insanely large, high-quality dataset of speech-to-speech interactions. From wicked awesome Boston accents to Texas twang, we aimed to capture it all. Based on this ever-growing dataset, we started training our speech synthesis models. 

    Flash forward to today – Rime powers tens of millions of phone conversations every month, from phone ordering at major restaurant brands to backend automation in healthcare, telecom support, agent training, and enterprise customer support. We built the largest proprietary dataset of conversational speech in the world. We are SOC 2 Type II certified, HIPAA compliant, and the only next-gen voice AI model in the industry that's available on-premises. We launched the world's fastest conversational text-to-speech model, offering human-speed latency and realistic conversational-style voices in a diverse array of accents and voice types. We've recently launched Arcana, the most expressive and realistic spoken language AI model available today, and Mist v2, the fastest and most customizable text-to-speech model, optimized for high-volume, real-time business conversations.

    What’s next for Rime? 

    At Rime, we’re working to deliver the fastest and most lifelike speech synthesis models on the market, designed for high-volume, real-time enterprise applications. Our team is made up of brilliant linguists, machine learning PhDs, exceptional engineers, and seasoned startup veterans—all working together to push the boundaries of voice AI. We are only getting started. 

    Our team has been busy in early 2025 delivering market-leading innovations to our customers. We recently debuted Arcana, the most realistic spoken language model in our industry. Arcana is in a class of new spoken language models created by Rime. It infers emotion from context. It laughs, sighs, hums, audibly breathes, and can reproduce verbal stumbles. We’ve also continued to deliver new features to our Mist v2 model, which remains the fastest and most customizable text-to-speech (TTS) model for high-volume business applications and now supports French and German.​ These are truly novel, frontier capabilities. And we expect to go beyond, building on this research with even more realistic versions of Arcana, new speech-to-speech models, native understanding of voice, multimodal models, and more.

    Thank you to our team, customers, investors, and partners for your support on the journey so far. There’s much more to come! If you’re interested in being a part of it, we’re hiring.


    To learn more about Rime, try out a demo or start for free today. 