How do you trust what's coming out of your TTS engine?
We hear two things from our customers: either they dedicate their own resources to listening to hundreds of calls, or they end up getting complaints from their own users that something is off.
Rime QA gives your team one simple hub to keep everything running smoothly. No more slogging through endless call recordings.
Observability is one of those terms people in AI and speech technology love to talk about. But how many teams are actually doing observability in a meaningful way? Most of the time, it stops at logging outputs and running a few manual checks.
We wanted to change that. So we built a real quality assurance tool for TTS: one that lets you catch mispronunciations, monitor new vocabulary, and ensure your speech system works reliably in production.
This tool came directly from customer demand. We were hearing constant requests to add words to our dictionary and provide linguistic support. Instead of forcing teams to manually chase every mispronunciation, we built something scalable.
Why We Built This
Most customers want perfect pronunciation out of the box, and while we get very close, sometimes you need that extra bit of visibility.
Today, most companies that care about TTS quality rely on slow, manual processes. A QA professional listens to hours of calls, combs through logs, flags errors, and then escalates them to other team members or tries to fix them by hand. It’s resource-intensive and never guarantees full coverage. Even when they catch something, no other TTS vendor has the means to actually address it without clunky client-side fixes.
We wanted to remove that burden entirely. Our Quality Assurance tool automates observability. It surfaces out-of-dictionary words directly in a dashboard so your team doesn’t have to go hunting for them. All of these words live in one place, so QA teams can review pronunciations and request corrections directly.
How to Use the Tool
Read our docs for full details. Using the tool is simple:
By default, Rime has a zero data retention policy. We don’t store your calls or requests.
If you want us to monitor requests for Speech QA OOV (out-of-vocabulary) terms, just include an extra parameter when calling the API.
Navigate over to the Realtime monitoring dashboard and review what's popping up there.
This feature is currently available for Mist, our phoneme-based TTS model. Support for Arcana and on-prem deployments is coming in future product updates.
For proactive checks before you're live in production, head over to our check coverage page.
Here’s the script:
curl -X POST "http://users.rime.ai/v1/rime-tts" \
  -H "Authorization: Bearer $RIME_API_KEY" \
  -H "Accept: audio/mp3" \
  -H "Content-Type: application/json" \
  -d '{"modelId": "mistv2", "text": "we found something in the interphalangeal joint.", "speaker": "cove", "saveOovs": true}' \
  -o speech_with_oov.mp3
Once the saveOovs parameter is true, any word not already in our dictionary will be flagged as Speech QA OOV. You’ll see these directly in the dashboard where you can:
Listen to how they’re pronounced
Review accuracy
Approve or reject entries
Request corrections that get added to the dictionary by our linguists.
Instead of QA team members listening to hours of calls to find a single mistake, you get a direct view into pronunciation quality.
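If you call the API from application code rather than the command line, the same request can be sketched in Python. This is a minimal illustration that uses only the standard library; the endpoint, headers, and JSON fields are copied from the curl command above, while the function names build_tts_request and synthesize are hypothetical helpers and not part of any Rime SDK.

```python
import json
import os
import urllib.request

# Endpoint and field names are copied from the curl command in this post.
RIME_TTS_URL = "http://users.rime.ai/v1/rime-tts"


def build_tts_request(text, api_key, speaker="cove", model_id="mistv2", save_oovs=True):
    """Build (but do not send) a TTS request with OOV monitoring toggled.

    Hypothetical helper for illustration; not part of an official client.
    """
    payload = {
        "modelId": model_id,
        "text": text,
        "speaker": speaker,
        "saveOovs": save_oovs,  # opt in to Speech QA OOV monitoring
    }
    return urllib.request.Request(
        RIME_TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "audio/mp3",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, api_key=None):
    """Send the request and write the returned audio bytes to out_path."""
    api_key = api_key or os.environ["RIME_API_KEY"]
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio)
    return out_path
```

Calling synthesize("we found something in the interphalangeal joint.", "speech_with_oov.mp3") with RIME_API_KEY set in the environment would mirror the curl example. Keeping request construction separate from sending makes it easy to confirm that saveOovs is set before any traffic leaves your service.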
How It Works Under the Hood
Mist2 is a phoneme-based model. Every word is mapped to a phoneme sequence that dictates pronunciation. If a word isn’t in the dictionary, our grapheme-to-phoneme prediction model steps in. Most of the time it’s accurate, but with domain-specific terms (medical, geographic, product names) there’s always a risk of incorrect pronunciation.
Rime’s new QA tool catches these cases, surfaces them, and routes them through our review pipeline so you don’t have to worry about gaps. Once they’re in the dictionary, you get guaranteed accuracy across all speakers.
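To make the dictionary-then-fallback idea concrete, here is a toy sketch. It is not Rime's implementation: the three-word lexicon, the phoneme symbols, and the placeholder_g2p function (which just emits one symbol per letter) are all assumptions made for illustration. The only point is to show where an out-of-vocabulary word gets flagged.

```python
import re

# Toy lexicon mapping words to phoneme sequences. Entries are illustrative only.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "in": ["IH", "N"],
    "joint": ["JH", "OY", "N", "T"],
}


def placeholder_g2p(word):
    """Stand-in for a learned grapheme-to-phoneme model: one symbol per letter."""
    return [letter.upper() for letter in word]


def phonemize(text, lexicon):
    """Look each word up in the lexicon, falling back to G2P and flagging OOVs."""
    pronunciations = {}
    oov_words = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in lexicon:
            # Dictionary hit: the pronunciation is fixed and reviewed.
            pronunciations[word] = lexicon[word]
        else:
            # Dictionary miss: predict a pronunciation and flag the word for review.
            pronunciations[word] = placeholder_g2p(word)
            if word not in oov_words:
                oov_words.append(word)
    return pronunciations, oov_words


pronunciations, oov_words = phonemize("in the interphalangeal joint.", TOY_LEXICON)
print(oov_words)  # ['interphalangeal']
```

In this sketch, once a reviewed entry for "interphalangeal" is added to the lexicon, the same call returns an empty OOV list, which is the behavior the review pipeline is meant to produce.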
Real-World Applications
We’re already using this tool in high-stakes scenarios where pronunciation matters:
Medical terms: Ensuring clarity in clinical and pharmaceutical contexts.
Large menus: Onboarding restaurants with thousands of unique items.
Street and city names: Handling geographic coverage across diverse regions.
Unique acronyms or text formatting: Standardizing tricky formatting cases.
Proper nouns in foreign languages: Names, locations, places, organizations.
Brand-specific names and products: Maintaining consistency across products and proprietary language.
Wherever accurate speech output is critical, Speech QA monitoring saves time and improves trust.
Try It Out Today
The tool is live and ready for you to try. Just add the saveOovs parameter to your API calls, and you’ll start seeing insights in your dashboard.
If you have any questions or want to see a demo, email us at support@rime.ai.