We've just rolled out several small but impactful updates to Arcana, our new text-to-speech (TTS) model. They make Arcana easier to use and lay the groundwork for more improvements to come.
♾️ Unlimited Text Generation
Arcana no longer limits the number of characters in its text input. You can now generate longer speech passages and build streaming applications without a length cap. More info in our Arcana docs.
🎧 Expanded Audio Format Support
Arcana launched with MP3 output and now supports additional audio formats, including WAV, PCM, and MULAW. This gives developers more flexibility when integrating Rime's TTS into different applications. More info in our audio format docs.
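Unlike WAV, raw PCM is usually just a stream of sample bytes with no header, so whatever plays or stores it has to be told the sample rate, sample width, and channel count. As a minimal sketch, assuming the response is raw 16-bit little-endian mono PCM (check the audio format docs for what Arcana actually returns), Python's standard `wave` module can wrap it in a WAV container:

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV (RIFF) container.

    Assumes little-endian signed samples, which is what WAV expects for
    16-bit audio. sample_width is in bytes (2 = 16-bit).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # the header is finalized when the writer closes
    return buf.getvalue()


# Illustrative input: one second of 16-bit mono silence at 24 kHz.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav(silence, sample_rate=24000)
print(len(wav_bytes))  # 48044 (44-byte header + 48000 bytes of samples)
```

The helper name `pcm_to_wav` and the 24 kHz rate are hypothetical choices for this example. If you request WAV directly, the header is already included and this step isn't needed.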
🎚️ Adjustable Sampling Rates
Developers can now set the sampling rate of the generated audio, so audio quality can be matched to the specific use case. More info in our API parameter docs.
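The sampling rate also determines how much data uncompressed audio occupies: bytes per second = sampling rate × bytes per sample × channels. A quick sketch of that arithmetic, assuming mono audio with 2 bytes per sample for 16-bit PCM and 1 byte per sample for 8-bit μ-law. Compressed formats such as MP3 don't follow this formula, and the rates below are illustrative, not a list of rates Arcana supports:

```python
def bytes_per_second(sample_rate: int, bytes_per_sample: int, channels: int = 1) -> int:
    """Uncompressed data rate: samples/second x bytes/sample x channels."""
    return sample_rate * bytes_per_sample * channels


def duration_seconds(num_bytes: int, sample_rate: int, bytes_per_sample: int, channels: int = 1) -> float:
    """How many seconds of audio a buffer of raw samples holds."""
    return num_bytes / bytes_per_second(sample_rate, bytes_per_sample, channels)


# Illustrative settings only.
for label, rate, width in [
    ("8 kHz mu-law (1 byte/sample)", 8000, 1),
    ("16 kHz 16-bit PCM", 16000, 2),
    ("24 kHz 16-bit PCM", 24000, 2),
]:
    print(f"{label}: {bytes_per_second(rate, width)} bytes/s")
# prints 8000, 32000 and 48000 bytes/s respectively

print(duration_seconds(96000, sample_rate=24000, bytes_per_sample=2))  # 2.0
```

Halving the sampling rate halves the bandwidth and storage for the same duration, which is one reason lower rates are common for telephony while higher rates suit higher-fidelity playback.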
📝 Enhanced Text Normalization
Mist v2 has had the best text normalization for TTS for some time now. We've now brought those capabilities to Arcana, so numbers, abbreviations, and other textual elements are converted into spoken language accurately and consistently, making synthesized speech sound more natural. More info in our text normalization docs.
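To make "text normalization" concrete: it is the step that rewrites written forms into the words a speaker would actually say. The toy sketch below is not Rime's normalizer, which isn't described here. It only handles a hypothetical two-entry abbreviation table, dollar amounts up to $999, and standalone integers from 0 to 999:

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table. A real normalizer needs context,
# e.g. "St." can mean "Street" or "Saint".
ABBREVIATIONS = {"Dr.": "Doctor", "approx.": "approximately"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-999."""
    if not 0 <= n <= 999:
        raise ValueError("toy normalizer only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[rest]}" if rest else "")
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words + (f" {number_to_words(rest)}" if rest else "")


def _dollars(match: re.Match) -> str:
    n = int(match.group(1))
    return f"{number_to_words(n)} {'dollar' if n == 1 else 'dollars'}"


def normalize(text: str) -> str:
    """Expand known abbreviations, dollar amounts, and 1-3 digit integers."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # "$42" -> "forty-two dollars"
    text = re.sub(r"\$(\d{1,3})\b", _dollars, text)
    # Remaining standalone 1-3 digit integers; longer numbers are left unchanged.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    return text


print(normalize("Dr. Lee paid $42 for 3 books."))
# Doctor Lee paid forty-two dollars for three books.
```

Even this small example shows why normalization is hard to get right: decimals, years, phone numbers, and ambiguous abbreviations all need rules this sketch deliberately leaves out.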
🔮 Upcoming Enhancements
Looking ahead, Rime plans to bring more features from Mist v2 to Arcana, including the spell() function. This function will spell out words, names, IDs, or email addresses, improving clarity in contexts such as customer service or education.
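To illustrate what "spelling out" means for TTS input, here is a minimal sketch of a hypothetical `spell_out` helper that turns an ID or email address into the character-by-character sequence a listener would hear. It is not the syntax or behavior of Rime's spell() function, and the spoken names for symbols ("at", "dot", "dash", "underscore") are assumptions of this sketch:

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
# Assumed spoken forms for common symbols in IDs and email addresses.
SYMBOLS = {"@": "at", ".": "dot", "-": "dash", "_": "underscore"}


def spell_out(token: str) -> str:
    """Read a token character by character (hypothetical helper)."""
    spoken = []
    for ch in token:
        if ch in "0123456789":
            spoken.append(DIGITS[int(ch)])
        elif ch.isalpha():
            spoken.append(ch.upper())  # letters are read by name
        else:
            spoken.append(SYMBOLS.get(ch, ch))  # unknown symbols pass through
    return " ".join(spoken)


print(spell_out("AB-12"))           # A B dash one two
print(spell_out("jo@example.com"))  # J O at E X A M P L E dot C O M
```

Reading "AB-12" this way avoids a TTS model guessing a pronunciation such as "ab twelve", which is exactly the ambiguity that matters when confirming an order number or email address over the phone.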
These updates reflect Rime Labs' commitment to advancing TTS technology, offering developers powerful tools to create more engaging and human-like voice interactions. To explore these features and integrate them into your projects, visit the Rime documentation.