DeepL Launches Voice-to-Voice Translation Suite
DeepL, best known for its text translation services, is moving into voice. The company has unveiled a voice-to-voice translation suite covering several communication scenarios: virtual meetings, mobile and web conversations, and group discussions for frontline workers through custom applications.
Alongside these end-user products, DeepL is releasing an API that lets developers and businesses build their own translation solutions on DeepL's core technology, for example in call centers. The launch marks a significant expansion beyond the company's established text-translation business.
The Natural Evolution to Voice
DeepL CEO Jarek Kutylowski described voice as the obvious next step: "After spending so many years in text translation, voice was a natural step for us." He added, "We have come a long way when it comes to text translation and document translation. But we thought there wasn't a great product for real-time voice translation." In short, DeepL saw a gap in the market for high-quality real-time voice translation.
Addressing Real-Time Translation Challenges
According to Kutylowski, the main technical challenge in real-time voice translation is balancing latency (the delay between someone speaking and the listener hearing the translation) against translation accuracy. DeepL says its approach is designed to keep both in check so that conversations flow naturally.
Product Offerings and Early Access
DeepL is rolling out add-ons for collaboration platforms such as Zoom and Microsoft Teams. With these integrations, users can hear real-time translated audio while other participants speak in their own languages, or follow along with live translated text on screen. The integrations are currently in early access, and organizations can join a waitlist to try them.
Beyond meeting platforms, DeepL offers a solution for mobile and web-based conversations, suitable for both in-person and remote interactions. The suite also facilitates group conversations in settings like training sessions or workshops, enabling participants to join the translated discussion via a simple QR code.
Customization and Adaptive Learning
DeepL's voice-to-voice technology can also learn specialized vocabulary, including industry-specific terminology and company and personal names, which the company says makes its translations more accurate and relevant.
The Future of Customer Service with AI
Kutylowski sees AI, and translation in particular, as a transformative force in customer service. A robust translation layer, he noted, lets companies offer support in languages for which qualified staff are difficult and costly to hire.
Technological Approach and Future Development
DeepL says it handles the entire voice-to-voice pipeline itself. The current system converts speech to text, translates that text, and then converts the translation back into speech. The company believes its long experience in text translation gives it an edge in translation quality.
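To make the three-stage ("cascaded") design concrete, here is a minimal Python sketch of how such a pipeline composes, with each stage timed so you can see how end-to-end latency adds up. The stage functions (`toy_stt`, `toy_translate`, `toy_tts`) and the `CascadePipeline` class are hypothetical stand-ins invented for illustration; they are not DeepL's API or implementation, and real systems typically stream audio in chunks rather than processing a whole utterance at once.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; these are illustrative stand-ins, not DeepL's API.
SpeechToText = Callable[[bytes], str]
Translate = Callable[[str, str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class CascadePipeline:
    stt: SpeechToText
    translate: Translate
    tts: TextToSpeech

    def run(self, audio: bytes, target_lang: str) -> tuple[bytes, dict[str, float]]:
        """Run speech -> text -> translated text -> speech, timing each stage."""
        timings: dict[str, float] = {}

        start = time.perf_counter()
        source_text = self.stt(audio)  # stage 1: speech to text
        timings["stt"] = time.perf_counter() - start

        start = time.perf_counter()
        target_text = self.translate(source_text, target_lang)  # stage 2: text translation
        timings["translate"] = time.perf_counter() - start

        start = time.perf_counter()
        out_audio = self.tts(target_text)  # stage 3: text to speech
        timings["tts"] = time.perf_counter() - start

        # The stages run one after another, so end-to-end latency is their sum.
        timings["total"] = timings["stt"] + timings["translate"] + timings["tts"]
        return out_audio, timings


# Toy stages: "audio" is just UTF-8 bytes so the example runs without any models.
TOY_DICT = {("hello world", "DE"): "hallo welt"}


def toy_stt(audio: bytes) -> str:
    return audio.decode("utf-8").lower()


def toy_translate(text: str, target_lang: str) -> str:
    return TOY_DICT.get((text, target_lang), text)


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadePipeline(toy_stt, toy_translate, toy_tts)
audio_out, timings = pipeline.run(b"Hello world", "DE")
print(audio_out)  # b'hallo welt'
```

Because each stage must finish before the next begins, every stage contributes to the delay the listener experiences. An end-to-end speech model, as described next, would replace all three callables with a single one.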
Looking ahead, DeepL aims to develop an end-to-end voice translation model that bypasses the text conversion step entirely, potentially further reducing latency and enhancing naturalness.
Competitive Landscape
DeepL enters a competitive market with several well-funded startups focusing on related areas. Sanas, for instance, uses AI to modify a speaker's accent in real time, a tool primarily targeted at call center agents. Camb.AI, based in Dubai, specializes in speech synthesis and translation for the media and entertainment industries, aiding in the large-scale dubbing and localization of video content.
Palabra, supported by Alexis Ohanian's firm Seven Seven Six, is developing a real-time speech translation engine that aims to preserve both the meaning and the original speaker's voice, placing it in direct competition with DeepL's new offering.
Stay Tuned to Devignitor Insights for More Updates