On Tuesday, Amazon debuted a new generative AI model, Nova Sonic, capable of natively processing voice and generating natural-sounding speech. Amazon claims Sonic’s performance is competitive with frontier voice models from OpenAI and Google on benchmarks measuring conversational quality.
Nova Sonic is Amazon’s answer to newer AI voice models, such as the model powering ChatGPT’s Voice Mode, which feel more natural to speak with than the more rigid models from Amazon Alexa’s early days. Recent technical breakthroughs have made legacy models, and the digital assistants built on them, such as Alexa and Apple’s Siri, seem incredibly stilted by comparison.
Nova Sonic is available through Bedrock, Amazon’s developer platform for building enterprise AI applications, via a new bi-directional streaming API. In a press release, Amazon called Nova Sonic the most cost-efficient AI voice model on the market, and about 80% less expensive than OpenAI’s GPT-4o.
Components of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital voice assistant, according to Rohit Prasad, Amazon’s SVP and Head Scientist of AGI.
In an interview, Prasad told TechCrunch that Nova Sonic builds on Amazon’s expertise in “large orchestration systems,” the technical scaffolding that makes up Alexa. Compared to rival AI voice models, Nova Sonic excels at routing user requests to different APIs, Prasad said. This capability helps Nova Sonic “know” when it needs to fetch real-time information from the internet, parse a proprietary data source, or take action in an external application, and to use the right tool to do so.
During two-way dialogue, Nova Sonic waits to speak “at the appropriate time,” taking into account a speaker’s pauses and interruptions, Amazon says. It also generates a text transcript of the user’s speech, which developers can use for various applications.
According to Prasad, Nova Sonic is less prone to speech recognition errors than other AI voice models, meaning the model is relatively good at understanding a user’s intent even if they misspeak or are in a noisy setting. On Multilingual LibriSpeech, a benchmark measuring speech recognition across languages and dialects, Amazon says Nova Sonic achieved a word error rate (WER) of just 4.2% when averaged across English, French, Italian, German, and Spanish. That means roughly four out of every 100 words from the model differed from a human’s transcription in those languages.
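To make the metric concrete, here is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a human reference transcript and the model's output, divided by the number of reference words. The function name and sample sentences are illustrative assumptions, not Amazon's evaluation code, and real benchmarks typically apply more careful text normalization than the lowercasing used here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substituted word out of six.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, one wrong word in a 25-word reference gives a WER of 4%, which is the scale of the 4.2% figure Amazon reports.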
On another benchmark measuring loud interactions with multiple participants, Amazon said that Nova Sonic was 46.7% more accurate than OpenAI’s GPT-4o-transcribe model. Nova Sonic also has industry-leading speed, according to Amazon, with an average latency of 1.09 seconds. That makes it faster than the GPT-4o model powering OpenAI’s Realtime API, which responds in 1.18 seconds, per benchmarking by Artificial Analysis.
Prasad says Nova Sonic is part of Amazon’s broader strategy to build AGI (artificial general intelligence), which the company defines as “AI systems that can do anything a human can do on a computer.” Going forward, Prasad says Amazon plans to release more AI models that can understand different modalities, including image, video, and voice, as well as “other sensory data that are relevant if you bring things into the physical world.”
Amazon’s AGI division, which Prasad oversees, seems to be playing a larger role in the company’s product strategy these days. Just last week, Amazon released a preview of Nova Act, a browser-using AI model that seems to power elements of Alexa+ and Amazon’s Buy for Me feature. Starting with Nova Sonic, Prasad says the company wants to offer more of its internal AI models for developers to build with.
