
Uncanny AI Voice Demo Generates Both Awe and Unease Online


Gavin Purcell, co-host of the AI for Humans podcast, posted a video on Reddit showing a conversation between a human posing as an embezzler and an AI playing the boss. The exchange is convincing enough that it is hard to tell which speaker is the human and which is the AI.

Achieving “Near-Human Quality”

At its core, Sesame’s Conversational Speech Model (CSM) combines two AI models, a backbone and a decoder, both based on Meta’s Llama architecture. Together they process text and audio to produce realistic speech. Sesame trained three model sizes, the largest with 8.3 billion parameters (an 8 billion parameter backbone plus a 300 million parameter decoder), on approximately one million hours of predominantly English audio.

Conventional text-to-speech systems typically work in two stages, first generating semantic tokens (high-level speech representations) and then acoustic details (fine-grained audio features). Sesame’s CSM instead uses a single-stage, multimodal transformer that processes text and audio together. OpenAI’s voice model takes a similar multimodal approach.
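As a rough illustration of what handling text and audio in a single stage can look like, the toy Python sketch below flattens a conversation into one mixed sequence of text and audio tokens that a single model could consume. The token layout, the function name interleave_turns, and the token IDs are all hypothetical and chosen for illustration; this is not Sesame’s published implementation.

```python
from typing import List, Tuple

# Hypothetical token representation: (modality, token id).
Token = Tuple[str, int]


def interleave_turns(turns: List[Tuple[List[int], List[int]]]) -> List[Token]:
    """Flatten (text_ids, audio_ids) turns into one mixed-modality sequence."""
    sequence: List[Token] = []
    for text_ids, audio_ids in turns:
        # Each turn contributes its text tokens first, then its audio tokens.
        sequence.extend(("text", t) for t in text_ids)
        sequence.extend(("audio", a) for a in audio_ids)
    return sequence


# Two earlier turns that already have audio, plus a new turn with text only.
# All IDs are made up for the example.
history = [([11, 12], [501, 502, 503]), ([13], [504, 505])]
prompt = interleave_turns(history + [([14, 15], [])])
```

In a setup like this, one model would continue the sequence with audio tokens for the final turn, rather than handing an intermediate representation to a separate second-stage model.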

In evaluations without conversational context, human judges struggled to distinguish CSM-generated audio from recordings of real human speech, suggesting the model approaches human quality on isolated samples. When given conversational context, however, evaluators consistently preferred the human recordings, which points to a remaining gap in contextual understanding and delivery.

Brendan Iribe, co-founder of Sesame, acknowledged the model’s limitations in a Hacker News discussion. He noted that the AI’s tone and pacing are often inappropriate and that it struggles with interruptions and the overall flow of conversation. “Today, we’re firmly in the valley, but we’re optimistic we can climb out,” he wrote.

Source
arstechnica.com
