An expert on the impacts of information technologies on society considers how talking machines got their male- and female-sounding voices
April 9, 2025 10:43 a.m.
Discussions of voice assistants keep returning to one theme: the gendering of their voices, and in particular the prevalence of female-sounding defaults. Understanding why requires looking at the historical processes and societal pressures that shaped these choices.
Picture a pilot navigating a fighter jet in a crucial moment—an altimeter warning signals a rapid descent, and the cockpit, filled with complex instruments, becomes a blur. Amidst the chaos, a calm, firm voice breaks through: “Pull up … Pull up … Pull up.” Its familiar quality provides a sense of reassurance, prompting the pilot to act swiftly and avert disaster.
In the late 1970s, engineers at McDonnell Douglas were developing the F-15 Eagle fighter jet and noticed that traditional alarm systems—characterized by lights and bells—failed to elicit quick reactions from pilots. They determined that a vocal alert system would serve better, as a human voice could effectively convey urgency and clarity. Previous systems had relied on recorded messages, but advances in voice synthesis technology promised a more efficient and reliable solution.
Reports suggest that engineers opted for a female voice, believing it would better capture the attention of the predominantly male pilot corps. A young actress, Kim Crow, was chosen to voice the warnings, and pilots soon gave the voice its now-iconic nickname, "Bitching Betty," a term that mixes admiration with casual condescension toward a piece of life-saving technology.
The moniker "Betty" draws on a familiar cultural shorthand for women, with roots traced to the cartoon character Betty Rubble from "The Flintstones." Some aircraft also carried male-voiced systems nicknamed "Barking Bob," underscoring how inconsistently, and often dismissively, gender was handled in these technological contexts. And while "Bitching Betty" might seem derogatory at first glance, some pilots have defended the title as an affectionate acknowledgment of the voice that delivers critical safety information.
Prior to the 1980s, commercially available synthesized voices tended to sound male, and studies showed that early voice synthesis techniques could not convincingly replicate the qualities of female voices. Early research revealed a significant gap in the understanding of the acoustic properties of female speech. These technical shortcomings also reflected broader societal biases: much of the research and development in voice technology was built on male-oriented models, treating female vocality as inferior or unsuited to synthesis.
The discourse around the supposed inadequacy of female-sounding voices was often framed by existing stereotypes, with technical limitations attributed to the voices themselves rather than to the underlying systems. A review of the phonetic literature found that female speakers were underrepresented in studies, perpetuating a cycle of bias in technological development.
The situation echoes earlier claims that women's voices were incompatible with recording technologies. Meanwhile, female voices gradually became ubiquitous in public contexts, such as transport announcements, where they were intended to create a seamless and efficient urban experience. This phenomenon has been termed "soft coercion," a label that highlights how such voices guide public behavior and expectations.
In November 1983, sociologist Steven Leveen raised alarms about "technosexism" in an editorial for the New York Times. He argued that the proliferation of speaking devices was reinforcing gender stereotypes by relegating female voices to lower-status interactions. Leveen also noted that producing a female-sounding voice was often the more resource-intensive option, yet developers chose it anyway, guided by market research centered primarily on male preferences.
This notion of "technosexism" recurs in modern debates over voice assistants such as Siri and Alexa, which often default to female voices. Leveen warned that reserving female-sounding voices for low-status applications reinforces harmful societal beliefs, particularly in shaping children's perceptions of gender roles.
Clifford Nass, a distinguished researcher in human-computer interaction, explored these themes in studies during the 1990s. His experiments suggested that people respond to voice interfaces much as they do to human interlocutors, yet his framework oversimplified the interplay between culture and entrenched gender bias. Although he recognized biases in how listeners perceive voices, he attributed them to cultural conditioning rather than examining the responsibility of technology developers in perpetuating such stereotypes.
Technology innovators like Caroline Henton, by contrast, argued for improving female-sounding synthesized voices, pointing toward a way of challenging listeners' prejudices. Henton maintained that the quality of female voices could improve rapidly with advances in technology and data collection, a vision she later pursued while contributing to Apple's development of Siri.
At Bell Labs, Ann K. Syrdal was likewise pioneering high-fidelity female voice synthesis, work that culminated in an acclaimed synthesized voice, Julia, which sounded markedly more natural and won recognition in 1998.
Female-sounding voice assistants are still criticized for perpetuating outdated stereotypes, but recent trends point to a shift: companies are experimenting with a wider range of vocal characteristics and no longer default solely to female voices. Even so, questions of representation and gender dynamics in voice technology remain prominent, as firms balance commercial choices against societal expectations.
This exploration of gendered voices in technology poses important questions about the future of human-machine interactions. A voice, whether designed to appear nurturing or commanding, ultimately reflects corporate interests and the power dynamics they represent. The challenge lies in recognizing the cultural implications of these choices and striving for a technological landscape that embraces diversity and fosters genuine human connection.
The evolving relationship between synthesized voices and societal perception underscores the need for awareness in technology design. As synthesized voices replace personal interactions, they risk diminishing compassion and understanding, making diverse representation and emotional nuance in our technological companions all the more pressing.
Sarah A. Bell is a researcher focused on the intersection of information technology and society. She is the author of Vox ex Machina: A Cultural History of Talking Machines, from which this article is adapted. An open access edition of the book is available for download.