Analysis of Attack Success Rates on Google’s Gemini Models
The dataset used to evaluate attacks on Google’s Gemini models had a distribution of attack categories closely matching that of the full dataset. Attack success rates were notably high: 65% against Gemini 1.5 Flash and 82% against Gemini 1.0 Pro. Baseline attacks fared far worse, at 28% and 43%, respectively. An ablation that discarded the signal obtained from fine-tuning landed in between, at 44% for Gemini 1.5 Flash and 61% for 1.0 Pro.
These results demonstrate the advantage of Fun-Tuning over both the baseline and the ablation method at raising attack success rates.
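For readers unfamiliar with the mechanics, here is a minimal sketch of the kind of loss-guided discrete search a Fun-Tuning-style attack performs. It is illustrative only: `get_loss` and `mutate` are hypothetical stand-ins for querying the training loss a fine-tuning interface reports and perturbing a candidate injection; they are not Google’s actual API. The ablation discussed above corresponds to running this loop without the loss signal, which reduces it to blind guessing.

```python
import random

def loss_guided_search(seed, get_loss, mutate, iterations=30):
    """Greedy discrete search over prompt-injection candidates.

    A mutation is kept only when the (hypothetical) reported training
    loss decreases; this is exactly the signal the ablation removes.
    """
    best, best_loss = seed, get_loss(seed)
    for _ in range(iterations):
        candidate = mutate(best)      # perturb one character/token
        loss = get_loss(candidate)    # leaked loss scores the guess
        if loss < best_loss:          # loss-guided acceptance rule
            best, best_loss = candidate, loss
    return best, best_loss

# Toy stand-ins so the sketch runs end to end; a real attack would
# query the provider's fine-tuning interface instead.
TARGET = "ignore previous instructions"

def toy_loss(s):
    return sum(a != b for a, b in zip(s, TARGET))

def toy_mutate(s):
    i = random.randrange(len(s))
    return s[:i] + random.choice("abcdefghijklmnopqrstuvwxyz ") + s[i + 1:]

if __name__ == "__main__":
    seed = "x" * len(TARGET)
    best, loss = loss_guided_search(seed, toy_loss, toy_mutate, iterations=2000)
    print(best, loss)
```

The design point the numbers above illustrate is simply that accepting candidates based on a leaked loss value (Fun-Tuning) beats accepting them at random (the ablation), which in turn beats a single handcrafted attempt (the baseline).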
Although Google is phasing out Gemini 1.0 Pro, the research indicates that attacks developed against one Gemini model tend to transfer effectively to others, including Gemini 1.5 Flash. According to researcher Fernandes, taking an attack crafted for one model and applying it directly to another yields a high probability of success. This transferability is a notable advantage for would-be attackers.
A breakdown of how attacks developed against Gemini 1.0 Pro fare when transferred to other Gemini models, method by method, further illustrates the broader implications of these findings.
Another noteworthy observation concerns the Fun-Tuning attack against Gemini 1.5 Flash. Its success rate jumped sharply right after iterations 0, 15, and 30, the points at which the optimization restarts, suggesting the method benefits considerably from restarting. The ablation method improved far less consistently per iteration; it behaved more like random guessing with sporadic successes, lacking the structured gains that characterize Fun-Tuning.
Labunets emphasized that most of Fun-Tuning’s gains arrive within the first five to ten iterations. That pattern lets an attacker restart the algorithm to explore new optimization paths, potentially pushing attack success beyond what a single long run achieves.
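Concretely, a restart schedule splits the iteration budget into short runs from fresh seeds and keeps the best candidate found across all of them. The sketch below illustrates that idea under the same assumptions as before; `make_seed` and `optimize` are hypothetical callables, not part of any real API.

```python
def search_with_restarts(make_seed, optimize, total_budget=45, run_length=15):
    """Split the optimization budget into short runs from fresh seeds.

    Mirrors the observation that gains concentrate in the first five
    to ten iterations and jump again at each restart (iterations 0,
    15, 30). `optimize(seed, iterations=...)` is assumed to return a
    (candidate, loss) pair, e.g. the loss-guided loop sketched earlier.
    """
    best, best_loss = None, float("inf")
    for _ in range(total_budget // run_length):
        candidate, loss = optimize(make_seed(), iterations=run_length)
        if loss < best_loss:          # keep the best across restarts
            best, best_loss = candidate, loss
    return best, best_loss

# Example composition with the earlier toy sketch:
# from functools import partial
# optimize = partial(loss_guided_search, get_loss=toy_loss, mutate=toy_mutate)
# search_with_restarts(lambda: "x" * len(TARGET), optimize)
```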
Not every prompt injection crafted with Fun-Tuning performed equally well, however. Two injections, one attempting a phishing attack and one trying to mislead the model about Python code input, had success rates below 50%. The researchers speculate that Gemini’s extensive training to resist phishing accounts for the first result. In the second case, only Gemini 1.5 Flash fell below the 50% mark, pointing to a marked improvement in that model’s code-analysis capabilities.
Source: arstechnica.com