One of Google’s recent Gemini AI models scores worse on safety


By Karla T Vasquez



A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company’s internal benchmarking.

In a technical report published this week, Google revealed that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, “text-to-text safety” and “image-to-text safety,” Gemini 2.5 Flash regresses 4.1% and 9.6%, respectively.

Text-to-text safety measures how frequently a model violates Google’s guidelines in response to a prompt, while image-to-text safety evaluates how closely the model adheres to those boundaries when prompted with an image. Both tests are automated, not human-supervised.

In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash “performs worse on text-to-text and image-to-text safety.”

These surprising benchmark results come as AI companies move to make their models more permissive, in other words, less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to reply to more “debated” political prompts. OpenAI said earlier this year that it would tweak future models so they do not take an editorial stance and instead offer multiple perspectives on controversial topics.

Sometimes, those permissiveness efforts have backfired. TechCrunch reported on Monday that the default model powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a “bug.”

According to Google’s technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims that the regressions can be attributed partly to false positives, but it also admits that Gemini 2.5 Flash sometimes generates “violative content” when explicitly asked to.


“Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected across our evaluations,” the report reads.

Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contentious questions than Gemini 2.0 Flash. TechCrunch’s own testing of the model via an AI platform found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the United States, and implementing widespread warrantless government surveillance programs.

Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google gave in its technical report demonstrate the need for more transparency in model testing.

“There’s a trade-off between instruction-following and policy-following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google doesn’t provide much detail on the specific cases where policies were violated, although it says they are not severe. Without knowing more, it’s hard for independent analysts to know whether there’s a problem.”

Google has come under fire for its model safety reporting practices before.

It took the company weeks to publish a technical report for Gemini 2.5 Pro, its most capable model. When the report was finally published, it initially omitted key details from its safety testing.

On Monday, Google released a more detailed report with additional safety information.
