开云登陆入口-开云online(中国)

谷歌发布大模型安全风险报告，但专家表示“令人担忧”

Jeremy Kahn，Beatrice Nolan

2025-04-21

在发布Gemini 2.5 Pro人工智能模型几周后，谷歌发布了一份重要文件，详细披露了这个模型的能力和风险。

文本设置

小号

默认

大号

Plus(0条)

一位人工智能治理专家认为，这份报告提供的信息“严重不足”且“令人担忧”。图片来源：Photo Iilustration by Thomas Fuller/SOPA Images/LightRocket via Getty Images

在发布最新人工智能模型Gemini 2.5 Pro“预览版”三周后，谷歌（Google）发布了一份重要文件，详细披露了这个模型的构建与测试信息。

此前，人工智能治理专家批评该公司在发布模型时未同步公开对该模型的详细安全评估报告及其潜在风险说明文件，这显然违背了其向美国政府及在多次国际人工智能安全会议上做出的承诺。

谷歌发言人在邮件声明中表示，任何关于公司违背承诺的说法都是“不实之词”。

该公司还表示，当向公众完全开放Gemini 2.5 Pro“系列模型”最终版时，将提供更为详尽的“技术报告”。

但至少有一位人工智能治理专家批评谷歌新发布的六页模型卡文件，认为文件提供的模型安全评估信息“严重不足”。

位于华盛顿特区的智库民主与技术中心（Center for Democracy and Technology）的人工智能治理高级顾问凯文·班克斯顿在社交媒体平台X发表长篇评论，对谷歌延迟发布模型卡文件及其细节缺失表示担忧。

他指出：“这份有关顶级人工智能模型的资料信息严重不足，它揭示了一个令人不安的趋势——当科技公司争相将产品推向市场时，人工智能的安全性和透明度正在竞相降低。”

班克斯顿特别强调，模型卡既未及时发布，又缺乏关键安全评估结果（如诱使人工智能模型输出生物武器指南等危险内容的“红队测试”细节），表明谷歌“在发布最强模型前尚未完成安全测试”，且“甚至至今仍未完成安全测试”。

班克斯顿提出另一种可能：谷歌虽完成了安全测试，但制定了新政策，即在模型未向全体用户开放前不公开评估结果。目前，谷歌将Gemini 2.5 Pro称为“预览”版本，用户通过谷歌AI Studio和Google Labs产品访问，但可使用的功能受到限制。谷歌同时宣布将向美国大学生广泛开放该模型。

谷歌发言人表示公司将为“每一个模型系列发布一份”更加完整的人工智能安全报告。班克斯顿在X平台上表示，这可能意味着谷歌未来不会针对模型的微调版本（如专门针对编程或网络安全的改进版）单独发布评估报告。他指出这种做法的危险性，因为人工智能模型的微调版本可能展现出与“基础模型”截然不同的行为特征。

在人工智能安全性方面的表现出现倒退趋势的并非谷歌一家公司。Meta为新发布的Llama 4人工智能模型提供的模型卡文件，在篇幅和细节方面与谷歌Gemini 2.5 Pro的模型卡文件相似，同样遭到人工智能安全专家的批评。OpenAI则表示不会为其新模型GPT-4.1发布技术安全报告，理由是该模型“并非前沿产品”，因其o3和o4-mini等“思维链”推理模型在多项基准测试中表现更优。与此同时，OpenAI宣称其GPT-4.1模型的性能比GPT-4o模型更强大。尽管安全评估显示后者可能会带来一定的风险，但该公司称其安全性仍在可发布的阈值以内。鉴于该公司表示不会发布技术报告，GPT-4.1是否会突破安全阈值目前仍是未知数。

OpenAI确实为上周三新发布的o3和o4-mini模型提供了技术安全报告。但与此同时，本周早些时候，该公司更新了“准备框架”。“准备框架”介绍了该公司如何评估其人工智能模型存在的关键风险以及计划如何减缓这些风险，如协助制造生物武器和模型开始自主改进并突破人类控制的可能性等风险。此次更新将“说服能力”（即模型操纵人类采取危险行动或相信虚假信息的能力）剔除出该公司预发布评估的风险类别，并修改了发布高风险模型的决策机制，包括在竞争对手已推出类似模型时，公司会考虑发布存在“重大风险”的人工智能产品。

这些修改在人工智能治理专家之间引发了分歧：部分人士赞赏OpenAI提升决策透明度并明确发布政策的做法，另一些专家则对修改内容表示震惊。（财富中文网）

译者：刘进龙

审校：汪皓

在发布最新人工智能模型Gemini 2.5 Pro“预览版”三周后，谷歌（Google）发布了一份重要文件，详细披露了这个模型的构建与测试信息。

谷歌发言人在邮件声明中表示，任何关于公司违背承诺的说法都是“不实之词”。

该公司还表示，当向公众完全开放Gemini 2.5 Pro“系列模型”最终版时，将提供更为详尽的“技术报告”。

但至少有一位人工智能治理专家批评谷歌新发布的六页模型卡文件，认为文件提供的模型安全评估信息“严重不足”。

译者：刘进龙

审校：汪皓

Google has released a key document detailing some information about how its latest AI model, Gemini 2.5 Pro, was built and tested, three weeks after it first made that model publicly available as a “preview” version.

AI governance experts had criticized the company for releasing the model without publishing documentation detailing safety evaluations it had carried out and any risks the model might present, in apparent violation of promises it had made to the U.S. government and at multiple international AI safety gatherings.

A Google spokesperson said in an emailed statement that any suggestion that the company had reneged on its commitments was “inaccurate.”

The company also said that a more detailed “technical report” will come later when it makes a final version of the Gemini 2.5 Pro “model family” fully available to the public.

But the newly published six-page model card has also been faulted by at least one AI governance expert for providing “meager” information about the safety evaluations of the model.

Kevin Bankston, a senior advisor on AI Governance at the Center for Democracy and Technology, a Washington, D.C.-based think tank, said in a lengthy thread on social media platform X that the late release of the model card and its lack of detail was worrisome.

“This meager documentation for Google’s top AI model tells a troubling story of a race to the bottom on AI safety and transparency as companies rush their models to market,” he said.

He said the late release of the model card and its lack key safety evaluation results—for instance, details of “red-teaming” tests to trick the AI model into serving up dangerous outputs like bioweapon instructions—suggested that Google “hadn’t finished its safety testing before releasing its most powerful model” and that “it still hasn’t completed that testing even now.”

Bankston said another possibility is that Google had finished its safety testing but has a new policy that it will not release its evaluation results until the model is released to all Google users. Currently, Google is calling Gemini 2.5 Pro a “preview,” which can be accessed through its Google AI Studio and Google Labs products, with some limitations on what users can do with it. Google has also said it is making the model widely available to U.S. college students.

The Google spokesperson said the company would release a more complete AI safety report “once per model family.” Bankson said on X that this might mean Google would no longer release separate evaluation results for fine-tuned versions of its models that it releases, such as those that have been tailored for coding or cybersecurity. This could be dangerous, he noted, because fine-tuned versions of AI models can exhibit behaviors that are markedly different from the “base model” from which they’ve been adapted.

Google is not the only AI company seemingly retreating on AI safety. Meta’s model card for its newly released Llama 4 AI model is of similar length and detail to the one Google just published for Gemini 2.5 Pro and was also criticized by AI safety experts. OpenAI said it was not releasing a technical safety report for its newly-released GPT-4.1 model because it said that the model was “not a frontier model,” since the company’s “chain of thought” reasoning models, such as o3 and o4-mini, beat it on many benchmarks. At the same time, OpenAI touted that GPT-4.1 was more capable than its GPT-4o model, whose safety evaluation had shown that model could pose certain risks, although it had said these were below the threshold at which the model would be considered unsafe to release. Whether GPT-4.1 might now exceed those thresholds is unknown, since OpenAI said it does not plan to publish a technical report.

OpenAI did publish a technical safety report for its new o3 and o4-mini models, which were released on Wednesday. But at the same time, earlier this week it updated its “Preparedness Framework” which describes how the company will evaluate its AI models for critical dangers—everything from helping someone build a biological weapon to the possibility that a model will begin to self-improve and escape human control—and seek to mitigate those risks. The update eliminated “Persuasion”—a model’s ability to manipulate a person into taking a harmful action or convince them to believe in misinformation—as a risk category that the company would assess during it pre-release evaluations. It also changed how the company would make decisions around releasing higher risk models, including saying the company would consider shipping an AI model that posed a “critical risk” if a competitor had already debuted a similar model.

Those changes divided opinion among AI governance experts, with some praising OpenAI for being transparent about its process and also providing better clarity around its release policies, while others were alarmed at the changes.

财富中文网所刊载内容之知识产权为开云登陆入口及/或相关权利人专属所有或持有。未经许可，禁止进行转载、摘编、复制及建立镜像等任何使用。

0条Plus

精彩评论

撰写或查看更多评论

请打开财富Plus APP

前往打开

热读文章

关注我们

谷歌发布大模型安全风险报告，但专家表示“令人担忧”

撰写或查看更多评论