Latest in AI

AI leaders call for tougher protections against AI-aided bioweapons ★ 76
The Verge AI · 10 days ago · Regulation
Leaders from rival AI companies, including Anthropic, OpenAI, Microsoft, Meta, and Google DeepMind, signed an open letter urging US lawmakers to close a biosecurity gap. They want companies that sell synthetic DNA and RNA to screen orders for sequences that could help create dangerous pathogens. The concern is that more capable AI tools and cheaper biology infrastructure could lower the barriers to misuse.
Trump AI testing plan faces problem: DOGE gutted US security teams
Ars Technica AI · 11 days ago · Regulation
Ars Technica reports that the Trump administration is considering government safety tests for advanced AI models before deployment. Critics argue the plan may be short-sighted and performative because DOGE cuts have weakened the US teams best positioned to conduct serious AI security reviews. The concern is that testing without staffing, transparency, and enforcement may not prevent dangerous deployments.
Expanding Project Glasswing ★ 76
Hacker News (AI keywords) · 12 days ago · Business
Anthropic is expanding Project Glasswing, its program for using Claude Mythos Preview to find vulnerabilities in critical software. The new cohort includes around 150 organizations across more than 15 countries, including infrastructure providers, vendors, nonprofits, and open-source maintainers. Anthropic frames the expansion as preparation for a world where powerful cyber-capable AI models become cheaper and more widely available, shifting focus from finding bugs to validating, disclosing, patching, and deploying fixes.
LLMs believe false statements even after explicit warnings that they're false ★ 74
Ars Technica AI · 16 days ago · Paper
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
Claude’s new model is more ‘honest’ when it messes up
The Verge AI · 17 days ago · Release
Anthropic is releasing Claude Opus 4.8 and highlighting the model’s “honesty” as a key improvement. The company says it trains its models to avoid unsupported claims, addressing a broader issue where AI systems sometimes jump to conclusions. The available excerpt positions the update around reliability and uncertainty handling rather than a specific new tool or benchmark result.
Major flaw in Google AI search: searching "disregard" makes the AI ignore its instructions and return a default chatbot reply
The Verge AI · 22 days ago · Incident
Google's AI search feature, "AI Overviews," was recently found by users on the social platform X to have a rather absurd system vulnerability. When a user…
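US government scrambles to respond: internet users use AI to simulate deceased pilots' voices, sidestepping legal restrictions ★ 75
Ars Technica AI · 23 days ago · Incident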
This controversy stems from strict U.S. legal restrictions on aviation accident investigation data. Under federal law, the National Transportation Safety Board…
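After tech giant CEOs decline to attend, Trump abruptly cancels the signing ceremony for an AI safety testing executive order, calling it a "hindrance to innovation" ★ 75
Ars Technica AI · 23 days ago · Business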
According to a report by Ars Technica, U.S. President Donald Trump abruptly canceled an official event that had been scheduled for the signing of an executive…
You can no longer search Google for the word "disregard": AI update breaks the search interface ★ 75
TechCrunch AI · 23 days ago · Incident
According to a TechCrunch report, following a recent AI feature update to Google Search, a baffling system bug emerged: users can now cause the entire Google…
Trump delays signing AI safety executive order, saying its existing provisions could hinder development ★ 80
TechCrunch AI · 24 days ago · Business
US President Donald Trump recently decided to delay signing a highly anticipated AI safety executive order. The core of the order was to establish a…
The Path, an AI counseling platform founded by Tony Robbins and former Calm team members, pitches safer AI therapy
TechCrunch AI · 24 days ago · Release
As generative AI becomes widespread, discussions and experiments around applying AI to psychological counseling and mental health support have never stopped —…
Google's SynthID AI watermarking technology adopted by OpenAI, NVIDIA, and other major players ★ 85
Ars Technica AI · 26 days ago · Business
As generative AI technology advances at a breakneck pace, AI-generated text, images, audio, and video have reached a point where they are nearly…
Making it easier for users to see how web content was created and edited: Google expands Content Credentials and SynthID ★ 78
Google DeepMind Blog · 28 days ago · Release
As generative AI technology becomes more widespread, the internet is increasingly flooded with images and information that are difficult to distinguish as real…
Import AI 455: AI systems are about to start building themselves, a first step toward recursive self-improvement ★ 85
Import AI (Jack Clark) · 41 days ago · Commentary
In the latest issue of Import AI 455, Jack Clark guides readers through an exploration of a highly forward-looking and both exciting and concerning theme: AI…
AI and the future of cybersecurity: why openness matters ★ 75
Hugging Face Blog · 54 days ago · Opinion
As artificial intelligence (AI) technology undergoes explosive growth, cybersecurity has become a focal point of concern for governments and enterprises…
Import AI 454: Automated alignment research, safety evaluations of Chinese AI models, and HiFloat4, a new 4-bit floating-point format ★ 75
Import AI (Jack Clark) · 55 days ago · Commentary
In this issue of Import AI 454, written by Jack Clark, the author begins by posing a thought-provoking question about finance and sociology: "At what point…
Import AI 453: Breaking AI agents, MirrorCode, and ten views on "gradual disempowerment" ★ 75
Import AI (Jack Clark) · 62 days ago · Commentary
This issue of Import AI (Issue 453), written by Anthropic co-founder Jack Clark, centers on AI system safety, coding capabilities, and the future of humanity…
Claude Mythos and the needless panic over open-weight models ★ 75
Interconnects (Nathan L.) · 65 days ago · Opinion
In this opinion piece published in Interconnects, prominent AI policy and technology critic Nathan Lambert delivers a sharp critique of the excessive panic…
Google DeepMind publishes new research on guarding against harmful AI manipulation in finance and healthcare ★ 75
Google DeepMind Blog · 81 days ago · Release
Google DeepMind has recently published research findings on preventing harmful manipulation by AI. As large language models (LLMs) and AI Agents become…
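Import AI 449: LLMs training LLMs, 72B distributed training, why computer vision is harder than text generation, and whether AI will trigger a political transition ★ 75
Import AI (Jack Clark) · 90 days ago · Commentary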
This issue of Import AI (No. 449) dives deep into several core frontier topics in the current AI landscape, spanning technical breakthroughs and broad…
Import AI 445: The timing of superintelligence, AI cracking frontier math proofs, and a new machine learning research benchmark ★ 75
Import AI (Jack Clark) · 118 days ago · Commentary
In this edition of Import AI (Issue 445), author Jack Clark guides readers through three core topics at the very frontier of AI development: the timeline for…
Import AI 443: Into the mist: Moltbook, the agent ecosystem, and the internet in transition ★ 75
Import AI (Jack Clark) · 132 days ago · Commentary
In this edition of Import AI 443, author Jack Clark guides readers through a far-reaching trend already underway: the internet is transforming from…
Hugging Face publishes "Open Responses": what you need to know about the open-source AI policy initiative ★ 75
Hugging Face Blog · 150 days ago · Opinion
Hugging Face, the world's largest open-source AI community platform, has published an article titled "Open Responses," aimed at explaining to developers and…
Import AI 440: Red Queen AI, AI regulating AI, and the O-ring theory of automation ★ 75
Import AI (Jack Clark) · 153 days ago · Opinion
In the latest issue of Import AI 440, author Jack Clark delves into three key structural trends facing AI development today: the Red Queen Effect, the…
ServiceNow AI launches AprielGuard: a guardrail model for improving the safety and adversarial robustness of modern LLM systems ★ 75
Hugging Face Blog · 173 days ago · Release
As large language models (LLMs) are widely deployed across enterprises and various applications, ensuring the safety of their outputs and defending against…
Google DeepMind releases Gemma Scope 2: supporting the new Gemma 3 family and deepening AI safety and interpretability research ★ 75
Google DeepMind Blog · 180 days ago · Release
Google DeepMind has officially released Gemma Scope 2, extending its powerful open-source model interpretability tools to the latest Gemma 3 model family. This…
Google DeepMind deepens its partnership with the UK AI Security Institute (UK AISI) ★ 75
Google DeepMind Blog · 185 days ago · Business
Google DeepMind has announced a deepened collaboration with the UK AI Security Institute (UK AISI), with both parties committing to joint work on critical AI…
Google DeepMind strengthens its partnership with the UK government to promote prosperity and security in the AI era
Google DeepMind Blog · 186 days ago · Business
Google DeepMind has announced that it will deepen its partnership with the UK government — a collaboration aimed at jointly supporting national prosperity and…
Google DeepMind brings AI image verification (SynthID) to the Gemini app ★ 80
Google DeepMind Blog · 206 days ago · Release
With the rapid advancement of generative AI technology, identifying the authenticity of images and defending against deepfakes has become an urgent priority…
Hugging Face introduces Voice Consent Gate: a secure consent mechanism for voice cloning ★ 75
Hugging Face Blog · 229 days ago · New Tool
With the rapid advancement of voice cloning technology, generating hyper-realistic synthetic voices has become remarkably easy. However, this has also…
