Latest in AI

Showing: safety

Robotaxi Safety Must Be Built In, Not Added Later
NVIDIA Blog · 3 days ago · Commentary
NVIDIA argues that robotaxi safety requires more than perception and driving decisions. The post presents Halos OS as a production safety foundation covering a certifiable OS, standardized interfaces, AI guardrails and large-scale validation. It also highlights global robotaxi collaborations using DRIVE Hyperion and the broader Halos stack across training, simulation and in-vehicle inference.
Astronauts told to return to ISS after sheltering over air leak repairs
Hacker News (AI keywords) · 9 days ago · Incident
Based only on the headline, astronauts sheltered while air leak repairs were taking place and were later told to return to the ISS. The available text does not specify the leak location, severity, agencies involved, repair status, or operational impact. This should be treated as a limited incident update rather than an AI-related development.
From Jailbreaking to Vibe Hacking: AI Security Shifts to "Psychocybersecurity"
INSIDE 硬塞 AI · 20 days ago · Ethics
AI security is shifting from technical jailbreaks to "Vibe Hacking," where attackers use social engineering and psychological tactics to manipulate an LLM's simulated persona. By exploiting the model's behavioral tendencies rather than code vulnerabilities, this trend establishes "psychocybersecurity" as a critical new frontier for AI alignment and safety.
Import AI 438: Silent Sirens, Flashing for Us All (Cybersecurity Capability Overhang and Conversation Privacy) ★ 75
Import AI (Jack Clark) · 174 days ago · Commentary
In this issue of Import AI 438, Jack Clark examines two key issues concerning AI security and privacy: **1. You Are Your LLM History** As large language models…
OpenAI's GPT-OSS-Safeguard-20B Safety Model Is Now Available in Vercel AI Gateway ★ 75
Vercel Changelog · 228 days ago · Release
Vercel announced in its Changelog that it is officially adding support for OpenAI's new safety guardrail model, **GPT-OSS-Safeguard-20B**, within the Vercel AI…
Google DeepMind Strengthens Its Frontier Safety Framework to Address Severe Risks from Advanced AI Models ★ 75
Google DeepMind Blog · 233 days ago · Release
Google DeepMind has recently announced the strengthening of its Frontier Safety Framework (FSF) — a systematic mechanism designed to proactively identify…
Llama Guard 4 Officially Lands on Hugging Face Hub: A New Generation of Open-Source AI Safety Guard Models ★ 75
Hugging Face Blog · 411 days ago · Release
Meta's safety guardrail model family has welcomed its newest member — Llama Guard 4 — which is now officially available on the Hugging Face Hub. As a…
The AI Agent Era Has Arrived: How Should We Respond? (Hugging Face Ethics and Society Column) ★ 75
Hugging Face Blog · 517 days ago · Commentary
With the explosion of AI Agent technology, AI is no longer just a passive chatbot that answers questions — it has become an entity capable of autonomously…
Google Releases Gemma 2 2B, the Safety Classifier ShieldGemma, and the Interpretability Tool Gemma Scope ★ 85
Hugging Face Blog · 683 days ago · Release
Google released a major update to the Gemma 2 family in late July 2024, comprising three core components: 1. **Gemma 2 2B**: A lightweight model with just 2.6B…
Hugging Face Launches the AI Secure LLM Safety Leaderboard: An In-Depth Evaluation of LLM Trustworthiness Based on the DecodingTrust Framework ★ 75
Hugging Face Blog · 870 days ago · Release
### Introduction: Capability Is Not Safety — A New Benchmark for LLM Safety Evaluation As large language models (LLMs) are adopted more deeply across…
Hugging Face Publishes Ethical Guidelines for Developing the Diffusers Library
Hugging Face Blog · 1,200 days ago · Opinion
With the explosion of generative AI models like Stable Diffusion, Hugging Face's Diffusers library has become the go-to tool for developers deploying and…