Latest in AI

KPMG Pulls AI Usage Report Due to Apparent Hallucinations
TechCrunch AI · 10 hours ago · Incident
KPMG, one of the world's largest professional services firms, withdrew a published report on AI usage after it was found to contain apparent hallucinations — errors likely introduced by an AI system used in its preparation. The incident highlights a sharp irony: AI proving unreliable as a source of information about AI itself. It adds to a growing list of high-profile cases where AI-generated content has undermined the credibility of professional and institutional outputs.
Judge Learns Both Sides Used AI, Cancels Trial, Kicks Everyone Off the Case
Hacker News (AI keywords) · 5 days ago · Incident
In a rare legal incident, a judge found that attorneys on both sides of a case had used AI tools in their legal work. The judge responded by canceling the trial entirely and dismissing all lawyers involved. The case highlights growing judicial frustration with unchecked AI use in court filings and the serious professional consequences that can follow.
"Fully Hallucinated Operating System" Simulates an Entire OS via LLM Prompts
r/LocalLLaMA top day · 6 days ago · Commentary
A popular Reddit post highlights a video demonstrating a "Fully Hallucinated Operating System" run entirely inside an LLM. By prompting the model to act as a terminal, it simulates file systems, network requests, and command execution purely through text generation. While impractical for production, this experiment showcases the impressive state-tracking and "world model" capabilities of modern LLMs.
Claude’s new model is more ‘honest’ when it messes up
The Verge AI · 17 days ago · Release
Anthropic is releasing Claude Opus 4.8 and highlighting the model’s “honesty” as a key improvement. The company says it trains its models to avoid unsupported claims, addressing a broader issue where AI systems sometimes jump to conclusions. Judging from the excerpt, the update is positioned around reliability and uncertainty handling rather than a specific new tool or benchmark result.
AI Fabricated "Fictitious Quotes" in a Book, but the Author Insists on Continuing to Use AI as a Writing Aid
Ars Technica AI · 23 days ago · Opinion
In an era of rapidly growing AI-assisted writing, the collaboration between writers and AI is undergoing unprecedented tests. Author and documentary filmmaker…
arXiv Introduces New Policy: Submitting AI-Generated Junk Papers or Hallucinated Content Will Bring a One-Year Submission Ban ★ 75
Ars Technica AI · 30 days ago · Incident
The well-known academic preprint platform arXiv has recently introduced strict new rules regarding AI-generated content. According to the latest policy…
Google DeepMind Launches the FACTS Benchmark Suite: Systematically Evaluating the Factuality of Large Language Models ★ 80
Google DeepMind Blog · 187 days ago · Release
As large language models (LLMs) are deployed across a wide range of industries, ensuring the "factuality" of model outputs and reducing "hallucination" has…
Hugging Face Launches the Enterprise Scenarios Leaderboard: An LLM Evaluation Benchmark Designed for Real-World Applications ★ 75
Hugging Face Blog · 865 days ago · Release
Hugging Face has partnered with Patronus AI, a startup focused on LLM evaluation and defense, to officially launch the Enterprise Scenarios Leaderboard…
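Hugging Face Launches the Hallucinations Leaderboard: An Open-Source, Quantitative Evaluation of Hallucination Rates in Large Language Models ★ 75
Hugging Face Blog · 867 days ago · Release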
While large language models (LLMs) have demonstrated remarkable generative capabilities across many domains, "hallucination" — where a model confidently…
How to Build Your Own Hugging Face Leaderboard: A Complete Guide Using the Vectara Hallucination Leaderboard as an Example ★ 75
Hugging Face Blog · 884 days ago · Tutorial
In the open-source AI community, the Hugging Face Open LLM Leaderboard serves as an important benchmark for evaluating model capabilities. However, many…