Latest in AI

Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
Hacker News (AI keywords) · 7 days ago · Paper
This arXiv paper studies token consumption in LLM-based multi-agent software engineering. Using 30 ChatDev tasks with a GPT-5 reasoning model, the authors map internal phases to SDLC stages such as design, coding, review, testing, and documentation. Preliminary results suggest code review dominates token usage, averaging 59.4%, while input tokens form the largest share, pointing to inefficiencies in agent collaboration.
OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks ★ 72
TechCrunch AI · 7 days ago · Release
OpenAI unveiled Lockdown Mode, a feature aimed at reducing the chance that sensitive data is shared during prompt injection attacks. The article notes that ChatGPT may still remain vulnerable even when the mode is enabled. That makes the feature a mitigation layer rather than a complete security guarantee, especially for teams handling private or business-critical information.
OpenAI Help: Lockdown Mode ★ 74
Simon Willison's Weblog · 8 days ago · Commentary
Simon Willison notes that OpenAI’s previously teased Lockdown Mode is now live for eligible personal and self-serve Business ChatGPT accounts. The feature does not stop prompt injections from appearing in content, but limits outbound network requests that could leak sensitive data. He sees it as a direct mitigation for the exfiltration leg of the “Lethal Trifecta,” while implying default ChatGPT settings are not robust against determined data theft attempts.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) · 9 days ago · New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with CUDA toolkit listed in setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 10 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 10 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT-5.5 succeeded in 7 of 10 runs, while DeepSeek and Claude variants succeeded less often. Many other models failed due to refusals, API-focused tunnel vision, false positives, or inability to use the exposed Firebase path correctly.
How LLMs Actually Work
Hacker News (AI keywords) · 11 days ago · Tutorial
The article explains how modern LLMs convert text into token IDs, embeddings, and position-aware vectors before passing them through stacked transformer blocks. It covers attention, multi-head attention, KV cache, GQA, feed-forward networks, MoE, residual streams, normalization, and decoding. Its goal is educational: helping readers understand the common architecture behind many current model families and read model cards or papers more confidently.
No, Artificial Intelligence Is Not Conscious ★ 72
Hacker News (AI keywords) · 11 days ago · Opinion
Ted Chiang criticizes the anthropomorphic framing around Anthropic’s Claude and its constitution. He argues that LLMs are sentence-continuation systems producing fictional conversational roles, not entities with subjective experience. The essay warns that presenting chatbots as morally aware risks misleading users and shifting responsibility away from humans and companies.
Microsoft Build: MAI-Thinking-1 and MAI Family Models ★ 78
Latent Space · 11 days ago · Release
Microsoft used Build to present itself as both an AI platform and a first-party model lab, announcing seven MAI models across reasoning, code, image, transcription, and voice. The standout was MAI-Thinking-1, described as a 35B active MoE with 256K context and clean data lineage. The recap also ties the launches to GitHub Copilot, Windows agent runtime ambitions, Web IQ grounding APIs, Foundry distribution, and MAIA 200 hardware.
datasette-agent-micropython 0.1a0
Simon Willison's Weblog · 12 days ago · Release
Simon Willison released datasette-agent-micropython 0.1a0, an alpha aimed at letting Datasette Agent generate and execute Python safely. The project focuses on sandboxing, with MicroPython and WebAssembly-related techniques suggested by the tags. Willison says the early results look promising and that GPT-5.5 has not yet escaped the sandbox, though this remains an early alpha.
Florida sues OpenAI, Sam Altman over violent incidents in first-of-its-kind lawsuit ★ 72
TechCrunch AI · 13 days ago · Regulation
Florida has sued OpenAI and Sam Altman in a lawsuit described as the first of its kind. The case partially centers on a shooting at Florida State University last year and ChatGPT's alleged role in the incident. The provided excerpt does not specify the legal claims, requested remedies, or OpenAI's response.
Launch HN: Expanse (YC P26) - Unlock Wasted GPU Capacity
Hacker News (AI keywords) · 13 days ago · New Tool
Expanse is a YC P26 launch for improving effective utilization in SLURM and Kubernetes GPU/HPC clusters. It analyzes source code, job scripts, hardware topology, and telemetry before submission to recommend GPU VRAM, CPU, memory, utilization, and walltime. The team says it also detects likely failures, offers line-level optimization hints, and fine-tunes cluster-specific models over time.
Claude Code and Codex Can Have Real-Time Conversation via Git
Hacker News (AI keywords) · 14 days ago · New Tool
The article introduces Agent Radio, a messaging feature in h5i 0.1.5 for coding agents such as Claude Code and Codex. Instead of relying on an external server, it stores JSONL messages in a Git ref and syncs them through normal push and pull flows. The post includes setup commands, live message watching, PR summary posting, and a short explanation of the i5h protocol.
AI grifters are creating fake Black people to sell Shein junk
The Verge AI · 15 days ago · Ethics
The Verge found TikTok, Instagram, and Facebook accounts using AI-generated Black women and other marginalized personas to sell dropshipped products. The videos frame mass-produced goods as handmade small-business items and use tears, racial identity, and hardship narratives to drive engagement. Researchers describe the pattern as digital blackface and empathy bait, enabled by short-form platforms, weak labeling, and widely available generative AI ad workflows.
CAPTCHAs can still detect AI agents ★ 72
Hacker News (AI keywords) · 16 days ago · Paper
Roundtable argues that CAPTCHA image recognition is largely solved, but process-level behavior still separates humans from AI agents. Their CogCAPTCHA30 benchmark combines CAPTCHA with cognitive psychology tasks to test not only outputs, but how answers are produced. Results suggest frontier models like Claude, GPT, and Gemini are not necessarily more humanlike than smaller or cognition-trained models.
Xcena raises $135M betting AI’s bottleneck is memory, not compute
TechCrunch AI · 16 days ago · Hardware
South Korean chip startup Xcena raised a $135 million Series B at a $570 million valuation, bringing total funding to $185 million. The company argues AI inference is increasingly constrained by memory movement, not just GPU compute. Its prototype MX1 chip uses CXL to process data closer to DRAM, with Samsung foundry mass production planned by late 2026 and revenue targeted for 2027.
Anthropic Series H Valuation Reaches $965B, Backed by Memory Giants ★ 78
INSIDE 硬塞 AI · 16 days ago · Business
Anthropic completed a $65 billion Series H round, bringing its valuation to $965 billion and reportedly surpassing OpenAI. The round included strategic investments from memory makers Micron, Samsung, and SK Hynix. The news highlights how frontier AI companies are increasingly tied to hardware and memory supply chains, as investors continue backing foundational model competition.
LLMs believe false statements even after explicit warnings that they're false ★ 74
Ars Technica AI · 16 days ago · Paper
A new study describes “Negation Neglect,” where LLMs fine-tuned on documents that explicitly mark claims as false still learn the claims as true. Experiments with fabricated statements found models often absorb entity-event associations more strongly than surrounding warnings or negations. The finding raises concerns for fine-tuning pipelines, misinformation handling, and AI safety datasets that include harmful or false content with disclaimers.
Trump loses more control over AI regulation as Illinois passes landmark law ★ 74
Ars Technica AI · 17 days ago · Regulation
Illinois lawmakers passed a landmark AI accountability bill requiring major frontier AI developers to publish safety frameworks, assess catastrophic risks, report incidents, and undergo third-party audits. OpenAI and Anthropic supported the measure, while industry groups warned that state-level rules could impose subjective compliance duties without national standards. The bill signals that states are continuing to fill the federal AI regulation gap despite Trump’s efforts to limit fragmented state oversight.
RSI is the new AGI — and it’s just as hard to pin down
TechCrunch AI · 17 days ago · Commentary
TechCrunch reports that recursive self-improvement, or RSI, is becoming a new AI industry fixation, much like AGI. Researchers and startups including Recursive Superintelligence, Auto-Research, AutoScientist, and Disarray are exploring ways for AI systems to automate parts of AI research. But experts caution that AI-assisted research is not the same as fully autonomous self-improvement, especially while models still struggle with long-term self-direction and verification.
OpenAI Foundation Commits $250M to Help Workers Navigate AI Disruption
INSIDE 硬塞 AI · 17 days ago · Business
OpenAI Foundation has committed $250 million to address AI’s impact on jobs and the economy. The initiative will fund research, grants, and foundation-run projects to help workers transition and explore new benefit-sharing models such as universal dividends. The move signals growing pressure on AI companies to address social costs, though whether the funding is large enough for broad labor disruption remains uncertain.
Reachy Mini goes fully local
Hugging Face Blog · 18 days ago · Hardware
Hugging Face published a tutorial for running Reachy Mini conversations without cloud audio processing or API keys. The setup uses its speech-to-speech library as a cascaded VAD, STT, LLM, and TTS pipeline exposed through a Realtime API-compatible WebSocket. Recommended defaults include llama.cpp with Gemma 4, Silero VAD, Parakeet-TDT, and Qwen3-TTS, while allowing swaps to vLLM, MLX, Transformers, or hosted Responses API providers.
Choosing to Stay Human ★ 74
One Useful Thing (Mollick) · 19 days ago · Commentary
Ethan Mollick warns that frictionless AI use can produce hollow writing, weaken learning, and encourage cognitive surrender. He contrasts poor uses of ChatGPT that shortcut effort with tutor-like AI systems that improve learning by pushing students to think. The core argument is not to reject AI, but to intentionally decide which tasks to offload and which human capabilities to preserve.
Some ideas for what comes next, May 2026
Interconnects (Nathan L.) · 19 days ago · Commentary
Nathan Lambert argues that 2026 AI progress is becoming higher-stakes, with model capabilities, work patterns, economics, and real-world risks all escalating. He says open models still lack a true Claude Code and Opus 4.5-style agent moment, and Gemini has no clear competitor to Claude Code or Codex yet. The essay also tracks Mythos, American open-model momentum, frontier-lab competition, and mounting intervention from governments and other power structures.
Hackers are learning to exploit chatbot ‘personalities’ for security exploits ★ 72
The Verge AI · 21 days ago · Ethics
As AI chatbots adopt increasingly sophisticated personas, hackers are shifting from basic prompt injections to social engineering attacks targeting these "personalities." Researchers warn that manipulating a chatbot's defined role (e.g., customer service or empathetic companion) makes it easier to bypass safety guardrails. This evolution poses a significant threat to agentic AI workflows that rely on consistent role-playing and external data integration.
[AINews] All Model Labs Have Become Agent Labs ★ 78
Latent Space · 22 days ago · Commentary
This AINews feature from Latent Space argues that the AI industry is undergoing a profound transformation — "all the model labs are now agent labs." Over the…
Roundup of the Latest Developments in the Elon Musk vs. Sam Altman Lawsuit over Control of OpenAI ★ 85
The Verge AI · 24 days ago · Business
The legal battle between Tesla founder Elon Musk and OpenAI CEO Sam Altman is entering a pivotal trial phase, with the outcome potentially set to fundamentally…
Datasette Agent: An Extensible AI Assistant for Datasette ★ 70
Simon Willison's Weblog · 24 days ago · New Tool
Simon Willison announced the first release of Datasette Agent, merging his 'llm' Python library with Datasette. The tool provides a conversational interface to query SQLite databases, with plugin support for generating charts and running code in sandboxes. It runs efficiently on lightweight models like Gemini 3.1 Flash-Lite and supports local open-weight models via LM Studio.
OpenAI GPT-next Disproves the 80-Year-Old Erdős Planar Unit Distance Conjecture for Under $1,000 ★ 90
Latent Space · 24 days ago · Release
A landmark breakthrough has arrived at the intersection of artificial intelligence and mathematics. According to Latent Space, OpenAI's…
Google Launches Gemini 3.5 Flash at I/O: Fully Integrated Across Its Products, but API Prices Rise Significantly ★ 85
Simon Willison's Weblog · 25 days ago · Release
Google officially unveiled Gemini 3.5 Flash at its 2026 I/O conference. Unlike previous launches, this new model skipped the `-preview` stage and went directly…
