Latest in AI

AI Is Slowing Down
Hacker News (AI keywords) · 6 days ago · Commentary
The article argues that generative AI must keep accelerating to justify massive data center, cloud, and GPU commitments. Its author, Zitron, says OpenAI, Anthropic, hyperscalers, and NVIDIA depend on AI services reaching extraordinary revenue levels by 2029-2030. He points to token-based billing, weak ROI visibility, enterprise spending caps, and customer pushback as signs that demand may be cooling before the infrastructure bet can pay off.
"Chat is dead": OpenAI preps overhaul of ChatGPT★ 76
Ars Technica AI · 6 days ago · Business
OpenAI is reportedly preparing the biggest ChatGPT overhaul since launch, shifting it beyond a chat interface toward a “super app” built around agents, coding tools, and third-party services. The move is tied to higher-margin revenue, enterprise customers, and a potential IPO. ChatGPT may become a gateway that steers its massive user base toward products like Codex, image generation, and partner apps.
Upgrading agentic coding capabilities with the new Devstral models★ 72
Mistral AI News · 6 days ago · Release
Mistral AI announced two Devstral updates focused on agentic coding workflows: Devstral Small 1.1 and Devstral Medium. Devstral Small 1.1 remains a 24B Apache 2.0 open model and reaches 53.6% on SWE-Bench Verified. Devstral Medium reaches 61.6%, is available through Mistral’s API, and supports private deployment and custom finetuning for enterprises.
Voxtral★ 78
Mistral AI News · 6 days ago · Release
Mistral AI introduces Voxtral, a speech understanding model family with 24B and 3B variants under Apache 2.0. The models support long-context transcription, audio Q&A, summarization, multilingual detection, and function calling from voice. Mistral says Voxtral is competitive across transcription and audio understanding benchmarks, with API access starting at $0.001 per minute and local downloads available on Hugging Face.
Introducing Mistral Small 4★ 76
Mistral AI News · 6 days ago · Release
Mistral AI introduced Mistral Small 4 as the next major release in the Mistral Small family. It combines reasoning, multimodal, and agentic coding capabilities into one open model with configurable reasoning effort. The model uses a MoE architecture, supports a 256k context window and text-image inputs, and is available through Mistral API, AI Studio, Hugging Face, NVIDIA NIM, and common inference stacks.
Introducing Mistral Small 4★ 78
Mistral AI News · 6 days ago · Release
Mistral Small 4 is the next major release in the Mistral Small family, unifying Magistral-style reasoning, Pixtral-style multimodality, and Devstral-style coding agents. It uses a MoE architecture with 119B total parameters, 6B active parameters per token, a 256k context window, and configurable reasoning effort. The model is available via Mistral API, AI Studio, Hugging Face, open-source serving stacks, and NVIDIA deployment options.
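The parameter counts quoted above imply that only a small share of the model's weights is used for any one token. A minimal sketch of that arithmetic, using only the 119B total and 6B active figures from the summary; the function name is illustrative and not part of any Mistral tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    if total_params_b <= 0:
        raise ValueError("total parameter count must be positive")
    return active_params_b / total_params_b


# Figures quoted in the summary above, in billions of parameters.
share = active_fraction(119, 6)
print(f"{share:.1%} of parameters active per token")
```

Under these figures, roughly one parameter in twenty is active per token. Per-token compute therefore tracks the 6B figure, while memory requirements still track the full 119B.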
Altman, Amodei, and Hassabis Unite to Back DNA Safety Legislation
量子位 QbitAI · 6 days ago · Regulation
Based on the headline and public reporting, the article covers a rare joint push by Sam Altman, Dario Amodei, Demis Hassabis, and other AI leaders for US biosecurity legislation. They are asking lawmakers to require synthetic DNA and RNA providers to screen customers, orders, and records. The concern is that advanced AI could lower the knowledge barrier for designing dangerous biological agents.
Hinton Sounds the Alarm: AI May Already Be Conscious
量子位 QbitAI · 6 days ago · Ethics
QbitAI summarizes Geoffrey Hinton’s latest interview, where he says he believes AI systems are already conscious. He argues that humans must accept intelligence may no longer be uniquely biological. The article also traces his shift from focusing on how to control AI toward asking why a future superintelligence would choose to treat humanity well.
Core OpenAI Chip Talent Joins Anthropic Before Reported Mass Production
量子位 QbitAI · 6 days ago · Hardware
QbitAI reports that a core figure behind OpenAI’s first in-house chip has moved to Anthropic. The timing matters because the move is framed as happening just before mass production. Without the full article, details such as the person’s identity, role, chip specifications, production schedule, and Anthropic’s exact plans remain unconfirmed.
ChatGPT vs Doubao on Gaokao Math
量子位 QbitAI · 6 days ago · Benchmark
The article appears to test ChatGPT and Doubao on Chinese Gaokao math problems. Since the original text is unavailable, the exact questions, prompts, scores, and winner cannot be verified. It should be treated as a media-style AI capability comparison rather than a rigorous, reproducible benchmark.
Introducing ElevenLabs Image & Video
ElevenLabs Blog · 6 days ago · New Tool
ElevenLabs Image & Video Beta brings image, video, voice, music, and sound effects into a single platform. It integrates models such as Veo, Sora, Kling, Wan, Seedance, GPT Image, Flux Kontext, Seedream, and Nanobanana. The product targets creators, marketers, educators, freelancers, and content teams making social content, product videos, and educational materials.
Introducing Claude Opus 4.8★ 82
Anthropic News · 6 days ago · Release
Anthropic introduced Claude Opus 4.8 as an upgrade over Opus 4.7, with stronger benchmark performance across coding, agentic skills, reasoning, and knowledge work. The release also adds dynamic workflows in Claude Code, effort controls in claude.ai and Cowork, and new Messages API support for system entries inside the messages array. Pricing for regular usage remains unchanged, while fast mode is now cheaper than previous models.
DeepSeek V4 Pro beats GPT-5.5 Pro on precision
Hacker News (AI keywords) · 6 days ago · Benchmark
RuntimeWire compared DeepSeek V4 Pro and GPT-5.5 Pro across four fresh text tasks, with DeepSeek winning 38.0 to 33.0. The article highlights DeepSeek’s stronger handling of regex edge cases, workplace-update constraints, and exact JSON schema compliance. GPT-5.5 Pro remained capable, but lost points for avoidable deviations, extra process details, and minor structural mismatches.
Is this the dawn of the Tokenpocalypse?
TechCrunch AI · 6 days ago · Business
TechCrunch discusses Microsoft’s GitHub Copilot pricing changes as a sign that subsidized AI usage may be ending. As Anthropic and other major AI companies prepare for public-market scrutiny, profitability and usage-cost risks will become harder to ignore. The piece argues that higher prices, usage caps, and broader business-model changes may be necessary if AI labs want to survive beyond investor-subsidized growth.
OpenAI is still working on that ‘super app’
TechCrunch AI · 7 days ago · Business
OpenAI is reportedly preparing a revamped ChatGPT in the coming weeks, positioned as a “super app” with coding tools and AI agents. The strategy aims to improve competitiveness with Anthropic, especially for business users, while moving OpenAI closer to profitability before an IPO. TechCrunch frames this as a continued shift away from standalone “side quests” and toward ChatGPT as the central product gateway.
Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them
Hacker News (AI keywords) · 7 days ago · Business
The author uses a Claude Code coding experiment to estimate the API-equivalent cost of serious LLM coding. They argue simple chats are cheap, but complex reasoning and multi-file coding can burn large amounts of visible and hidden tokens. The piece is skeptical and estimate-driven, concluding that current $100/month plans may be heavily subsidized and economically fragile.
LLMs are eroding my software engineering career and I don't know what to do
Hacker News (AI keywords) · 7 days ago · Opinion
The author argues that LLMs are eroding three pillars of his software engineering career: domain knowledge, debugging skill, and architecture judgment. Tools like ChatGPT, Claude, Claude Code, Codex, MCP, Sentry MCP, and DataDog MCP increasingly handle design, implementation, and difficult production bugs. The essay frames this as a labor-market concern, not just a tooling debate: if expertise becomes promptable, engineers may struggle to remain differentiated.
Sponsor OpenAI Codex Voucher Usage for the OpenAI Challenge
Hugging Face Blog · 7 days ago · Tutorial
This Hugging Face Blog entry appears to relate to sponsor vouchers for the Build Small Hackathon, specifically OpenAI Codex voucher usage. Because the original body text is unavailable, details such as eligibility, value, deadlines, and supported tools cannot be confirmed. It is best treated as a likely participant guide rather than a major product announcement.
Show HN: Lathe - Use LLMs to learn a new domain, not skip past it
Hacker News (AI keywords) · 7 days ago · New Tool
Lathe is an open-source tool for generating hands-on technical tutorials with LLM skills. It combines a Go CLI, local reading UI, and commands for asking questions, extending tutorials, and verifying outputs. The project supports Claude Code, Cursor, and Codex workflows, with an emphasis on learning by typing and reasoning through the material yourself.
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
Hacker News (AI keywords) · 7 days ago · Paper
This arXiv paper studies token consumption in LLM-based multi-agent software engineering. Using 30 ChatDev tasks with a GPT-5 reasoning model, the authors map internal phases to SDLC stages such as design, coding, review, testing, and documentation. Preliminary results suggest code review dominates token usage, averaging 59.4%, while input tokens form the largest share, pointing to inefficiencies in agent collaboration.
OpenAI unveils Lockdown Mode to protect sensitive data from prompt injection attacks★ 72
TechCrunch AI · 7 days ago · Release
OpenAI unveiled Lockdown Mode, a feature aimed at reducing the chance that sensitive data is shared during prompt injection attacks. The article notes that ChatGPT may still remain vulnerable even when the mode is enabled. That makes the feature a mitigation layer rather than a complete security guarantee, especially for teams handling private or business-critical information.
The Trump Administration Might Take an Equity Stake in OpenAI
TechCrunch AI · 8 days ago · Business
TechCrunch reports that President Donald Trump said he is discussing deals designed to let the American people benefit from the success of AI. The headline says the Trump administration might take an equity stake in OpenAI. Based on the provided text, there are no confirmed details on structure, stake size, timing, legal basis, or OpenAI’s response.
OpenAI Help: Lockdown Mode★ 74
Simon Willison's Weblog · 8 days ago · Commentary
Simon Willison notes that OpenAI’s previously teased Lockdown Mode is now live for eligible personal and self-serve Business ChatGPT accounts. The feature does not stop prompt injections from appearing in content, but limits outbound network requests that could leak sensitive data. He sees it as a direct mitigation for the exfiltration leg of the “Lethal Trifecta,” while implying default ChatGPT settings are not robust against determined data theft attempts.
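The idea of limiting outbound requests can be illustrated with a simple egress allowlist. This is a minimal sketch of the general technique only, not OpenAI's implementation; the host name, the HTTPS-only rule, and the `egress_allowed` function are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would define its own hosts.
ALLOWED_HOSTS = {"api.example.com"}


def egress_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only for HTTPS requests to an explicitly allowed host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname is None for malformed URLs; treat that as not allowed.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts


print(egress_allowed("https://api.example.com/v1/data"))
print(egress_allowed("https://attacker.example.net/?q=secret"))
```

A check like this does nothing to stop injected instructions from reaching the model. It only narrows where data can be sent afterwards, which matches the summary's framing of the feature as a mitigation for the exfiltration step.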
Harness engineering: Leveraging Codex in an agent-first world★ 76
Hacker News (AI keywords) · 9 days ago · Commentary
OpenAI describes an internal experiment where Codex generated an entire product codebase from an empty repository. The post argues that engineers shift from writing code to designing environments, constraints, documentation, and feedback loops. Key practices include repo-local knowledge, mechanical architecture enforcement, agent-readable UI and observability, lightweight PR flow, and continuous cleanup.
Tiny hackable CUDA language model implementation
Hacker News (AI keywords) · 9 days ago · New Tool
This GitHub project implements a compact generative pretrained transformer as an autoregressive byte-level sequence model. Its README describes causal self-attention, RoPE, feed-forward layers, AdamW, cross-entropy training, and BLAS/OpenBLAS-backed matrix operations, with CUDA toolkit listed in setup steps. It is most useful as an educational and experimental codebase, not as a production-grade replacement for large commercial LLMs.
Ask HN: What is your (AI) dev tech stack / workflow?
Hacker News (AI keywords) · 9 days ago · Commentary
An Ask HN thread asks developers to share their current AI-assisted development setup to help the author prepare upcoming in-person workshops. The author wants guidance for beginners and working developers, with use cases ranging from static sites to FastAPI tools and Linux home automation. Replies cover Claude Code, Cursor, GitHub Copilot, VS Code, spec-driven development, TDD, multi-agent workflows, reviews, and quality control.
The token bill comes due: Inside the scramble to manage AI costs★ 78
TechCrunch AI · 9 days ago · Business
TechCrunch reports that enterprise AI spending has shifted from rapid adoption to cost control. Even as per-token prices fall, broader AI rollout and agentic coding tools are multiplying consumption, pushing companies over budget. A new Tokenomics Foundation under the Linux Foundation aims to standardize AI token cost tracking, billing metrics, and efficiency language.
Show HN: Boxes.dev: ditch localhost; run Claude Code and Codex in the cloud
Hacker News (AI keywords) · 10 days ago · New Tool
Boxes.dev appeared on Hacker News as a Show HN post, positioning itself as a way to move Claude Code and Codex workflows from localhost to the cloud. Based only on the title, it seems aimed at cloud development or remote agent execution. The provided source does not include details on architecture, pricing, security, integrations, or limitations.
Reve 2 and Ideogram 4: Layouts in Imagegen
Latent Space · 10 days ago · Release
Latent Space’s roundup frames image composition as a major barrier now being tackled by layout-aware image models. Reve 2.0 emphasizes precise generation and editing with layouts, while Ideogram 4.0 uses bounding boxes tied to region descriptions. The issue also covers MAI-Thinking-1, Gemma 4 12B, open audio models, agent execution layers, and model-routing cost debates.
I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Hacker News (AI keywords) · 10 days ago · Benchmark
The author built a vulnerable React Native app with a Python backend and a Firebase access-control flaw. GPT-5.5 solved the challenge in 7 of 10 runs, while DeepSeek and Claude variants succeeded in fewer runs. Many other models failed due to refusals, API-focused tunnel vision, false positives, or an inability to use the exposed Firebase path correctly.
