Reports
AI-generated structured vendor updates
Nvidia ENPIRE: AI Agents Autonomously Train Robots to Install GPUs at 99% Success
Nvidia's ENPIRE framework enables AI coding agents (Codex, Claude Code) to autonomously write, test, and refine robot training code, achieving 99% pass@8 on GPU insertion and other contact-rich tasks. The system uses Git for collaboration, but token consumption scales faster than fleet size, and simulation-to-reality transfer remains imperfect.
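For readers unfamiliar with the metric, pass@8 is usually read as the chance that at least one of 8 attempts succeeds. The sketch below uses the standard unbiased pass@k estimator from code-generation evaluation. It is an assumption that ENPIRE scores attempts this way, and the function name `pass_at_k` is illustrative only.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one success among k attempts).

    n: total attempts sampled, c: attempts that succeeded, k: budget per task.
    Computes 1 - C(n - c, k) / C(n, k), the probability that a random
    size-k subset of the n attempts contains at least one success.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # math.comb returns 0 when k > n - c, which yields 1.0 as expected.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 5 of 10 sampled attempts succeeded.
single_try = pass_at_k(10, 5, 1)   # equals the raw success rate
eight_tries = pass_at_k(10, 5, 8)  # with only 5 failures, any 8 picks include a success
```

Under this reading, a high pass@8 can coexist with a much lower single-attempt success rate, so the two should not be conflated.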
NVIDIA GB300 NVL72 Dominates AgentPerf: 20x More Agents per Megawatt Reshapes AI Inference
NVIDIA's GB300 NVL72 tops the first agentic-AI benchmark AgentPerf, running up to 20x more agents per megawatt than an H200 on DeepSeek V4 Pro. The test stresses multi-step tool-calling workloads, highlighting a new efficiency metric that will drive data center hardware decisions toward agent-optimized architectures.
NVIDIA RTX Remix 1.5: RTX IO Shrinks Game Sizes, AI Agents Reshape Modding
NVIDIA releases RTX Remix 1.5, featuring RTX IO compression that slashes Half-Life 2 RTX from 80GB to 50GB and reduces CPU overhead. The update also introduces AI agent integration via 'RTX Remix Skills,' allowing AI coding agents to automate complex modding tasks, lowering the barrier for non-programmers.
AMD MLPerf 6.0: MI350 GPUs Achieve 3.5x Leap with MXFP4, Debut Multi-Node Training
AMD submitted its most comprehensive MLPerf Training 6.0 results to date, including its first multi-node training submission (FLUX.1 on 512 GPUs) and an MXFP4 training recipe. The MI355X delivers a 3.5x generational leap over the MI300X on Llama 2-70B and lands within 5% of the NVIDIA B200. Ten ecosystem partners validated reproducibility.
Microsoft Agent 365: Control Plane Lock Replaces Model Lock, Building an Entra Empire for AI
Microsoft launches Agent 365 as a unified control plane for AI agents, integrating Entra, Defender, Purview, Intune, and cost management, alongside the Microsoft IQ semantic platform. While Microsoft claims model diversity and openness, Agent 365 effectively locks enterprise AI assets into Microsoft's management toolchain, shifting control from the model layer to the infrastructure layer.
AMD Acquires MEXT: AI-Predicted Flash Nears DRAM Performance to Cut AI Memory TCO
AMD acquires MEXT, an AI-driven memory optimization startup. MEXT's predictive technology makes NAND Flash behave like DRAM, expanding effective memory capacity for AI workloads and lowering TCO. The tech will be integrated across AMD's data center portfolio (EPYC, Instinct) to address memory bottlenecks in large models.
AMD Open-Sources AI Software Stack on Vultr, Taking on NVIDIA CUDA Ecosystem
AMD launches a suite of open-source, modular enterprise AI software components on Vultr Marketplace, including AMD Inference Microservices (AIMs), AI Workbench, Resource Manager, and Solution Blueprints. This aims to provide production-grade AI infrastructure without vendor lock-in, directly challenging NVIDIA's CUDA ecosystem.
NVIDIA Bets on World-Action Models: Control Shifts from VLM to Video Backbones
NVIDIA's blog introduces World-Action Models (WAMs) as a paradigm shift from VLM-based VLAs. WAMs leverage pretrained video/world-model backbones to jointly predict future states and robot actions, aiming to bridge the language-to-action grounding gap. This could redefine robot foundation model training but raises concerns about inference cost and latency.
Z.ai GLM-5.2 Ships Usable 1M-Token Context, No Benchmarks, Two Thinking Levels
Z.ai releases GLM-5.2 with a claim of usable 1M-token context and two thinking-effort levels. No standard benchmarks are provided, raising concerns about real-world performance. The model targets replacing chunking-based RAG with native long-context reasoning.
NVIDIA GB300 NVL72 Delivers 20x Agentic Coding Efficiency, Setting New Inference Benchmark
NVIDIA's GB300 NVL72 achieves 20x more concurrent coding agents per megawatt than H200 on the new AA-AgentPerf benchmark, leveraging 72-GPU NVLink fabric, MXFP4 kernels, and MoE optimizations. This first standardized agentic inference benchmark redefines data center capacity planning for AI agents.
NVIDIA AgentPerf Benchmark: Blackwell Ultra Delivers 20x More Agents per Megawatt vs Hopper
NVIDIA and Artificial Analysis unveil AgentPerf, the first benchmark for agentic AI workloads. Results show the GB300 NVL72 platform delivers up to 20x more concurrent agents per megawatt than the HGX H200 when running DeepSeek V4 Pro, using real coding agent trajectories to measure throughput and responsiveness.
Anthropic Locks Regulated Industries via DXC: Claude-Certified Engineers and OASIS Platform as New Control Points
Anthropic forms a global alliance with DXC Technology, training tens of thousands of Claude-certified forward-deployed engineers to embed Claude into mission-critical systems for banks, airlines, and regulated industries. DXC's OASIS platform defaults to Claude, with over 95% of its code generated by Claude, creating deep dependency.
AMD, Dell, Cambridge Launch UK Sovereign AI Lab to Challenge NVIDIA's CUDA Dominance with Open ROCm
AMD, Dell, and the University of Cambridge launch the Sovereign AI Innovation Lab (SAIL) in the UK, deploying the Zenith supercomputer with 5th Gen EPYC CPUs and Instinct MI355X GPUs, plus the Sunrise fusion AI system. The lab promotes open, interoperable AI infrastructure based on AMD ROCm, challenging NVIDIA's CUDA lock-in and offering long-term technology choice for national AI initiatives.
Graviton5 + Nitro Formal Verification: AWS Locks AI CPU Control with ARM and Math
AWS launches Graviton5-based M9g/M9gd instances with a 25% compute gain, PCIe Gen6, DDR5-8800, and the first formally verified cloud hypervisor (Nitro Isolation Engine). Meta deploys tens of millions of cores for agentic AI, marking a decisive ARM victory in cloud CPUs.
Anthropic Claude Fable 5 on AWS: Data Retention Policy Breaches Cloud Security Boundary, Erodes Enterprise Data Sovereignty
AWS and Anthropic launch Claude Fable 5 with long-running async execution, advanced vision, and proactive self-verification. Access requires 30-day data retention and data sharing with Anthropic, which moves inference data outside the AWS security boundary. Harmful prompts fall back to Opus 4.8, introducing complex pricing and governance risks.
GKE Inference Gateway Prefix Caching: 92% Faster AI Inference with Hidden Lock-in
Google Cloud launches GKE Inference Gateway with prefix caching and model-aware routing, achieving 92.8% lower TTFT and 15.7% higher throughput on Llama 3.1 8B. Snap reports 75-80% cache hit rates. However, deep integration with GKE Gateway API risks lock-in, limiting multi-cloud portability.
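To make the idea of prefix caching with cache-aware routing concrete, here is a toy sketch. It is not the GKE Inference Gateway implementation: the class `PrefixRouter`, the block size, and the tie-breaking rule are all hypothetical. Prompts are split into fixed-size token blocks, each block is chain-hashed so a hash identifies the whole prefix up to that point, and a request is sent to the replica that already holds the longest matching prefix.

```python
from hashlib import sha256

BLOCK = 4  # tokens per cache block (hypothetical value)

def block_hashes(tokens, block=BLOCK):
    """Chain-hash full blocks; each hash covers the entire prefix so far."""
    hashes, prev = [], b""
    full = len(tokens) - len(tokens) % block  # ignore a trailing partial block
    for i in range(0, full, block):
        chunk = " ".join(tokens[i:i + block]).encode()
        prev = sha256(prev + chunk).digest()
        hashes.append(prev)
    return hashes

class PrefixRouter:
    """Toy router: prefer the replica with the longest cached prefix."""

    def __init__(self, replicas):
        self.cached = {name: set() for name in replicas}

    def matched_blocks(self, replica, hashes):
        # Count leading blocks already cached; stop at the first miss.
        n = 0
        for h in hashes:
            if h not in self.cached[replica]:
                break
            n += 1
        return n

    def route(self, tokens):
        hashes = block_hashes(tokens)
        # Ties go to the first replica in insertion order.
        best = max(self.cached, key=lambda r: self.matched_blocks(r, hashes))
        hit = self.matched_blocks(best, hashes)
        self.cached[best].update(hashes)  # the replica now holds this prefix
        return best, hit

router = PrefixRouter(["a", "b"])
system = "you are a helpful assistant answer briefly please".split()
first = router.route(system + "what is kubernetes today".split())
second = router.route(system + "explain prefix caching now".split())
```

The second request shares the system prompt, so it lands on the same replica and reuses two cached blocks. A production router would also weigh replica load and cache eviction, which this sketch ignores.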
Cloudflare as Customer Zero: Layered Defense Architecture Against Frontier AI Threats
Cloudflare reveals its production defense architecture against frontier AI models, using itself as customer zero. The architecture combines WAF Attack Score, API Shield, Bot Management, Zero Trust, and MCP Server Portal. The core insight is that the architecture around a vulnerability matters more than patch speed: ML scoring and positive security models block attack variants before they hit and contain lateral movement after a breach.
NVIDIA's UK Sovereign AI Play: From Chip Vendor to National Infrastructure Controller
NVIDIA partners with the UK government to deploy sovereign AI infrastructure via Isambard-AI (5,400 GH200 superchips) and the Sovereign AI Fund, backing local startups. This move establishes a national AI control plane, locking compute into NVIDIA's ecosystem and bypassing traditional hyperscalers like AWS and Azure.
AWS Bedrock New Console Embraces OpenAI/Anthropic APIs, Shifting Control to Inference Layer
AWS launches a new Bedrock console powered by the bedrock-mantle endpoint, natively supporting OpenAI and Anthropic API protocols. Users can seamlessly switch between GPT, Claude, and open-weight models. This move standardizes model access, aiming to lock users into AWS's unified inference plane while weakening individual model provider API lock-in.
NVIDIA Nemotron 3 Ultra: A MoE-Based Control Plane for Cost-Efficient AI Agent Orchestration
NVIDIA launches Nemotron 3 Ultra, a 550B-parameter MoE model (55B active) purpose-built for AI agent orchestration. Featuring Multi-Teacher On-Policy Distillation (MOPD) and a Hybrid Mamba-Transformer architecture, it achieves 5x throughput and 30% cost savings on tasks like SWE-bench, signaling a shift of reasoning control to a layered agent system.
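As a rough illustration of what "550B parameters, 55B active" implies, the sketch below computes the active fraction and a back-of-the-envelope compute estimate. The rule of roughly 2 FLOPs per active parameter per generated token is a common approximation for transformer-style forward passes and is only an assumption here, since a hybrid Mamba-Transformer may deviate from it. The function names are illustrative.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_params / total_params

def approx_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter."""
    return 2.0 * active_params

TOTAL = 550e9   # total parameters, from the summary above
ACTIVE = 55e9   # active parameters per token, from the summary above

fraction = active_fraction(TOTAL, ACTIVE)
moe_flops = approx_flops_per_token(ACTIVE)
dense_flops = approx_flops_per_token(TOTAL)  # hypothetical dense model of equal size
compute_ratio = dense_flops / moe_flops
```

Under these assumptions each token touches about a tenth of the weights, so per-token compute is about a tenth of an equally sized dense model, while memory must still hold all 550B parameters.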