inference - AI Infrastructure Intelligence Search

Samsung Electronics Other 2026-06-02

HBM Profitability Falls Below DDR5, TrendForce Warns of Multi-Fold Price Surge in 2027

TrendForce reports that HBM per-wafer revenue fell below DDR5 64GB RDIMM in Q1 2026, making HBM less profitable. Suppliers will reallocate capacity, leading to multi-fold HBM4 contract price increases in 2027. Demand from NVIDIA Rubin Ultra and AI ASICs will further tighten supply.

ARM Other 2026-06-02

Arm-NVIDIA RTX Spark: Tightly Coupled CPU-GPU for Agentic AI PCs

The Arm-based NVIDIA RTX Spark integrates an Arm Grace CPU with an NVIDIA Blackwell RTX GPU via unified memory, enabling ultra-low-latency on-device AI inference for the agentic era. This platform marks a major milestone for Windows on Arm, targeting developers, creators, and gamers.

ARM Other 2026-06-02

Arm and NVIDIA RTX Spark: Unified Memory PC Architecture Targets Agentic AI, Encircles x86

Arm and NVIDIA unveil RTX Spark, an Arm-based Grace CPU + Blackwell RTX GPU platform with unified memory, targeting Windows on Arm for agentic AI inference. It delivers 1 Petaflop, reduces token cost, and signals a PC paradigm shift from app-driven to agent-driven, backed by Microsoft.

NVIDIA Other 2026-06-02

NVIDIA DGX Spark Update: One-Click Local AI Agents, Multi-Node Cluster for 400B Models

At Computex 2026, NVIDIA updates DGX Spark with NemoClaw for one-click local AI agent setup, 2.6x throughput boost for Qwen3.6-35B via vLLM optimizations, and Sync cluster assistant to connect 2-4 nodes over ConnectX-7 200Gbps RoCE, enabling local deployment of large models and multi-agent pipelines.

Amazon Other 2026-06-02

AWS Hosts OpenAI GPT-5.5 & Codex: Control Shifts from Model to Cloud

AWS launches OpenAI GPT-5.5, GPT-5.4, and Codex on Bedrock via the Responses API. This integrates frontier models into AWS infrastructure for data residency and capacity management, but locks users into Bedrock's ecosystem.

NVIDIA Other 2026-06-01

NVIDIA FOX Blueprint Shifts Factory Control from PLCs to AI Agents on DGX

NVIDIA unveiled the Factory Operations Blueprint (FOX), a reference design for autonomous factory manager agents using NemoClaw, AI-Q Blueprint, and DGX Station (GB300 with 20 PFLOPS FP4, 748GB coherent memory). It unifies live machine signals, quality systems, and robot fleets under an AI decision layer. Foxconn, Pegatron, Advantech, and Wistron are early adopters, projecting 80% faster root cause analysis and 15% labor productivity gains.

NVIDIA Other 2026-06-01

NVIDIA Alpamayo: Closed-Loop RL Post-Training Bridges AV Sim-to-Real Gap

NVIDIA's Alpamayo platform introduces AlpaGym, an open-source, high-throughput closed-loop RL post-training framework. It integrates AlpaSim simulator, Cosmos-RL distributed training, and Physical AI datasets, enabling AV models to learn from the consequences of their own actions in simulation, significantly reducing the gap between training and deployment.

NVIDIA Other 2026-06-01

NVIDIA Cosmos 3: Open-Source Physical AI Model with MoT for Ecosystem Lock-in

NVIDIA releases Cosmos 3, a unified physical AI foundation model whose Mixture-of-Transformers architecture combines reasoning, world generation, and action generation. It is open-sourced with training scripts and six synthetic datasets, but deployment is optimized for NVIDIA NIM and GPUs, signaling an ecosystem lock-in strategy.

NVIDIA Other 2026-06-01

NVIDIA BlueField DPU In-Silicon Security Shifts AI Factory Control from Software to Hardware

NVIDIA unveils the DOCA security stack (Argus, Vault, Flow) on the BlueField-4 DPU, enabling hardware-isolated runtime threat detection via zero-copy memory analysis, zero-trust file access, and 800 Gb/s network enforcement. This shifts security control from the host OS to DPU silicon, delivering distributed full-stack protection without compromising AI throughput, but it is tightly tied to the Vera Rubin platform, creating ecosystem lock-in.

NVIDIA Other 2026-06-01

NVIDIA Vera CPU: Custom Olympus Core and LPDDR5X Redefine CPU for Agentic AI Factories

NVIDIA unveils Vera CPU with 88 custom Olympus cores, 1.2TB/s LPDDR5X bandwidth, and SCF fabric, targeting CPU execution bottlenecks in agentic AI and reinforcement learning. Claiming 1.8x performance over x86 and memory power under 30W, it shifts AI factory metrics from cores-per-dollar to tokens-per-dollar.

NVIDIA Other 2026-06-01

NVIDIA DSX OS: Open Source Software to Seize AI Factory Control Plane

NVIDIA launches DSX OS, an open-source modular software suite for operating AI factories. Components include DSX Exchange, MaxLPS, NICo, and NVSentinel, among others, which unify IT/OT, power optimization, and lifecycle management. NVIDIA claims 40% more GPUs under fixed power, but the core relies on NVIDIA proprietary hardware, aiming to lock users into its ecosystem.

Intel Other 2026-06-01

Intel Reclaims AI Control Plane: Xeon 6+ and E835 Target Agentic Orchestration

Intel launches Xeon 6+ (288 E-cores on 18A), E835 200GbE controllers, and Crescent Island GPU. The strategy repositions the CPU as the control plane for agentic AI orchestration and data movement, while using E835 Ethernet to standardize AI data center networking.

NVIDIA Other 2026-05-29

DynoSim: Simulating the Pareto Frontier

...

Samsung Electronics Other 2026-05-23

Micron Partners TSMC for Custom HBM4E Logic Dies, Targets 2027 Ramp with 1-gamma DRAM

Micron plans to ramp HBM4E in 2027, transitioning to 1-gamma DRAM and using TSMC for both standard and custom logic dies. This marks a shift from standardized HBM to customized solutions, positioning memory as a strategic asset for AI inference workloads.

Other Other 2026-05-22

BadHost CVE-2026-48710: Starlette Auth Bypass Exposes AI Agent Infrastructure to HTTP Smuggling

BadHost (CVE-2026-48710) exploits Starlette's inconsistent URL reconstruction via Host header injection, bypassing path-based auth. Affecting 400K+ repos including FastAPI, vLLM, and MCP Server, it exposes AI Agent infrastructure to data theft and potential RCE, forcing a security paradigm shift in HTTP parsing.
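The general class of bug described here, where a URL is rebuilt from a client-supplied Host header and then re-parsed for a path-based check, can be sketched as follows. This is a generic illustration using only the Python standard library, not Starlette's code or the actual CVE mechanics. The function names, the `/admin` prefix, and the `api.example.com` host are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical protected route prefixes (an assumption for illustration).
PROTECTED_PREFIXES = ("/admin",)


def naive_url(host_header: str, routed_path: str) -> str:
    """Rebuild a URL by trusting the client-supplied Host header verbatim."""
    return f"http://{host_header}{routed_path}"


def path_seen_by_auth(host_header: str, routed_path: str) -> str:
    """Path that a check performed on the re-parsed URL would see."""
    return urlsplit(naive_url(host_header, routed_path)).path


def is_blocked(path: str) -> bool:
    """Path-based auth: block anything under a protected prefix."""
    return path.startswith(PROTECTED_PREFIXES)


def host_is_well_formed(host_header: str) -> bool:
    """Reject Host values that contain URL delimiters or whitespace."""
    return not any(ch in host_header for ch in "/?#@\\ ")


routed = "/admin/keys"  # the path the router actually dispatches on

# Benign Host: the re-parsed path matches the routed path.
benign = path_seen_by_auth("api.example.com", routed)

# Crafted Host: the injected "/public?x=" pushes the real path into the query,
# so the auth check sees "/public" while the router still serves "/admin/keys".
crafted = path_seen_by_auth("api.example.com/public?x=", routed)

print(benign, is_blocked(benign))
print(crafted, is_blocked(crafted))
print(host_is_well_formed("api.example.com/public?x="))
```

In this sketch the auth check and the router disagree about the path, which is why the crafted request gets through. Validating the Host header before it is used in URL reconstruction, as `host_is_well_formed` does, removes that disagreement.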

Google Other 2026-05-21

Google Antigravity Control Plane Redefines AI Development, Locks Agent Orchestration

At I/O 2026, Google launched Antigravity 2.0 desktop app and CLI/SDK as a unified agent control plane, alongside Gemini 3.5 Flash/Omni models, Managed Agents API, and native Android support in AI Studio. This aims to streamline AI development from prototype to production, but effectively locks developers into Google's ecosystem and cloud services.

Cisco Other 2026-05-20

Cisco G300 Intelligent Packet Flow: Hardware-Accelerated AI Networking Breakthrough

Cisco launches Intelligent Packet Flow on Silicon One G300, transforming the fabric into an intelligent system with hardware-accelerated adaptive routing, collective congestion awareness, and telemetry. In 8K-16K GPU clusters, it reduces CCT by 87% vs ECMP, improves JCT by 82%, and unlocks 28% more GPU efficiency.

Intel Other 2026-05-20

Intel Core Ultra 3 SoC Replaces Discrete GPUs in Edge Robotics, Slashing TCO

Intel Core Ultra Series 3 SoC integrates CPU, GPU, and NPU to power edge robotics, replacing discrete GPUs. Partners like Sensory AI run multi-agent AI (vision, language, motion) locally, cutting TCO and eliminating cloud latency. This shifts the cost-performance curve for service robots.

AMD Other 2026-05-20

AMD Ryzen AI Halo & Max PRO 400: Local 300B Parameter Inference, but Hidden Lock-in and Thermal Limits

AMD launches the Ryzen AI Halo developer platform (128GB unified memory, supporting 200B-parameter models) and the Ryzen AI Max PRO 400 series (the first x86 client to run 300B-parameter models locally). Unified memory, ROCm optimization, and OEM partnerships aim to shift agentic AI from cloud to local, but shared memory bandwidth and thermal constraints limit real-world throughput.

Google Other 2026-05-19

Google Cloud I/O '26: A2A Protocol and Managed Agents API Shift Agent Control Plane

At Google I/O '26, Google Cloud unveiled a unified agent development toolkit featuring Antigravity 2.0, Managed Agents API, ADK 2.0, and the A2A protocol. The platform evolves Vertex AI into Gemini Enterprise Agent Platform, offering a four-rung ladder from low-code to code-first. It aims to bridge local prototyping and secure cloud deployment via a shared protocol layer, but effectively centralizes agent lifecycle control onto Google Cloud's managed plane.
