NVIDIA - AI Infrastructure Intelligence Search

NVIDIA Other 2026-07-04

英伟达RTX 5080公版显卡将在BW2026限量发售，售价8299元

...

OpenAI Other 2026-07-03

OpenAI Slashes Inference Costs 50%, Runs ChatGPT on Hundreds of GPUs via System-Level Optimization

OpenAI reduces AI inference costs by over 50% through system-level optimizations: model quantization (FP16 to INT4/INT8), KV-Cache optimization, dynamic batching, and speculative decoding. Using only hundreds of NVIDIA GPUs to serve ChatGPT's unlogged-in traffic, inference gross margin jumps from 38% to 65%, nearing breakeven.

NVIDIA Other 2026-07-02

NVIDIA AI Compute Partnership: Revenue Share and Credit Backstop to Lock Cloud Providers into DSX AI Factories

NVIDIA launches AI Compute Partnership with revenue sharing and credit backstop, shifting from hardware sales to recurring service revenue. Initial projects include 40K GB300 chips for Sharon AI and 170K GPUs for Firmus, totaling 200K+ high-end chips. NVIDIA is becoming the 'central bank' of AI compute, squeezing cloud brokers.

NVIDIA Other 2026-07-01

NVIDIA BlueField-3 DPU: Shifts AI Cloud I/O Control from CPU to Dedicated Silicon, Redefines Compute Delivery & Security

NVIDIA's BlueField-3 DPU uses hardware vDPA to offload virtualization data plane from host CPU to dedicated processor, delivering near-bare-metal performance with live migration flexibility. It also creates a trusted I/O path for confidential computing. However, this fundamentally locks cloud infrastructure into NVIDIA silicon, increasing vendor dependency.

TSMC Other 2026-07-01

Etched Unveils Sohu Transformer ASIC: Claims 20x H100 Inference Throughput, Challenging NVIDIA's Grip

AI chip startup Etched emerges from stealth with Sohu, a Transformer-specific ASIC on TSMC N4P with 144GB HBM3E. By hardwiring attention mechanisms, it claims 20x throughput and 140x price-performance vs. H100 on Llama 70B. With $800M total funding and first racks shipping this summer, it directly challenges NVIDIA's inference dominance.

AMD Other 2026-06-30

AMD and NVIDIA Raise GPU Kit Prices by 10%: GDDR Shortage Exposes AI Supply Squeeze

AMD has notified AIB partners of a ~10% price hike on GPU+GDDR bundled kits effective July 2026, following NVIDIA's similar move on RTX 5090 series. The dual price increases stem from severe GDDR supply shortages driven by the AI boom and memory super-cycle, foreshadowing broad retail GPU price increases in H2.

Anthropic Other 2026-06-30

Anthropic Claude Goes Exclusive on Azure, Microsoft Locks AI Model Distribution via GB300

Anthropic's Claude models are now generally available on Azure Foundry, powered by NVIDIA GB300 NVL72 clusters with over 4600 Blackwell Ultra GPUs. Initial models include Opus 4.8 and Haiku 4.5 with prompt caching and extended thinking. Microsoft gains exclusive enterprise distribution, strengthening its competitive position against AWS and Google Cloud.

Samsung Electronics Other 2026-06-29

Samsung and SK Hynix Announce $300B Investment to Dominate AI Memory and Foundry

Samsung and SK Hynix announce a 10-year, 1,000 trillion won investment plan to expand HBM4 production, improve 3nm GAA yield, and build new AI chip fabs. This aims to cement their HBM duopoly and close the gap with TSMC in advanced foundry, reshaping global AI infrastructure supply chain costs.

OpenAI Other 2026-06-26

OpenAI and Broadcom Tape Out First Inference ASIC Jalapeño in 9 Months, Targeting NVIDIA Dominance

OpenAI and Broadcom unveil Jalapeño, their first custom inference ASIC, fabricated on TSMC 3nm and optimized for Transformer models. Targeting a 50% inference cost reduction, it taped out in 9 months and is slated for deployment in gigawatt-scale data centers by late 2026, marking OpenAI's strategic pivot to full-stack AI infrastructure and a direct challenge to NVIDIA's inference hegemony.

Qualcomm Other 2026-06-26

Qualcomm Acquires Modular for $3.9B, Open-Sources Mojo to Break CUDA Lock-In

Qualcomm acquires Modular for $3.9B in stock and open-sources Mojo, a Python-compatible systems language. Mojo targets CUDA dependency, aiming to provide a high-performance alternative for AI developers. This move strengthens Qualcomm's AI inference chip software stack and edge AI competitiveness.

NVIDIA Other 2026-06-26

NVIDIA Rubin Mandates 100% Liquid Cooling with 45°C Warm Water, Reshaping Data Center Thermal Design

NVIDIA reveals Rubin platform's full liquid cooling design: 100% liquid, 45°C warm water inlet, eliminating chillers and fans. Mass production starts H2 2026, with a mandate for all data centers to transition to liquid cooling, marking a definitive shift in AI thermal management.

Qualcomm Other 2026-06-25

Qualcomm HBC Gen 1 Stacks LPDDR to 133 TB/s, Challenging HBM Dominance

Qualcomm announces HBC Gen 1, a 3D-stacked LPDDR memory with integrated compute die, achieving 133 TB/s bandwidth and 6x energy efficiency over HBM. Aimed at replacing HBM in AI accelerators, shipping with AI250 in mid-2027, but supply chain and feasibility remain uncertain.

OpenAI Other 2026-06-25

Oracle Defense Ecosystem Cohort 3: Offline AI on Roving Edge Devices Goes Operational

Oracle announced the third cohort of its Defense Ecosystem at the Brussels summit, adding 10 companies. Concurrently, Whitespace's Saga AI system deployed on Oracle Roving Edge Devices during Royal Navy's Operation HIGHMAST, running classified AI workloads completely offline, proving sovereign edge AI is operational.

Google Cloud Other 2026-06-25

Google Cloud Multi-Agent Architecture Shifts Control from Human to Autonomous Verification

Google Cloud introduces agent-scale data management with multi-agent verification to reduce human oversight. Deploys six Gemini agents with Nokia for autonomous network operations. Amazon plans to commercialize Trainium chips, intensifying AI hardware competition against Google TPU and Nvidia GPU.

NVIDIA Other 2026-06-25

Qualcomm Dragonfly: 250-core CPU, HBC memory, UALink interconnects target AI inference TCO

Qualcomm unveils full data center portfolio: Dragonfly C1000 250-core Oryon CPU (>5GHz, PCIe Gen7, CXL), HBC near-memory compute (133TB/s Gen1, 18x-54x effective BW), AI300 inference accelerator (UALink/ESUN scale-up), and 800G/1.6T connectivity. Multi-year Meta CPU deal. Commercial sampling 2027-2028. Targets inference TCO with tokens-per-watt leadership.

OpenAI Other 2026-06-25

OpenAI and Broadcom Unveil Jalapeno Inference ASIC, Reshaping AI Hardware Landscape

OpenAI, in collaboration with Broadcom, has developed Jalapeno, a custom LLM inference accelerator. The chip uses a multi-chip module with HBM3E memory and achieved tape-out in just nine months. Designed for OpenAI's model stack, it aims to reduce inference costs and dependency on NVIDIA GPUs, with initial deployment planned for late 2026.

AMD Other 2026-06-24

TSMC Hikes Advanced Node Prices 5-10%, Squeezing AI Chip Margins

TSMC informs clients of 5-10% price hikes across all advanced nodes (7nm+), affecting 74% of wafer revenue. Apple, Nvidia, AMD, and others face higher costs, potentially raising AI infrastructure prices.

Cisco Other 2026-06-24

Cisco Live US & InfoComm 2026 : la collaboration entre dans l’ère agentique

...

NVIDIA Other 2026-06-24

NVIDIA and AWS Default GPU Vector Search with cuVS, G7 Instances Deliver 4.6x Inference

NVIDIA and AWS collaborate to embed cuVS as default GPU-accelerated vector search in OpenSearch Serverless, delivering 10x faster indexing at 1/4 cost. New EC2 G7 instances with RTX PRO 4500 Blackwell GPUs achieve up to 4.6x inference performance. AWS achieves GB300 Exemplar Cloud status for training.

ARM Other 2026-06-24

China's LineShine Tops TOP500: CPU-Only 2.2 ExaFLOPS with ARMv9 and HBM Memory

LineShine supercomputer achieves 2.198 ExaFLOPS FP64 sustained using 13.79 million ARMv9 cores across 20,480 nodes, making it the first system to exceed 2 ExaFLOPS without GPUs. Each node has dual LX2 CPUs (304 cores) with 32GB HBM, demonstrating a CPU+HBM architecture breakthrough for HPC.

Reports

Filter