Google Other Medium Signal 2026-04-03

Google Launches Gemma 4 Open Model Family

Google introduces the Gemma 4 open model family in four size variants, optimized for edge and mobile devices. The series supports multimodal processing, long context windows, and 140+ languages, and is released under the Apache 2.0 license.

AMD Other High Signal 2026-04-02

AMD Announces Breakthrough MLPerf Inference 6.0 Results, Showcasing Multinode Scaling and Multimodal Capabilities

AMD's MLPerf Inference 6.0 submission, powered by Instinct MI355X GPUs, surpassed 1 million tokens per second for the first time on models like Llama 2 70B and GPT-OSS-120B. The results highlight efficient multinode scaling, rapid enablement of new workloads (e.g., text-to-video model Wan-2.2-t2v), and reproducible performance across a broad partner ecosystem.

NVIDIA Other 2026-03-24

NVIDIA IGX Thor: 8x Edge AI Compute with ConnectX-7 Network Lock-In

NVIDIA launches the IGX Thor edge AI platform with a Blackwell GPU, up to 5,581 FP4 TFLOPS, dual 200GbE RDMA via ConnectX-7, and ISO 26262 functional safety. Pin compatibility with Jetson Thor and a 10-year lifecycle enable seamless migration but create vendor lock-in through proprietary networking and GPU dependencies.

Meta Other High Signal 2026-03-11

Meta Accelerates Custom AI Chip Roadmap with Focus on Inference Optimization

Meta plans to launch four generations of MTIA AI chips in two years, adopting an 'inference-first' design strategy optimized for generative AI tasks. Built on PyTorch and open standards, the chips enable seamless data center deployment, targeting improved compute efficiency and cost control.

NVIDIA Other High Signal 2026-03-11

NVIDIA Jetson Advances Localized Deployment of Open-Source AI Models at Edge

NVIDIA's Jetson edge AI platform enables localized deployment of open-source generative AI models such as Qwen3 4B and Mistral 3 on edge devices. The platform spans a complete hardware range from Jetson Orin Nano to Jetson Thor, integrating compute and memory in a system-on-module (SoM) to simplify design. In the reported figures, Jetson Thor achieves 52 tokens/sec on Mistral 3 inference.

Trend Micro Other High Signal 2026-03-03

Trend Micro Report Highlights AI Supply Chain Risks and Model Attack Surfaces

Trend Micro's 'Fault Lines in the AI Ecosystem' report systematically analyzes security risks in the AI supply chain, including training data poisoning, third-party plugin vulnerabilities, and model theft attacks. It indicates that enterprise AI security boundaries have expanded from traditional IT infrastructure to the model layer and data pipelines.

NVIDIA Other (date unavailable)

SGLang 0.5.13: Two-Stage MoE Routing Prefetch & Sparse KV Cache Deliver 25x Inference Speedup

SGLang 0.5.13 introduces an MoE-specific two-stage routing prefetch, in which a lightweight proxy network predicts the top-k experts so their weights can be preloaded, and a sparse KV cache grouped by activation path. Together these are reported to deliver a 25x inference speedup on NVIDIA GB300 NVL72. On A100, the reported gains are 65% higher throughput, 40% lower latency, 10% lower memory use, and 62% lower routing overhead, outperforming vLLM.
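
The two-stage idea in the summary above can be illustrated with a toy sketch. This is not SGLang code and does not use any SGLang API. The names `ExpertCache`, `run_layer`, and `slack`, and all the score values, are hypothetical and chosen only for illustration. It assumes a cheap proxy produces approximate per-expert scores before the real router runs, so predicted experts can be loaded early and the real router's choices are then counted as cache hits or misses.

```python
def top_k(scores, k):
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


class ExpertCache:
    """Toy stand-in for fast memory holding a subset of expert weights."""

    def __init__(self):
        self.resident = set()
        self.hits = 0
        self.misses = 0

    def prefetch(self, expert_ids):
        # Stage 1: load the experts that the proxy predicts will be needed.
        self.resident = set(expert_ids)

    def fetch(self, expert_id):
        # Stage 2: the real router requests an expert; record hit or miss.
        if expert_id in self.resident:
            self.hits += 1
        else:
            self.misses += 1
            self.resident.add(expert_id)


def run_layer(router_scores, proxy_scores, k, cache, slack=0):
    """Prefetch from proxy scores, then route with the real scores.

    slack (hypothetical knob) prefetches a few extra experts to absorb
    proxy mispredictions at the cost of more memory traffic.
    """
    cache.prefetch(top_k(proxy_scores, k + slack))
    chosen = top_k(router_scores, k)
    for expert_id in chosen:
        cache.fetch(expert_id)
    return chosen


# Made-up scores for 8 experts: the proxy is close to the router but not exact.
router = [0.10, 0.70, 0.05, 0.90, 0.30, 0.20, 0.60, 0.40]
proxy = [0.20, 0.60, 0.10, 0.80, 0.35, 0.10, 0.30, 0.50]

cache = ExpertCache()
chosen = run_layer(router, proxy, k=3, cache=cache)
print(chosen, cache.hits, cache.misses)
```

In this toy setup the proxy mispredicts one of the three routed experts, so one fetch misses. Raising `slack` trades extra prefetch traffic for fewer misses, which is one plausible way such a scheme could be tuned.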