WEKA Integrates NeuralMesh with NVIDIA STX to Address AI Inference Memory Bottlenecks


April 10, 2026
WEKA has announced the integration of its NeuralMesh platform with the NVIDIA STX reference architecture, establishing its Augmented Memory Grid as a key building block for next-generation AI infrastructure. The combined solution addresses one of the most significant bottlenecks in large-scale inference environments: memory constraints that directly affect performance, total cost of ownership, and scalable growth.

Operating through NeuralMesh, WEKA’s Augmented Memory Grid expands GPU memory by externalizing and persisting key-value caches. When deployed with NVIDIA STX, this architecture delivers high-throughput context memory storage for agentic AI workloads, supporting long-context reasoning across sessions, tools, and end-to-end workflows. According to the company, configurations combining NVIDIA Vera Rubin NVL72 systems, BlueField-4 DPUs, and Spectrum-X Ethernet can boost context memory token throughput by 4x to 10x. The platform is also projected to deliver at least 320 GB/s read and 150 GB/s write throughput, more than doubling the performance of traditional AI storage architectures.
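The externalize-and-persist idea can be illustrated with a toy two-tier cache (a minimal sketch under stated assumptions, not WEKA's implementation; all names are hypothetical): entries evicted from a small "GPU" tier spill to a larger external tier instead of being discarded, so a later request reloads the cached state rather than recomputing it.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy two-tier KV cache: a small LRU 'GPU' tier backed by a larger
    external tier. Evicted blocks spill to the external tier instead of
    being dropped, so a later lookup reloads them cheaply; only a true
    miss forces recomputation. Illustrative model only."""

    def __init__(self, gpu_capacity):
        self.gpu = OrderedDict()      # LRU order: oldest entry first
        self.external = {}            # persisted KV blocks
        self.gpu_capacity = gpu_capacity
        self.hits = self.reloads = self.misses = 0

    def put(self, key, value):
        self.gpu[key] = value
        self.gpu.move_to_end(key)
        while len(self.gpu) > self.gpu_capacity:
            old_key, old_val = self.gpu.popitem(last=False)
            self.external[old_key] = old_val   # spill, don't discard

    def get(self, key):
        if key in self.gpu:
            self.hits += 1
            self.gpu.move_to_end(key)
            return self.gpu[key]
        if key in self.external:
            self.reloads += 1                  # cheap reload vs. full recompute
            self.put(key, self.external[key])
            return self.gpu[key]
        self.misses += 1                       # only here must the KV state be recomputed
        return None
```

In this model the external tier turns what would be a recomputation-triggering eviction into a reload, which is the cost asymmetry the Augmented Memory Grid is built around.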


Memory Infrastructure Becomes the Inference Bottleneck


WEKA centers this integration on the growing memory wall challenge in modern AI deployments. Within today’s inference pipelines, limited high-bandwidth GPU memory forces frequent KV cache evictions, leading to repeated recomputation and diminished operational efficiency. As system concurrency rises, these inefficiencies multiply, increasing infrastructure expenses and reducing performance predictability.
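The way these inefficiencies multiply with concurrency can be sketched with a toy capacity model (an assumption for illustration, not a WEKA formula): sessions share a fixed budget of cached tokens, and any context that no longer fits must be fully recomputed on its next turn.

```python
def recomputed_tokens(hbm_tokens, sessions, ctx_len):
    """Rough model: concurrent sessions share a fixed HBM budget of
    cached KV tokens. Sessions that fit keep their cache; the rest are
    evicted and must recompute their full context on the next turn.
    Returns tokens recomputed per round of one turn per session."""
    resident_sessions = min(sessions, hbm_tokens // ctx_len)
    evicted_sessions = sessions - resident_sessions
    return evicted_sessions * ctx_len
```

With a 100,000-token budget and 10,000-token contexts, 8 sessions incur no recomputation, while 30 sessions recompute 200,000 tokens every round; past the capacity cliff, recomputation grows linearly with concurrency.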

The company promotes shared KV cache infrastructure as the solution. By preserving persistent context across users and sessions, shared caching eliminates redundant processing and stabilizes token throughput. NVIDIA STX provides the validated reference architecture for this model, while WEKA delivers the storage and memory extension layer.
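Shared caching across users and sessions is commonly built as a content-addressed prefix cache, so that prompts beginning with the same tokens (e.g. a common system prompt) reuse the same cached blocks. A minimal sketch, with hypothetical names and block layout, not WEKA's actual data structures:

```python
import hashlib

class SharedPrefixCache:
    """Toy content-addressed KV cache shared across sessions: blocks
    are keyed by a hash of the token prefix, so any session whose
    prompt starts with the same tokens reuses the cached state."""

    def __init__(self):
        self.blocks = {}

    @staticmethod
    def _key(tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def insert(self, tokens, block=4):
        # Cache every block-aligned prefix of the token sequence.
        for end in range(block, len(tokens) + 1, block):
            self.blocks[self._key(tokens[:end])] = True

    def longest_cached_prefix(self, tokens, block=4):
        # Check block-aligned prefixes from longest to shortest.
        for end in range(len(tokens) - len(tokens) % block, 0, -block):
            if self._key(tokens[:end]) in self.blocks:
                return end
        return 0
```

After one user's prompt is cached, a second user sharing only the system-prompt prefix skips prefill for that entire prefix, which is the redundant processing the shared model eliminates.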

NeuralMesh and Augmented Memory Grid Architecture


NeuralMesh acts as WEKA’s distributed storage platform, built to integrate seamlessly across the full NVIDIA STX stack. It delivers high-performance data services optimized for AI workloads, while the Augmented Memory Grid serves as a dedicated memory expansion layer that consolidates KV cache outside of GPU memory.

This design allows inference environments to sustain long-context sessions without overloading GPU resources. By retaining cache state and enabling reuse across workloads, the platform maintains high utilization and consistent performance as deployments scale.

WEKA notes that the Augmented Memory Grid, first unveiled at GTC 2025 and now generally available, has been validated on NVIDIA Grace CPU platforms paired with BlueField DPUs. The architecture delivers measurable gains in inference efficiency, including drastically faster time-to-first-token, higher per-GPU token throughput, and stable performance under increased concurrency. Offloading the data path to BlueField-4 also reduces CPU overhead and alleviates I/O bottlenecks.

Performance and Efficiency Gains


In production-like environments, the platform is engineered to enhance responsiveness and infrastructure efficiency. WEKA states that the Augmented Memory Grid can reduce time-to-first-token by 4x to 20x, while increasing per-GPU token output by up to 6.5x. These improvements stem from higher KV cache hit rates and fewer recomputation cycles, enabling systems to maintain performance as context sizes and user counts expand.
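The link between cache hit rate and time-to-first-token can be made concrete with a simple expected-value model (an illustrative assumption, not the source of WEKA's 4x-20x figures):

```python
def expected_ttft(prefill_s, reload_s, hit_rate):
    """Expected time-to-first-token when a fraction `hit_rate` of
    requests find their KV cache already persisted (cheap reload) and
    the remainder must recompute the full prefill. Toy model only."""
    return hit_rate * reload_s + (1 - hit_rate) * prefill_s
```

Under assumed numbers of a 2.0 s full prefill, a 0.1 s cache reload, and a 90% hit rate, expected TTFT drops to 0.29 s, roughly a 6.9x improvement, and the speedup grows with both hit rate and context length.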

Firmus, an AI infrastructure provider, is highlighted as an early adopter leveraging NeuralMesh with NVIDIA-based infrastructure. The firm reports improved token throughput and lower latency at scale, with gains coming from more efficient use of existing GPUs rather than additional hardware deployments.

Implications for AI Infrastructure Design


This integration highlights a shift in AI system design, where memory and storage strategies increasingly define overall performance and cost efficiency. As agentic AI workloads expand and context windows widen, DRAM-only approaches become unsustainable due to rising recomputation costs and underutilized GPUs.

WEKA positions persistent, shared KV cache as a foundational capability for AI factories. Organizations adopting this model can achieve higher GPU utilization, lower energy consumption per inference task, and more predictable scaling. In contrast, environments relying exclusively on local GPU memory will likely face rising operational costs and diminishing returns as workloads grow.
