On the eve of the AI inference explosion, NVIDIA plays another "trump card"
Almost overnight, networking has become the new darling of the AI era. In recent months, nearly every company involved in data centers has been talking about it. GPUs remain hot, but AI networks now seem to draw even more attention. From Silicon Valley giants to start-ups, everyone is enthusiastically discussing the diverse applications and broad prospects of AI networking.
Behind this phenomenon lies the rise of "AI factories" and "AI clouds". An AI factory is a supercomputer or data center designed specifically to process large amounts of data and generate intelligence. The term compares AI development to industrial manufacturing: just as a factory converts raw materials into products, an AI factory transforms raw data into "intelligence" or "solutions". An AI cloud, on the other hand, provides elastic, scalable AI services on public or private cloud platforms: users can access models and computing power without building their own infrastructure, with the emphasis on flexibility and generality.
Having transformed from traditional data centers, they no longer merely process and store data; instead, they convert massive amounts of raw data into real-time insights and value, shouldering the crucial task of "manufacturing intelligence".
Obviously, the outdated underlying network architecture of the past is no longer capable of supporting such demands.
It is precisely for this reason that the network advantages of AI giant NVIDIA have begun to fully emerge. Besides the well-known InfiniBand technology, its Ethernet architecture Spectrum-X specifically designed for AI is attracting much attention.
According to the IDC report, driven by the strong momentum of the Spectrum-X Ethernet network platform, Nvidia's data center Ethernet switch revenue achieved an astonishing growth of 183.7% from the fourth quarter of 2024 to the first quarter of 2025, accounting for 12.5% of the entire Ethernet switch market. It has even achieved a market share of 21.1% in the data center sub-sector.
Relying on the Spectrum-X Ethernet network platform, which was released just over two years ago, NVIDIA has not only successfully ranked among the top three in the global data center Ethernet market, but also seized the initiative in the rapidly emerging AI Ethernet market, establishing an undisputed leading position.
Although the outside world still habitually defines NVIDIA as a GPU giant, in places where the spotlight is out of reach, this company is reshaping the data center network landscape of the AI era at an astonishing speed.
The real confidence behind $4 trillion
NVIDIA laid out its AI network strategy far earlier than the other giants did.
On September 30, 2020, to celebrate the 40th anniversary of the Ethernet specification, Jensen Huang, the founder and CEO of NVIDIA, specially interviewed Bob Metcalfe, the inventor of Ethernet.
In the interview, the two discussed a thought-provoking question: does NVIDIA's core competitiveness lie in the GPUs themselves, or in the interconnect technology between GPUs?
The background to this question is that NVIDIA had just completed an acquisition in April 2020, paying $6.9 billion for the chip design company Mellanox. Mellanox had not only developed a series of computing network products based on the InfiniBand standard but had also launched the Spectrum switch based on the open Ethernet standard.
Both InfiniBand and Spectrum Ethernet are technologies for interconnecting servers. InfiniBand focuses on ultimate performance and plug-and-play operation, while Spectrum Ethernet blends high performance with traditional cloud application scenarios. By acquiring Mellanox, NVIDIA gained two trump cards for direct interconnection of GPU servers, covering the market's pressing demands for performance, scalability and serviceability.
Facing this question, Metcalfe firmly pointed out that NVIDIA's true confidence lies in GPU interconnects.
Five years on, NVIDIA's market value has soared to $4 trillion, ranking it among the world's most valuable companies. Behind this astonishing achievement, beyond the much-sought-after Blackwell chip, lies its long-unrivaled GPU server interconnect technology: AI networks.
At this point a new question emerges: with so many competitors in the AI network market, why has only NVIDIA won the "favor" of so many giants?
Let's start with InfiniBand.
As a powerful network architecture, InfiniBand is designed specifically for I/O connections in high-performance computing and AI data center infrastructure. Its uniqueness lies in serving both as an "in-box" backplane solution (interconnecting components on PCBs) and as "out-of-box" device interconnect over copper cables or optical fibers, unifying the functions of a traditional bus and a network interconnect.
In addition, InfiniBand's high bandwidth, low latency, low power consumption and scalability make it a natural fit for AI data centers. The latest InfiniBand XDR network, for instance, offers bandwidth of up to 800Gb/s, and its development pace has far outstripped that of PCIe: current x86 servers do not yet support PCIe 6.0, so technologies such as Multi-Host or Socket Direct are needed to reach the 800Gb/s uplink bandwidth an XDR network requires. InfiniBand was also the industry's first network to support RDMA (Remote Direct Memory Access), enabling wire-speed data transmission without CPU intervention roughly 20 years ago, and the first to implement in-network computing, offloading the complex collective-communication operations of HPC and AI workloads to the switch, effectively improving communication performance and reducing network congestion.
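The gap between XDR link rate and host bandwidth can be checked with some back-of-the-envelope arithmetic. The PCIe figures below are nominal x16 rates used purely for illustration; real throughput is lower after protocol overhead.

```python
# Compare an InfiniBand XDR port's rate with what a single PCIe x16
# slot can feed it. Nominal round numbers, for illustration only.

XDR_LINK_GBPS = 800                      # InfiniBand XDR port, Gb/s
xdr_gbytes = XDR_LINK_GBPS / 8           # 800 Gb/s = 100 GB/s

# Approximate usable bandwidth of one x16 slot per PCIe generation, GB/s
PCIE_X16_GBYTES = {"4.0": 32, "5.0": 64, "6.0": 128}

for gen, bw in PCIE_X16_GBYTES.items():
    slots_needed = -(-xdr_gbytes // bw)  # ceiling division
    print(f"PCIe {gen} x16 ({bw} GB/s): {int(slots_needed)} slot(s) per XDR port")
```

On a PCIe 5.0 host, two x16 slots are needed to saturate one 800Gb/s port, which is exactly why Multi-Host or Socket Direct designs (splitting one adapter across multiple slots) come into play until PCIe 6.0 servers arrive.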
To put it figuratively, InfiniBand is like a purpose-built dedicated highway: high speed, with congestion effectively designed out. Its unique architecture achieves high bandwidth while significantly reducing latency, giving it a considerable advantage over traditional network architectures and making it highly suitable for AI factories training large language models (LLMs).
The Spectrum-X network platform, launched by NVIDIA in 2023, is designed specifically for AI applications, further optimizing and upgrading traditional Ethernet. It is an end-to-end AI network technology, co-designed from the network card to the switch.
The first issue is latency, long the chief criticism of traditional Ethernet. Through an end-to-end optimized design, Spectrum-X cuts the communication latency of AI workloads as far as possible. Using RDMA technology inherited from InfiniBand, it enables direct GPU-memory-to-GPU-memory communication, significantly reducing latency and giving users more avenues for communication optimization. For network congestion and packet loss, Spectrum-X incorporates the Adaptive Routing technology already proven on InfiniBand networks, adjusting data transmission paths in real time according to network load to maximize bandwidth utilization.
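The load-balancing idea behind adaptive routing can be sketched in a few lines. Real switches make this decision per packet or per flowlet in hardware; the toy below only illustrates the principle of steering traffic to the least-loaded path, and is not NVIDIA's actual algorithm.

```python
# Toy adaptive routing: among equal-cost paths to the same destination,
# send each new flow down the currently least-loaded one.

path_load = {"path_a": 0.0, "path_b": 0.0, "path_c": 0.0}

def route(flow_gbits: float) -> str:
    best = min(path_load, key=path_load.get)   # least-loaded path right now
    path_load[best] += flow_gbits              # account for the new flow
    return best

flows = [40, 40, 40, 10, 10]
assignments = [route(f) for f in flows]
print(assignments)
# ['path_a', 'path_b', 'path_c', 'path_a', 'path_b']
```

Static hashing (classic ECMP) would ignore current load and can pile several elephant flows onto one path; load-aware selection like this is what keeps link utilization high under bursty AI traffic.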
Meanwhile, Spectrum-X also carries over InfiniBand's Performance Isolation technology, which ensures that in multi-tenant, multi-workload environments the network traffic of different tenants' applications does not interfere. Even if one workload experiences a sudden traffic burst, others are unaffected, so every task runs in an optimal environment and achieves bare-metal performance.
Unlike InfiniBand, Spectrum-X targets the AI cloud market. It retains the flexibility and security of Ethernet, so traditional Ethernet users can migrate to AI data center networks without changing their habits. The move from data centers toward AI data centers (AI factories and AI clouds) is inevitable, with AI models gradually displacing traditional application models. Spectrum-X enables a smooth upgrade and expansion of traditional cloud infrastructure toward AI, meeting cloud service providers' demand for large-scale generative AI and more.
In fact, NVIDIA networks now firmly lead the AI training field: AI giants such as Microsoft Azure, OpenAI and Meta have long used InfiniBand to train their large language models, while Spectrum-X has rapidly won over a large number of new and existing customers in the past year, achieving explosive growth unprecedented in the history of networking and becoming the king of data center networks. So far, NVIDIA has provided a solid foundation for vertical scaling (scale-up) of AI workloads through NVLink, and offers vast headroom for horizontal scaling (scale-out) through Spectrum-X and InfiniBand. Combined with its industry-leading GPU technologies, this closes the loop between computing and communication for AI workloads, paving the way for AI data centers that are both high-performance and endlessly scalable.
However, AI's development clearly will not stay confined to training. What truly supports NVIDIA's $4 trillion market value has always been its forward-looking insight into AI's trajectory and its preemptive strategic vision.
From training to inference
At the GTC conference held in March this year, NVIDIA sent out an important industry signal: As the demand growth for large-scale model training in the AI industry gradually slows down, coupled with the breakthrough innovations in inference technology by companies like DeepSeek, the entire AI industry is approaching a critical turning point from the training era to the inference era.
Behind this transformation lies a brand-new blue ocean far larger than the training market: the inference market. If training is where AI capabilities are forged, inference is where they are put to work, and its market potential and commercial value will grow exponentially.
But new problems followed one after another.
On the one hand, as inference models grow more complex, tasks that once ran on a single GPU or a single node are shifting to parallel processing across multiple GPUs and multiple nodes. Inference is no longer the traditional one-question-one-answer exchange; it has entered the "test-time scaling" stage, simulating multiple solution paths per request and selecting the best one. This style of inference is essentially real-time deep deduction, requiring large volumes of token processing and context backtracking to complete within milliseconds, which sharply raises the demands on the latency, bandwidth and synchronization mechanisms of the inference system.
On the other hand, inference workloads increasingly exhibit "P-D separation": deploying the prefill and decode stages on different hardware nodes to make optimal use of resources and avoid contention between the compute and communication demands of the two stages. This, however, turns the data exchange between prefill and decode into a bottleneck.
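A minimal sketch of the P-D hand-off, under the assumption that the "KV cache" is just the prefill stage's output: in a real deployment the transfer crosses the network (e.g. over RDMA), which is exactly the bottleneck described above; here it is only an in-process queue for illustration.

```python
import queue

handoff: queue.Queue = queue.Queue()

def prefill_worker(request_id: str, prompt_tokens: list[int]) -> None:
    # Compute-heavy stage: build the KV cache for the whole prompt.
    kv_cache = [t * 2 for t in prompt_tokens]   # placeholder computation
    handoff.put((request_id, kv_cache))         # the P -> D transfer

def decode_worker() -> tuple[str, int]:
    request_id, kv_cache = handoff.get()        # receive the KV cache
    # Decode now reuses the cache instead of redoing prefill work.
    return request_id, len(kv_cache)

prefill_worker("req-1", [1, 2, 3, 4])
print(decode_worker())   # ('req-1', 4)
```

Separating the two stages lets each run on hardware sized for its profile (prefill is compute-bound, decode is memory-bound), at the cost of moving the cache between them for every request.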
In addition, large-model inference (especially MoE-based reasoning models) relies heavily on the KV cache (key-value cache), whose size grows rapidly with the number of input tokens. As a result, the KV cache may end up in GPU memory, CPU memory, the GPU server's local SSD, or remote shared storage. Because the KV cache is frequently shared and updated among multiple GPUs, it puts "bidirectional pressure" on the network: east-west, GPUs need to share KV data at high speed over RDMA; north-south, low-latency scheduling and high-performance data transfer are required among GPUs, storage, and CPUs.
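A rough sizing exercise shows why the cache spills across the memory hierarchy. The formula is the standard one (per token, each layer stores a key and a value vector at the chosen precision); the model shape below is a generic illustration, not any specific model.

```python
# KV-cache size: 2 tensors (K and V) per layer per token, fp16 = 2 bytes.

def kv_cache_bytes(n_layers: int, hidden_dim: int, n_tokens: int,
                   bytes_per_elem: int = 2) -> int:
    return 2 * n_layers * hidden_dim * n_tokens * bytes_per_elem

GIB = 1024 ** 3
for n_tokens in (4_096, 32_768, 131_072):
    size = kv_cache_bytes(n_layers=64, hidden_dim=8192, n_tokens=n_tokens)
    print(f"{n_tokens:>7} tokens -> {size / GIB:6.1f} GiB per sequence")
# 4096 tokens is 8 GiB; 131072 tokens is 256 GiB for a single sequence.
```

At 256 GiB per long-context sequence, no single GPU's memory suffices, so the cache must be tiered into CPU memory, SSD, or remote storage and moved back over the network on demand, producing exactly the east-west and north-south pressure described above.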
NVIDIA has provided efficient solutions to each of these practical inference problems:
In terms of distributed inference, NVIDIA's existing InfiniBand and Spectrum-X Ethernet architectures have built a network layer with RDMA, intelligent congestion control, and QoS capabilities, providing the necessary "highway" for it.
For the communication bottleneck of P-D separation, NVIDIA has built high-speed interconnect channels with NVLink/NVSwitch and achieved deep CPU-GPU coupling with the Grace CPU. Under a unified shared-memory domain, this greatly reduces main-memory data transfers and latency, significantly improving inference efficiency.
Finally, there are the two-way challenges of the KV cache storage structure. NVIDIA has introduced a dual-engine collaborative architecture of BlueField SuperNIC (Super Network Interface Card) and BlueField DPU (Data Processing Unit). The former, a high-performance intelligent NIC designed for AI workloads, accelerates KV sharing between GPU nodes in KV-cache scenarios, minimizing cross-node token-processing latency and maximizing bandwidth. The latter builds an intelligent "data superhighway" between CPU and GPU, offloading tasks such as KV-cache movement, scheduling, and access control from the CPU to the DPU, effectively reducing latency, freeing CPU resources, and raising overall I/O throughput.
The issues above are the network problems within inference itself; large-scale inference clusters bring further difficulties.
Many originally assumed inference was a lightweight task a single node could handle, but the reality is the opposite. More and more enterprises are using training clusters directly for inference, especially in agentic inference scenarios, where each agent needs its own database and long-context processing capability; the computing power and network resources consumed can exceed those of training.
To address this trend, NVIDIA has introduced the AI Fabric network architecture. Through the collaboration of NIXL (NVIDIA Inference Xfer Library) and the Dynamo inference framework, it supports dynamic path selection and GPU-level resource scheduling, keeping the inference system flexible and responsive even at large scale and resolving the resource-orchestration bottleneck of large inference clusters.
The power consumption and stability of interconnect devices are another major headache for enterprises. With the number of GPUs required for inference growing rapidly, network interconnect has become a significant share of total system energy consumption. Traditional electrical connections (such as copper cables) are limited in reach and have become an expansion bottleneck, so optical interconnect is now mainstream in AI data centers.
For this reason, NVIDIA has launched CPO (Co-Packaged Optics, optoelectronic integrated packaging) technology, integrating optical modules into the packaging of switching chips, effectively reducing power consumption and enhancing reliability. It is understood that compared with traditional optical modules, CPO can save 30% to 50% of network energy consumption, which is equivalent to releasing tens of megawatts of electricity for GPU computing in hyperscale data centers.
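A quick sanity check on the "tens of megawatts" claim: assuming, purely for illustration, a 50 MW network/optics power budget for a very large campus (the per-site figure is an assumption, not from the source), a 30-50% optics saving does land in that range.

```python
# Illustrative arithmetic only: the 50 MW network power budget below is
# an assumed round number for a hyperscale campus, not a measured figure.

network_power_mw = 50.0
for saving in (0.30, 0.50):
    freed = network_power_mw * saving
    print(f"{saving:.0%} saving on {network_power_mw:.0f} MW "
          f"-> {freed:.1f} MW freed for GPU compute")
# 30% -> 15 MW, 50% -> 25 MW: tens of megawatts at campus scale.
```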
It is worth mentioning that CPO also brings benefits for operations and maintenance: fewer optical modules mean fewer failures caused by manual plugging and unplugging, and the number of lasers is cut by a factor of four. All of this improves overall system resilience while supporting higher-density deployment.
It is clear that NVIDIA networks are building a brand-new foundation for the inference era on the technological accumulation of the training era. From BlueField SuperNIC and BlueField DPU, to Spectrum-X, AI Fabric and CPO co-packaged optical switches, to a full-stack optimized software ecosystem, its inference infrastructure landscape has taken shape.
Mastering the network means mastering the future
In that September 2020 conversation with Metcalfe, Jensen Huang also made this point: customers don't care what technology you use; they care about how you solve their problems.
In his view, NVIDIA's true breakthrough lies not only in the performance of its GPUs, but in its early redefinition of the GPU as a platform-level component, like DRAM and CPUs, that can be embedded in solutions to build complete systems for specific problems. The data center has become the computer, and the network determines its performance, scalability and efficiency. This systems thinking is the core driving force behind NVIDIA's transformation from a graphics company into an AI data center supplier.
At first, no one believed GPUs had such a broad future. "Focus on problems the CPU cannot solve? That market either doesn't exist at all because there is no solution, or it is tiny: the supercomputer market. Neither can succeed." That is how Jensen Huang recalls the skeptics of the time. But NVIDIA saw deeper: true markets often emerge before demand has taken shape.
This logic is repeating itself on today's AI network platforms. Just as 3D games once could not do without Ethernet, today's inference models, AI agents, and generative AI cannot do without high-speed, stable, intelligent networks. They still follow Metcalfe's law: the more connections, the greater the platform's value.
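Metcalfe's law fits in a line of code: a network's potential value scales with the number of possible pairwise connections, n(n-1)/2, i.e. roughly the square of the node count.

```python
# Possible pairwise links in a network of n nodes: n * (n - 1) / 2.

def pairwise_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(f"{n:>5} nodes -> {pairwise_links(n):>7} possible links")
# 10x the nodes yields roughly 100x the links, hence quadratic value growth.
```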
In the future, as large-scale inference clusters are deployed at an accelerating pace, the "final battlefield" of AI platforms will no longer be a contest of single-chip performance, but of whole-system, ecosystem and network efficiency. On this new battlefield, NVIDIA has not only taken the stage; it is leading the way.