ARTIFICIAL INTELLIGENCE
Meta Advances AI Networking with Open Standards and New Architectures
Meta unveils significant advancements in AI and networking at the OCP Global Summit, pushing open standards and introducing new fabric architectures.
Oct 16, 2025
Summary
Meta, a founding member of the Open Compute Project, showcased its latest innovations in AI infrastructure and networking at the OCP Global Summit. The company is driving efforts for open standards and announced its participation in the Ethernet for Scale-Up Networking (ESUN) initiative. Key technological developments include the evolution of its Disaggregated Scheduled Fabric (DSF), the introduction of a new Non-Scheduled Fabric (NSF) architecture for ultra-large AI clusters, and new optical networking solutions. These advancements aim to enhance flexibility, scalability, and efficiency for Meta's growing AI workloads and data centers.

Meta’s AI Infrastructure Push: Driving Open Standards and Innovation
Meta, a pivotal player since the inception of the Open Compute Project (OCP) in 2011, continues to spearhead technological advancements in artificial intelligence (AI) and networking. At the recent 2025 OCP Global Summit in San Jose, California, the company detailed its latest infrastructure breakthroughs, emphasizing the need for open systems and robust solutions to support increasingly demanding AI workloads. These efforts reflect a commitment to pushing the boundaries of what is possible in large-scale computing environments.
The rapid rise of AI has fundamentally reshaped assumptions about infrastructure scaling. Building effective AI infrastructure requires innovation across every layer of the technology stack, from hardware and software to networks and data centers themselves. This comprehensive approach ensures that Meta can meet the escalating demands of its AI-driven services and applications, which are integral to its global operations.
A core tenet of Meta’s philosophy has always been the promotion of open systems development. The company continues this commitment by advocating for standardization across various infrastructure components. Standardization is crucial for systems, racks, and power delivery, especially as rack power density continues to climb in modern data centers. This push extends to the scale-up and scale-out networks utilized by AI clusters.
Standardization allows customers to integrate diverse GPUs and accelerators, ensuring they can always leverage the latest and most cost-effective hardware. Furthermore, there is a significant need for innovation in software and standards that enable workloads to run seamlessly across heterogeneous hardware types, even when spread across different geographical locations. Establishing open standards throughout the entire stack is seen as a massive opportunity to remove friction that currently slows down the deployment of advanced AI infrastructure.
The Ethernet for Scale-Up Networking (ESUN) Initiative
As part of its ongoing drive for standardization, Meta announced its significant involvement in the new Ethernet for Scale-Up Networking (ESUN) initiative. This collaborative effort brings together major industry players, including AMD, Arista, ARM, Broadcom, Cisco, HPE Networking, Marvell, Microsoft, NVIDIA, OpenAI, and Oracle. The primary goal of ESUN is to advance networking technology to effectively handle the escalating scale-up domain required by modern AI systems.
ESUN’s focus is exclusively on open, standards-based Ethernet switching and framing for scale-up networking. The initiative deliberately excludes host-side stacks, non-Ethernet protocols, application-layer solutions, and proprietary technologies, ensuring a truly open and interoperable framework. The group will concentrate on the development and interoperability of XPU network interfaces and Ethernet switch ASICs specifically designed for scale-up networks, according to a statement from the OCP.
To ensure broad alignment and accelerate innovation, ESUN will actively collaborate with other established organizations. This includes working with the Ultra-Ethernet Consortium (UEC) and the long-standing IEEE 802.3 Ethernet group. By engaging with these bodies, ESUN aims to integrate best practices and align with existing open standards, further strengthening the foundation for future AI infrastructure development. This collaborative approach underscores Meta’s vision for a more interconnected and standardized tech ecosystem.
Advancements in Data Center Networking for AI
Beyond the ESUN initiative, Meta engineers unveiled several pivotal data center networking innovations at the summit. These developments are specifically designed to enhance the flexibility, scalability, and efficiency of Meta’s vast infrastructure, which is increasingly dominated by AI workloads. The innovations represent significant strides in how large-scale AI clusters are built and managed, demonstrating Meta’s proactive approach to meeting future computational demands.
Three key advancements were highlighted. First, Meta detailed the evolution of its Disaggregated Scheduled Fabric (DSF), which now supports scale-out interconnectivity for immense AI clusters spanning entire data center buildings. Second, a completely new Non-Scheduled Fabric (NSF) architecture was introduced. This architecture, built entirely on shallow-buffer, disaggregated Ethernet switches, is poised to support Meta’s largest AI clusters, such as the Prometheus system. Finally, Meta announced the integration of Minipack3N, powered by Nvidia’s Spectrum-4 Ethernet ASIC, into its portfolio of 51Tbps OCP switches. These switches leverage OCP’s Switch Abstraction Interface and Meta’s open-source FBOSS (Facebook Open Switching System) software stack, showcasing a blend of open hardware and custom software.
Disaggregated Scheduled Fabric (DSF) Evolution
Meta’s Disaggregated Scheduled Fabric (DSF) is an open networking fabric that fundamentally separates switch hardware, Network Interface Cards (NICs), endpoints, and other networking components from the underlying network infrastructure. This disaggregation is achieved through the use of OCP-SAI (Open Compute Project Switch Abstraction Interface) and Meta’s FBOSS software stack. DSF supports Ethernet-based RoCE (RDMA over Converged Ethernet) to endpoints, accelerators, and NICs from various vendors, including Nvidia, AMD, Broadcom, and Meta’s own MTIA/accelerator stack.
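The value of an abstraction layer like OCP-SAI is that the network operating system can program any vendor's switch ASIC through one common interface. The sketch below illustrates that design pattern in Python; the real SAI is a C API, and all names here (`SwitchAbstraction`, `VendorAsicDriver`, `create_route`) are hypothetical illustrations, not actual SAI or FBOSS identifiers.

```python
from abc import ABC, abstractmethod

class SwitchAbstraction(ABC):
    """Illustrative stand-in for an abstraction layer like OCP-SAI:
    the network OS issues generic calls, and each vendor driver
    translates them into ASIC-specific programming."""
    @abstractmethod
    def create_route(self, prefix: str, next_hop: str) -> None: ...

class VendorAsicDriver(SwitchAbstraction):
    """Hypothetical vendor driver; a real one would write
    forwarding entries into hardware tables."""
    def __init__(self):
        self.routes = {}

    def create_route(self, prefix, next_hop):
        # Stands in for programming the ASIC's forwarding table.
        self.routes[prefix] = next_hop

def install_routes(nos_view: SwitchAbstraction, routes):
    """An FBOSS-like network OS stays vendor-agnostic: it only
    sees the abstract interface, never the ASIC specifics."""
    for prefix, next_hop in routes:
        nos_view.create_route(prefix, next_hop)

driver = VendorAsicDriver()
install_routes(driver, [("10.0.0.0/24", "fe80::1")])
print(driver.routes)  # → {'10.0.0.0/24': 'fe80::1'}
```

Swapping in a different vendor's driver requires no change to the network OS logic, which is the property that lets DSF mix switch hardware, NICs, and accelerators from multiple vendors.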
A key feature of DSF is its use of scheduled fabric techniques between endpoints, particularly Virtual Output Queuing (VOQ) for traffic scheduling. This approach allows the network to proactively avoid congestion rather than merely reacting to it, which is critical for maintaining high performance in demanding AI environments. Over the past year, Meta has evolved DSF into a two-stage architecture. This enhancement allows it to scale and support a non-blocking fabric that can interconnect up to 18,432 XPUs (processing units). These massive clusters form a fundamental building block for constructing AI infrastructure that can span a region, and even multiple regions, effectively meeting the increased capacity and performance requirements of Meta’s most demanding AI workloads.
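The core idea behind VOQ can be shown in a few lines: each ingress port keeps a separate queue per egress port, and packets are transmitted only when the destination egress has granted capacity, so a congested egress never blocks traffic bound elsewhere. This is a minimal sketch of that mechanism, not Meta's implementation; the credit-granting model is a simplified assumption.

```python
from collections import deque

class VOQIngress:
    """Ingress port with one virtual output queue per egress port,
    avoiding head-of-line blocking: a backed-up egress does not
    stall packets destined for other egresses."""
    def __init__(self, num_egress_ports):
        self.voqs = [deque() for _ in range(num_egress_ports)]

    def enqueue(self, packet, egress):
        self.voqs[egress].append(packet)

    def schedule(self, credits):
        """Send at most one packet per egress that granted a credit.
        Credits model the scheduled fabric: an egress grants sends
        only when it has buffer space, avoiding congestion up front."""
        sent = []
        for egress, available in enumerate(credits):
            if available > 0 and self.voqs[egress]:
                sent.append((egress, self.voqs[egress].popleft()))
        return sent

ingress = VOQIngress(num_egress_ports=3)
ingress.enqueue("pkt-a", egress=0)
ingress.enqueue("pkt-b", egress=0)
ingress.enqueue("pkt-c", egress=2)

# Egress 0 is congested (no credits), so only egress 2 drains.
print(ingress.schedule(credits=[0, 1, 1]))  # → [(2, 'pkt-c')]
```

Note that traffic for the congested egress 0 simply waits in its own queue; with a single shared FIFO, "pkt-c" would have been stuck behind it.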
Introducing the Non-Scheduled Fabric (NSF) Architecture
Complementing the enhanced DSF architecture, Meta has introduced a new architecture known as the Non-Scheduled Fabric (NSF). This innovative design is based on shallow-buffer OCP Ethernet switches, specifically engineered to deliver extremely low round-trip latency. Low latency is paramount for the rapid communication required within large-scale AI training models and inference systems, where even minor delays can significantly impact performance and efficiency.
The NSF architecture employs a three-tier fabric design that incorporates adaptive routing mechanisms for highly effective load-balancing. This adaptive routing capability is crucial for minimizing congestion and ensuring optimal utilization of GPUs, the most expensive and critical resources in Meta’s largest AI factories. NSF’s ability to support adaptive routing and dynamic load-balancing serves as a foundational building block for gigawatt-scale AI clusters, such as Meta’s Prometheus system. Prometheus, a prime example of an ultra-large AI cluster, benefits immensely from NSF’s robust performance characteristics.
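Adaptive routing differs from static ECMP hashing in that path choice reacts to observed congestion. The toy function below sketches one common flavor of the idea under simplified assumptions (local queue depths as the congestion signal, a fixed divert threshold); it is illustrative only and does not represent NSF's actual algorithm.

```python
def adaptive_route(flow_id, uplinks, queue_depth, threshold=4):
    """Pick an uplink for a flow. Start from the static ECMP hash
    choice, but divert to the least-loaded uplink when the hashed
    path is significantly more congested. The threshold limits
    needless diversion (and thus packet reordering within a flow)."""
    hashed = uplinks[hash(flow_id) % len(uplinks)]
    least_loaded = min(uplinks, key=lambda u: queue_depth[u])
    if queue_depth[hashed] - queue_depth[least_loaded] > threshold:
        return least_loaded
    return hashed

uplinks = ["spine-0", "spine-1", "spine-2"]

# Hashed choice (spine-1) is heavily congested, so the flow diverts.
print(adaptive_route(7, uplinks, {"spine-0": 2, "spine-1": 9, "spine-2": 1}))
# → 'spine-2'

# Load is balanced, so the stable hash choice is kept.
print(adaptive_route(7, uplinks, {"spine-0": 3, "spine-1": 3, "spine-2": 3}))
# → 'spine-1'
```

The design tension shown here is real: reacting aggressively to congestion balances load better, while sticking with the hash choice preserves in-order delivery per flow, which is why production fabrics gate diversion on a congestion threshold or flowlet boundaries.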
Moving forward, Meta plans to strategically utilize both DSF and NSF, depending on the specific requirements of different AI workloads. DSF will continue to provide a high-efficiency, highly scalable network solution for large, yet modular, AI clusters. In contrast, NSF will be specifically targeted at the extreme demands of Meta’s largest, gigawatt-scale AI factories like Prometheus, where low latency and robust adaptive routing are absolutely paramount. This dual-fabric strategy allows Meta to optimize its infrastructure for a wide range of AI computational needs, ensuring both efficiency and peak performance where it matters most.
Optical Networking Advancements for Data Centers
Meta’s innovations extend into the realm of optical networking, a critical component for high-speed data transmission within and between data centers. Last year, the company introduced 2x400G FR4 BASE (3-km) optics, which have since become the primary solution supporting next-generation 51T platforms across both backend and frontend networks, as well as DSFs. These advanced optics have been widely deployed throughout Meta’s extensive network of data centers, demonstrating their reliability and performance in real-world large-scale environments.
Building on this success, Meta is further expanding its optical networking portfolio with the launch of 2x400G FR4 LITE (500-m) optics. The FR4 LITE variant is specifically optimized for the majority of intra-data center use cases, supporting fiber links up to 500 meters. This new offering is designed to accelerate cost reduction for optical components while maintaining robust performance for shorter-reach applications. By tailoring optics to specific distance requirements, Meta can achieve greater efficiency and cost-effectiveness across its diverse data center footprint.
In addition to these advancements, Meta has integrated the 400G DR4 OSFP-RHS optics, marking its first-generation DR4 package specifically for AI host-side NIC connectivity. This development ensures high-bandwidth, low-latency connections directly to AI accelerators. Complementing this, new 2x400G DR4 OSFP optics are being deployed on the switch side, providing robust connectivity from the host systems to the network switches. These comprehensive optical solutions are essential for supporting the immense data flows and stringent performance demands characteristic of cutting-edge AI infrastructure.