The performance demands of generative AI and other advanced workloads require new architectural solutions enabled by CXL.
The AI boom is driving major changes in data centers. Demanding AI workloads create an unprecedented need for low-latency, high-bandwidth connectivity and for flexible access to additional memory and compute when needed. The Compute Express Link (CXL) interconnect gives data centers a new way to improve performance and efficiency across CPUs, accelerators, and storage, and to move toward a more disaggregated architecture.
Data centers face three major memory challenges that stand in the way of improving performance and reducing total cost of ownership (TCO). The first is the limit on how much memory can be attached directly to the CPU, compounded by the limitations of the current server memory hierarchy. There is a roughly three-order-of-magnitude latency gap between direct-attached dynamic random access memory (DRAM) and solid-state drive (SSD) storage. When a processor runs out of direct-attached memory, it must sit idle while data is moved to and from SSD storage, and that wait has a dramatic negative impact on computing performance.
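To put that gap in concrete terms, here is a back-of-the-envelope comparison using ballpark figures assumed for illustration (on the order of 100 ns for a DRAM access versus 100 µs for an NVMe SSD read; actual latencies vary by device and workload):

$$ \frac{t_{\mathrm{SSD}}}{t_{\mathrm{DRAM}}} \approx \frac{100\ \mu\mathrm{s}}{100\ \mathrm{ns}} = \frac{10^{-4}\ \mathrm{s}}{10^{-7}\ \mathrm{s}} = 10^{3} $$

A memory tier with access latencies in the hundreds of nanoseconds, which is where CXL-attached DRAM is generally expected to land, sits squarely inside that gap.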
Second, core counts in multicore processors are growing much faster than the bandwidth and capacity that main memory channels can supply. Beyond a certain core count, each core is starved of memory bandwidth and the benefit of adding more cores diminishes. Finally, many workloads running on data center servers do not fully use the direct-attached DRAM provisioned for them, leaving memory resources underutilized on some servers and insufficient on others.
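A rough calculation illustrates the squeeze; the channel count, speed, and core counts below are illustrative assumptions rather than figures for any particular product. Eight channels of DDR5-4800 provide about 8 × 38.4 GB/s ≈ 307 GB/s of peak memory bandwidth, so the bandwidth available to each core shrinks as cores are added:

$$ \frac{307\ \mathrm{GB/s}}{64\ \mathrm{cores}} \approx 4.8\ \mathrm{GB/s\ per\ core}, \qquad \frac{307\ \mathrm{GB/s}}{128\ \mathrm{cores}} \approx 2.4\ \mathrm{GB/s\ per\ core} $$

Memory attached over CXL adds bandwidth on top of the native DRAM channels, so it can ease the per-core bandwidth squeeze as well as the capacity limit.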
CXL is a widely supported industry standard developed to provide a low-latency, memory cache coherent link between processors, accelerators, and memory devices. CXL leverages PCI Express (PCIe), already ubiquitous in data centers, for its physical layer. CXL enables a new memory tier that bridges the gap between direct-attached memory and SSDs, unlocking the full power of multi-core processors. Additionally, CXL’s memory cache coherency allows memory resources to be shared between processors and accelerators, and sharing memory on demand is key to addressing memory resource scarcity.
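On Linux, a CXL memory expander is typically surfaced as a CPU-less NUMA node, so existing NUMA APIs are enough to place data in the new tier. The following is a minimal sketch using libnuma; the node number (1) is an assumption for illustration and will differ from system to system.

```c
/* Minimal sketch: allocate a buffer from a CXL-attached memory tier,
 * assuming the expander appears as CPU-less NUMA node 1 (system dependent).
 * Build with: gcc cxl_alloc.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    const int cxl_node = 1;          /* assumed CXL memory node */
    const size_t size = 1UL << 30;   /* 1 GiB */

    /* numa_alloc_onnode binds the allocation to the chosen node, so the
     * pages land in CXL-attached memory rather than local DRAM. */
    void *buf = numa_alloc_onnode(size, cxl_node);
    if (buf == NULL) {
        fprintf(stderr, "allocation on node %d failed\n", cxl_node);
        return EXIT_FAILURE;
    }

    /* Plain load/store access: no driver calls, no special API. */
    memset(buf, 0, size);

    numa_free(buf, size);
    return EXIT_SUCCESS;
}
```

Applications can also be steered to the CXL tier without code changes, for example with numactl --membind=1 ./app.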
Currently at the 3.1 specification level, CXL builds on the great momentum of PCIe technology. CXL 1.1/2.0 uses a PCIe 5.0 PHY that operates at 32 gigatransfers per second (GT/s). CXL 3.1 uses PCIe 6.1 to scale signaling to 64 GT/s.
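For a sense of scale, the raw bandwidth of a x16 link at these signaling rates, before encoding, FEC, and protocol overhead, works out to:

$$ 32\ \mathrm{GT/s} \times 16\ \mathrm{lanes} \times \tfrac{1\ \mathrm{byte}}{8\ \mathrm{bits}} = 64\ \mathrm{GB/s\ per\ direction}, \qquad 64\ \mathrm{GT/s} \times 16\ \mathrm{lanes} \times \tfrac{1\ \mathrm{byte}}{8\ \mathrm{bits}} = 128\ \mathrm{GB/s\ per\ direction} $$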
To support a wide range of computing use cases, the CXL standard defines three protocols: CXL.io, CXL.cache, and CXL.memory. CXL.io provides a non-coherent load/store interface to I/O devices and is used for discovery, enumeration, and register access; it is functionally equivalent to the PCIe protocol. Because CXL.io is the foundational communication protocol, it is present in every use case. CXL.cache improves performance by allowing devices such as accelerators to efficiently access and cache host memory. For example, combining CXL.io and CXL.cache lets a workload shared between an accelerator-based NIC and the host CPU run faster by caching data locally on the accelerator. The CXL.memory protocol allows a host, such as a processor, to access device-attached memory with load/store commands, enabling attractive CXL memory expansion and pooling use cases.
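As a concrete, if simplified, view of what CXL.memory enables on the host side: on Linux, the cxl and dax drivers can expose an expander as a device-DAX node, after which ordinary loads and stores reach the device-attached memory. The sketch below is a minimal illustration; the device path /dev/dax0.0 is an assumption and will vary by system.

```c
/* Minimal sketch: map a CXL memory expander exposed as a device-DAX node
 * (the path is system dependent; /dev/dax0.0 is assumed for illustration)
 * and access it with plain CPU loads and stores.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t map_len = 1UL << 21;    /* 2 MiB, typical devdax alignment */

    int fd = open("/dev/dax0.0", O_RDWR);
    if (fd < 0) {
        perror("open /dev/dax0.0");
        return EXIT_FAILURE;
    }

    uint64_t *mem = mmap(NULL, map_len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return EXIT_FAILURE;
    }

    /* The CPU issues ordinary stores and loads; the host bridge translates
     * them into CXL.mem transactions to the device-attached memory. */
    mem[0] = 0xC0FFEEULL;
    printf("read back: 0x%llx\n", (unsigned long long)mem[0]);

    munmap(mem, map_len);
    close(fd);
    return EXIT_SUCCESS;
}
```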
All three CXL protocols are protected by Integrity and Data Encryption (IDE), which provides confidentiality, integrity, and replay protection. To meet CXL’s high data rates without adding latency, IDE is implemented in hardware-level secure protocol engines instantiated within the CXL host and device chips.
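IDE is built on AES-GCM authenticated encryption. The engine itself lives in hardware, but its three guarantees map onto a standard AEAD construction, which the conceptual OpenSSL sketch below illustrates; the encrypt_flit helper and its framing are made up for illustration and are not the IDE engine or its exact packet format.

```c
/* Conceptual illustration only: CXL IDE runs in hardware protocol engines,
 * but its properties correspond to AES-GCM AEAD:
 *  - confidentiality: the ciphertext
 *  - integrity: the authentication tag
 *  - replay protection: a monotonically increasing counter bound into the
 *    nonce, which the receiver checks and never accepts twice
 * Build with: gcc ide_sketch.c -lcrypto
 */
#include <openssl/evp.h>
#include <openssl/rand.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int encrypt_flit(const uint8_t key[32], uint64_t counter,
                        const uint8_t *plaintext, int pt_len,
                        uint8_t *ciphertext, uint8_t tag[16])
{
    uint8_t iv[12] = {0};
    memcpy(iv + 4, &counter, sizeof counter);   /* counter-derived nonce */

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    if (ctx == NULL)
        return -1;

    int len = 0, ok = 1;
    ok &= EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, NULL, NULL);
    ok &= EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, sizeof iv, NULL);
    ok &= EVP_EncryptInit_ex(ctx, NULL, NULL, key, iv);
    ok &= EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, pt_len);
    ok &= EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
    ok &= EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16, tag);

    EVP_CIPHER_CTX_free(ctx);
    return ok ? 0 : -1;
}

int main(void)
{
    uint8_t key[32], pt[64] = "example payload", ct[64], tag[16];
    if (RAND_bytes(key, sizeof key) != 1)
        return EXIT_FAILURE;

    /* The sender increments the counter for every protected unit it sends;
     * the receiver rejects anything with a counter it has already seen. */
    uint64_t tx_counter = 1;
    if (encrypt_flit(key, tx_counter, pt, sizeof pt, ct, tag) != 0)
        return EXIT_FAILURE;

    printf("encrypted %zu bytes, 16-byte tag attached\n", sizeof pt);
    return EXIT_SUCCESS;
}
```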
Meeting the performance demands of generative AI and other advanced workloads requires the new architectural solutions that CXL enables. Rambus CXL IP solutions are designed to give innovative chip designs the throughput, scalability, and security of the latest CXL standards. The new Rambus CXL 3.1 Controller IP has a flexible design suitable for both ASIC and FPGA implementations and includes a zero-latency Integrity and Data Encryption (IDE) module that protects against attacks on CXL and PCIe links.
Join us for our February webinar, “Unlocking the Potential of CXL 3.1 and PCIe 6.1 for Next-Generation Data Centers,” to learn how CXL and PCIe interconnects are key to building scalable and efficient data center infrastructure.
Lou Ternullo
Lou Ternullo is Rambus’ Senior Director of Product Marketing.