KAIST startup Panmnesia (whose name means “the power to perfectly remember everything you think, feel, encounter and experience”) claims to have developed a new approach to boosting GPU memory.
The technology reportedly makes it possible to add terabytes of memory to a GPU while maintaining reasonable performance, using cost-effective storage media such as NAND-based SSDs.
But there's a problem: the technology relies on the relatively new Compute Express Link (CXL) standard, which has yet to be proven in widespread applications and requires specialized hardware integration.
Technical challenges remain
CXL is an open standard interconnect designed to efficiently connect CPUs, GPUs, memory, and other accelerators. It allows these components to share memory coherently, meaning they can access the same memory without copying or moving data, which reduces latency and improves performance.
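The difference between coherent sharing and copy-based transfer can be illustrated with a small, purely conceptual sketch. Here ordinary Python shared memory stands in for a coherent pool; this is not real CXL programming, where coherence is enforced by hardware:

```python
# Conceptual sketch only: OS shared memory stands in for a CXL-style
# coherent pool; real CXL coherence is handled by the interconnect.
from multiprocessing import shared_memory

# Coherent model: two mappings of the SAME bytes, so no copy step is needed.
pool = shared_memory.SharedMemory(create=True, size=16)
second_mapping = shared_memory.SharedMemory(name=pool.name)
host_view, device_view = pool.buf, second_mapping.buf

host_view[0] = 42                 # "host" writes...
assert device_view[0] == 42       # ..."device" sees it with no transfer issued

# Copy-based model: separate buffers; data must be moved explicitly
# (analogous to a DMA transfer between host RAM and GPU memory).
host_buf = bytearray(16)
device_buf = bytearray(16)
host_buf[0] = 42
device_buf[:] = host_buf          # explicit copy before the "device" sees data
assert device_buf[0] == 42

# Release the shared segment.
del host_view, device_view
second_mapping.close()
pool.close()
pool.unlink()
```

The copy-based pattern is what conventional GPU memory expansion relies on; coherent access is what lets a CXL endpoint behave like directly attached memory.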
Unlike JEDEC's DDR standard, CXL is not a synchronous protocol, so it can accommodate a variety of storage media types without requiring precise timing and latency synchronization. Panmnesia says initial testing shows its CXL-GPU solution performing more than three times better than traditional GPU memory expansion methods.
In its prototype, Panmnesia connected a CXL endpoint containing terabytes of memory to a CXL-enabled GPU via two MCIO (Mini Cool Edge I/O) cables. These high-speed cables carry PCIe and CXL traffic, enabling efficient communication between the GPU and the memory pool.
However, implementation may not be straightforward. GPU cards may require additional PCIe/CXL-compatible slots, and significant technical challenges remain, particularly in integrating CXL logic fabrics and subsystems into current GPU designs. Adopting a new standard such as CXL also means ensuring compatibility with existing architectures and developing new hardware components, such as CXL-compatible slots and controllers, which can be complex and resource-intensive.
Panmnesia's CXL-GPU prototype promises substantial memory expansion for GPUs, but its reliance on the young CXL standard and the need for specialized hardware could slow widespread adoption. Even so, the benefits are clear, especially for large deep learning models that often exceed the memory capacity of current GPUs.
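A back-of-the-envelope calculation shows why terabyte-class expansion matters. The 80 GB GPU capacity and the FP16 (2 bytes per parameter) assumption below are illustrative choices, not figures from the article:

```python
# Rough estimate of model weight footprint vs. a single GPU's memory.
# Assumptions (illustrative): FP16 weights at 2 bytes/parameter, and an
# 80 GB GPU for comparison. Activations and optimizer state, which add
# several times more during training, are ignored here.

def model_weight_gb(params_billions, bytes_per_param=2):
    """Approximate weight-only footprint in GB."""
    return params_billions * bytes_per_param  # billions * bytes = GB

GPU_CAPACITY_GB = 80

for params in (7, 70, 175):
    need = model_weight_gb(params)
    verdict = "fits" if need <= GPU_CAPACITY_GB else "exceeds GPU memory"
    print(f"{params}B params -> ~{need:.0f} GB of weights ({verdict})")
```

Even weights alone for a 175-billion-parameter model come to roughly 350 GB under these assumptions, several times the capacity of a single high-end GPU, which is exactly the gap a terabyte-scale CXL memory pool targets.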