Composable disaggregated data center infrastructure promises to change the way data centers for modern workloads are built. However, to fully realize the potential of new technologies, such as CXL, the industry needs brand-new hardware. Recently, Samsung introduced its CXL Memory Module Box (CMM-B), a device that can house up to eight CXL Memory Module – DRAM (CMM-D) devices and add plenty of memory connected using a PCIe/CXL interface.

Samsung's CXL Memory Module Box (CMM-B) is the first device of this type to accommodate up to eight 2 TB E3.S CMM-D memory modules and add up to 16 TB of memory to up to three modern servers with appropriate connectors. As far as performance is concerned, the box can offer up to 60 GB/s of bandwidth (which aligns with what a PCIe 5.0 x16 interface offers) and 596 ns latency. 

From a pure performance point of view, one CXL Memory Module Box is slower than a dual-channel DDR5-4800 memory subsystem. Yet, the unit is still considerably faster than even advanced SSDs. At the same time, it provides very decent capacity, which is often just what the doctor ordered for many applications.
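
To put those bandwidth figures in context, here is a rough back-of-the-envelope sketch of the theoretical peaks involved. It assumes the standard PCIe 5.0 signaling rate of 32 GT/s per lane with 128b/130b encoding and a conventional 64-bit (8-byte) DDR5 channel; the function names are ours and purely illustrative, and real-world throughput will be lower than these peaks.

```python
def pcie_peak_gb_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s.

    gt_per_s is the per-lane transfer rate (32 for PCIe 5.0); each transfer
    carries one bit, and 128b/130b encoding removes about 1.5% of that.
    """
    return gt_per_s * lanes * encoding / 8  # bits -> bytes


def ddr_peak_gb_s(mt_per_s: int, channels: int, bytes_per_channel: int = 8) -> float:
    """Theoretical DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mt_per_s * bytes_per_channel * channels / 1000  # MB/s -> GB/s


if __name__ == "__main__":
    pcie5_x16 = pcie_peak_gb_s(32, 16)
    ddr5_4800_dual = ddr_peak_gb_s(4800, 2)
    print(f"PCIe 5.0 x16 (one direction): {pcie5_x16:.1f} GB/s")     # 63.0 GB/s
    print(f"Dual-channel DDR5-4800:       {ddr5_4800_dual:.1f} GB/s")  # 76.8 GB/s
```

By this estimate, the quoted 60 GB/s sits just under the PCIe 5.0 x16 ceiling and below the 76.8 GB/s peak of dual-channel DDR5-4800.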

The Samsung CMM-B is compatible with the CXL 1.1 and CXL 2.0 protocols. A complete setup consists of the rack-scale memory bank (the CMM-B itself), several application hosts, Samsung Cognos management console software, and a top-of-rack (ToR) switch. The device was developed in close collaboration with Supermicro, so expect this server maker to offer the product first.

Samsung's CXL Memory Module – Box is designed for applications that need a lot of memory, such as AI, data analytics, and in-memory databases, albeit not at all times. CMM-B allows memory to be dynamically allocated to a system when it needs it, after which that DRAM can be reassigned to other machines. As a result, operators of data centers can spend less money on procuring expensive memory (16 TB of memory costs a lot), reduce power consumption, and add flexibility to their setups.

Source: Samsung


  • mikegrok - Wednesday, April 3, 2024 - link

    I worked with a weather analysis company, and kept trying to talk them into setting a computer up with 1.6TB of RAM for their primary analysis instead of using a NAS. They had about 10 physical servers running 24/7 spending 97% of their time on I/O wait.
  • BvOvO - Wednesday, April 3, 2024 - link

    That weather analysis company's name? Albert Einstein
  • ballsystemlord - Wednesday, April 3, 2024 - link

    If it helps at all, since Firefox switched to their new rendering engine, Gecko, Firefox now happily uses +1GB of RAM for a single webpage. Yes, I have a screen shot, but there's no way to post a picture here.

    Likewise, GCC happily eats +4GB *per source file* when compiling modern C++.
  • PeachNCream - Friday, April 5, 2024 - link

    I usually don't observe Firefox consuming 1GB per tab. Spikes up to 250MB are typical, but it seems more like 100MB is around the running average per thing open on Linux. On Win11 I'm seeing roughly the same consumption as well (just checked). Running an adblocker and noscript on both OSes so fairly lean on addons. With that said, the browser upon launch allocates close to 1GB to itself to include a single page (common for Edge also - cannot test Chrome as it's not installed and a Google data collection platform so Chrome may not be a near-identical thing to use to compare), but I'm not sure it's fair to say it's 1GB per page or per tab. I'm not seeing that play out in day to day usage.
  • ballsystemlord - Saturday, April 6, 2024 - link

    Sorry, I should have been clearer. This occurs on only some webpages. Not every webpage.
  • BigT383 - Wednesday, April 3, 2024 - link

    I wonder how this appears to developers. Is it managed by the OS (in which case you'd want it as a NUMA node)? Or is it managed by a driver and some sort of API where you have to specifically talk to the device?
  • ballsystemlord - Thursday, April 4, 2024 - link

  • The Von Matrices - Friday, April 5, 2024 - link

    CXL memory expanders are designed to appear as NUMA node(s) - just ones with no CPU cores. So it should require no changes to NUMA-aware software assuming the OS supports CXL, which is something introduced in the latest versions of popular OSes.
  • back2future - Thursday, April 4, 2024 - link

    [ PCIe5.0 grade mainboard processors are topping on ~30- ~64GB/s(?) memory bandwidth for SDRAM sockets each DIMM (DDR4-DDR5, no OC) within a ~3-7" distance from main cpu.

    A PCIe adapter can access (a system bus) for cycles for DMA from the memory controller (arbiter for shared memory) within the main cpu, depending on settings there are preferred devices(cpu #, peripherals) and (possibly restricted) access to memory regions. ]
  • Dolda2000 - Thursday, April 4, 2024 - link

    596 ns is the first concrete latency figure I've seen for CXL devices, so that is very interesting, and also higher than I was expecting. It's not quite, but not far from, an order of magnitude slower than directly attached DRAM, and roughly around 2000 clock cycles for server CPUs.

    Is that really usable? Do CPUs really have enough reordering capacity to work around such massive latencies? Surely these aren't supposed to be used over some sort of asynchronous DMA-based transfer scheme (what would be the point of CXL then)?
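
For readers curious how the "NUMA node with no CPU cores" arrangement described in the comments above might look from software, here is a minimal sketch for Linux. It relies on the standard sysfs layout, where each node directory under /sys/devices/system/node exposes a cpulist file; the helper names parse_cpulist and cpuless_nodes are hypothetical, and whether a given CXL expander actually shows up this way depends on the platform, firmware, and kernel configuration.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a sysfs CPU list such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list means the node has no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpuless_nodes(root: Path = Path("/sys/devices/system/node")) -> list[int]:
    """Return the IDs of NUMA nodes that have memory-only (no CPU) topology."""
    nodes: list[int] = []
    for node_dir in root.glob("node[0-9]*"):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file() and not parse_cpulist(cpulist.read_text()):
            nodes.append(int(node_dir.name[len("node"):]))
    return sorted(nodes)


if __name__ == "__main__":
    # On a machine without CPU-less nodes (or without sysfs) this prints [].
    print("NUMA nodes without CPUs:", cpuless_nodes())
```

NUMA-aware software could then bind or interleave allocations across such nodes with the usual operating system tools, treating them as a slower, larger memory tier.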
