This week I am attending the InfiniBand Trade Association plugfest at UNH-IOL and next week I’ll be there again for the OpenFabrics Alliance interoperability event. The OFA is an industry alliance representing InfiniBand and iWARP, both high performance networking technologies that allow remote DMA (Direct Memory Access). For my readers who don’t know what RDMA is, I’ll try to explain.
First, a bit of background about DMA. There are many different ways to handle IO. Initially, the CPU was involved in copying data between a device and memory. This means that the CPU can’t be used for anything else during the copy. Since IO devices are significantly slower than the CPU (generally by orders of magnitude, especially for disks), the CPU is essentially wasted on such a simple task. DMA allows a separate controller to take charge of the transfer between the IO device and main memory. The CPU initiates a transfer, and the DMA controller does the actual work, freeing the CPU for other tasks. When the transfer is completed, the DMA controller sends an interrupt to the CPU, which alerts the operating system that the transfer is done.
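To make the division of labour concrete, here is a toy sketch in Python. It is only an analogy, not how real hardware is programmed: a background thread stands in for the DMA controller, a callback stands in for the completion interrupt, and the names dma_transfer and run_demo are made up for this illustration.

```python
import threading


def dma_transfer(device_data, memory, offset, on_complete):
    """Toy stand-in for a DMA controller (hypothetical helper).

    Copies device_data into memory in a background thread, then calls
    on_complete, which plays the role of the completion interrupt.
    """
    def worker():
        memory[offset:offset + len(device_data)] = device_data
        on_complete()  # "interrupt": tell the CPU side we are done

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def run_demo():
    memory = bytearray(16)            # pretend main memory
    done = threading.Event()          # set by the "interrupt"

    # The "CPU" only initiates the transfer...
    thread = dma_transfer(b"disk block", memory, 0, done.set)

    # ...and is free to do unrelated computation in the meantime.
    total = sum(i * i for i in range(1000))

    done.wait()                       # handle the completion "interrupt"
    thread.join()
    return bytes(memory[:10]), total
```

Calling run_demo() returns the copied bytes together with the result of the computation that ran while the transfer was in flight. The point is only the shape of the interaction: start the transfer, do other work, get notified.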
In an ideal world the operating system has a number of tasks which can run on the CPU. An ideal workload mixes tasks which require a lot of IO with tasks which are bound by the CPU (for example, many calculations with little IO). By using DMA the system can switch to CPU bound tasks and run them while the slow data transfer happens. Additionally, because the system is not overloaded with constant interrupts (as it would be with a byte-by-byte non-DMA transfer), interactive tasks - such as registering a keypress - are handled more effectively.
RDMA - Remote DMA - extends this concept to networking. Networking tends to involve a lot of data copying, and a lot of work by the operating system. Many network cards support DMA already, but this only allows a DMA transfer of raw network data from the card into the OS kernel’s memory. This data must be decoded as it moves through the protocol stack, which typically involves additional copying, and eventually it is delivered to a user space program (which involves yet another copy). All this copying is necessary because the kernel must virtualize the shared network interface for all the programs running on a system. The kernel needs to ensure that a user program cannot read network traffic meant for a different program. As network interfaces keep getting faster this overhead becomes more and more significant.
RDMA gives increased performance at the expense of security. Special network interface cards implement the protocol stack in silicon on the card. A user program can then initiate a remote direct memory-to-memory transfer. The operating system kernel is not involved in the transfer at all. This means the CPU can be busy doing computations while the (comparatively) much slower network link transfers data. This is especially useful for high performance computing. If a parallel task involves a lot of computation with message passing to synchronize, then things can be drastically sped up by using RDMA.
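The difference between the two receive paths can be sketched as a small Python model. This is a simplification under loudly stated assumptions: the 4-byte header, the function names kernel_path and rdma_path, and the exact copy counts are invented for illustration, and real stacks vary in how many copies they make.

```python
HEADER_LEN = 4  # hypothetical fixed-size header for this toy protocol


def kernel_path(frame: bytes, user_buf: bytearray) -> int:
    """Conventional receive; returns the number of CPU copies made."""
    cpu_copies = 0

    # The NIC DMAs the raw frame into kernel memory (no CPU copy).
    kernel_buf = bytearray(frame)

    # The protocol stack strips the header, copying the payload.
    payload = bytes(kernel_buf[HEADER_LEN:])
    cpu_copies += 1

    # The kernel copies the payload into the user program's buffer.
    user_buf[:len(payload)] = payload
    cpu_copies += 1

    return cpu_copies


def rdma_path(frame: bytes, user_buf: bytearray) -> int:
    """RDMA-style receive; returns the number of CPU copies made.

    The card parses the header itself and places the payload straight
    into the user buffer, so the host CPU copies nothing.
    """
    payload_view = memoryview(frame)[HEADER_LEN:]
    user_buf[:len(payload_view)] = payload_view  # stands in for the NIC's DMA write
    return 0
```

With a frame such as b"HDR!hello" and a 5-byte user buffer, both functions leave b"hello" in the buffer, but the first reports two CPU copies and the second reports none. That gap is the overhead RDMA is meant to remove.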
InfiniBand is one technology that implements RDMA. IB uses a high performance dedicated network. It is commonly used for computing clusters, since it uses dedicated InfiniBand switch hardware. iWARP is a second technology for RDMA, which allows RDMA transfers over standard Internet Protocol links (such as a normal local area Ethernet network). It still requires special iWARP “RNIC”s which contain a hardware network stack and enable RDMA operations. But there is no need for special switch hardware, and iWARP can even be used over a wide area.
Obviously this technology is not going to show up in your home any time soon. The target markets are high performance computing, storage, and finance. Nonetheless, within this limited sector RDMA is a very exciting development.