
NUMAlink

From Higher Intellect Vintage Wiki

Overview

NUMAlink is a high-speed, low-latency switched-fabric computer bus used as the processor interconnect in Silicon Graphics shared-memory systems. NUMAlink was developed by SGI for its SGI Origin and SGI Onyx systems. It was initially branded "CrayLink" during SGI's ownership of Cray Research.

For computer clusters, the latency of the interconnect is often more important to overall performance than its raw bandwidth, particularly for applications that pass many small messages. For instance, gigabit Ethernet offers throughput on the order of 100 MB/s, but typical latencies of 30 usec even for one-word messages. This is due to the overhead of the Ethernet protocol stack, which must encapsulate each message in a standardized packet and then unpack it at the far end.
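
A simple model makes the point concrete: the time to deliver a message is roughly latency plus wire time, so for small messages the latency term dominates and effective throughput falls far below the link rate. The sketch below is illustrative only, using the approximate figures quoted in this article (not measured data):

```python
def effective_throughput(size_bytes, latency_s, link_bw_bytes_per_s):
    """Effective throughput = message size / (latency + wire time)."""
    transfer_time = latency_s + size_bytes / link_bw_bytes_per_s
    return size_bytes / transfer_time

# Gigabit Ethernet: ~100 MB/s link rate, ~30 usec latency
gbe = effective_throughput(64, 30e-6, 100e6)
# NUMAlink: ~3.2 GB/s link rate, ~1 usec latency
nl = effective_throughput(64, 1e-6, 3.2e9)

print(f"64-byte message over gigabit Ethernet: {gbe / 1e6:.2f} MB/s effective")
print(f"64-byte message over NUMAlink:         {nl / 1e6:.2f} MB/s effective")
```

For a 64-byte message the model gives roughly 2 MB/s effective over gigabit Ethernet versus roughly 60 MB/s over NUMAlink, even though neither link is bandwidth-limited at that size.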

NUMAlink, like other products aimed at the same market space, attempts to improve performance by dramatically reducing packet overhead. Typically this is accomplished by using a much smaller minimum packet size and by using circuit-switched networks that do not have to be actively routed during transport (the route is set up only once). SGI claims particularly impressive numbers for NUMAlink, stating that the typical short-message overhead is only 1 usec, half that of competing systems.

Latency directly affects the "efficiency" of a system (the ratio of achieved to theoretical peak performance), one of the important measures used when benchmarking supercomputer installations with Linpack. NUMAlink offered an average of 84% efficiency on the TOP500 list, while QsNet and InfiniBand reached 75%, Myrinet 63%, and gigabit Ethernet 59%.
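
Linpack efficiency is simply Rmax (achieved Gflop/s) divided by Rpeak (theoretical peak Gflop/s). A minimal sketch of the calculation; the Rpeak and Rmax figures below are hypothetical, chosen only to match the 84% figure quoted above:

```python
def linpack_efficiency(rmax_gflops, rpeak_gflops):
    """Linpack efficiency in percent: achieved / theoretical peak."""
    return 100.0 * rmax_gflops / rpeak_gflops

# Hypothetical 256-processor system at 6 Gflop/s per processor
rpeak = 256 * 6.0          # 1536 Gflop/s theoretical peak
rmax = 1290.0              # illustrative achieved Linpack result

print(f"Efficiency: {linpack_efficiency(rmax, rpeak):.0f}%")
```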

Moreover, NUMAlink is extremely fast. The basic system offers 3.2 GB/s unidirectionally, about twice that of most similar systems and 32 times that of gigabit Ethernet. Fully expanded InfiniBand systems, that is "quad-rate 12X" systems, offer up to 12 GB/s, but it appears no such solution is actually in use.

Performance

The following excerpt is taken from an archived copy of an SGI web page (see References, below) and refers to the NUMAlink 4 interconnect:

Data crosses over an SGI NUMAlink switch, round-trip, in as little as 50 nanoseconds—less time than it takes a beam of light to travel 50 feet—compared to 10,000 nanoseconds or more with many commodity clustering interconnects. Furthermore, SGI NUMAlink technology is the only interconnect that provides global shared memory between cluster nodes.

The industry-leading performance of NUMAlink interconnect technology is clear when comparing bandwidth and latency characteristics to other interconnects (Table 1). This translates into better system performance in MPI applications as well as industry standard system benchmarks, such as Linpack (Table 2).

Table 1 - Latency and bandwidth performance of common interconnect technologies

Technology                Vendor    MPI latency          Bandwidth per link
                                    (usec, short msg)    (unidirectional, MB/s)
NUMAlink 4 (Altix)        SGI       1                    3200
RapidArray (XD1)          Cray      1.8                  2000 (1)
QsNet II                  Quadrics  2                    900 (2)
InfiniBand                Voltaire  3.5                  830 (3)
High Performance Switch   IBM       5                    1000 (4)
Myrinet XP2               Myricom   5.7                  495 (5)
SP Switch 2               IBM       18                   500 (6)
Ethernet                  various   30                   100

Bandwidth per link citations are from the following sources:

  1. http://www.cray.com/products/xd1/index.html#RapidArrayInterconnect
  2. http://doc.quadrics.com/Quadrics/QuadricsHome.nsf/DisplayPages/81DD13F71CFD762580256EAD0010AA75/$File/Performance.pdf
  3. http://nowlab.cis.ohio-state.edu/projects/mpi-iba/
  4. http://publib-b.boulder.ibm.com/Redbooks.nsf/f338d71ccde39f08852568dd006f956d/55258945787efc2e85256db00051980a?OpenDocument
  5. http://www.myricom.com/myrinet/performance/
  6. http://www-1.ibm.com/servers/eserver/pseries/hardware/whitepapers/sp_switch_perf.pdf
Table 2 - Comparison of Linpack system efficiencies in the November 2004 TOP500 list

System/interconnect        Avg. Linpack efficiency   Sample size (number of
                           for 256P system, %*       systems on list)*
SGI Altix/NUMAlink 4       84                        14
HP Superdome               79                        18
Various/Quadrics           75                        4
Various/InfiniBand         75                        3 (one system @ 288P)
Various/Myrinet            63                        19
Various/Gigabit Ethernet   59                        14

* Linpack Rmax/Rpeak for 256P systems listed on the November 2004 TOP500 list - see www.top500.org


Development History

NUMAlink 1

There was no interconnect branded NUMAlink 1, as SGI's engineers deemed the system interconnect used in the Stanford DASH multiprocessor to be the first generation of the NUMAlink interconnect.

NUMAlink 2

NUMAlink 2 is the second generation of the interconnect, introduced in 1996 and used in the Onyx2 visualization systems and the Origin 200 and Origin 2000 servers and supercomputers. The NUMAlink 2 interface was the Hub ASIC. NUMAlink 2 is capable of 1.6 GB/s of peak bandwidth through two 800 MB/s unidirectional links, implemented as 16-bit PECL channels clocked at 400 MHz.

NUMAlink 3

NUMAlink 3 is the third generation of the interconnect, introduced in 2000 and used in the Origin 3000 and Altix 3000. NUMAlink 3 is capable of 3.2 GB/s of peak bandwidth through two 1.6 GB/s unidirectional links.

NUMAlink 4

NUMAlink 4 is the fourth generation of the interconnect, introduced in 2004 and used in the Altix 4000. NUMAlink 4 is capable of 6.4 GB/s of peak bandwidth through two 3.2 GB/s unidirectional links.

NUMAlink 5

NUMAlink 5 is the fifth generation of the interconnect, introduced in 2009 and used in the Altix UV series. NUMAlink 5 is capable of 15 GB/s of peak bandwidth through two 7.5 GB/s unidirectional links.
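
The generation figures above follow a simple pattern: each generation pairs two unidirectional links, so peak bandwidth is twice the per-link rate, and the per-link rate roughly doubles each generation. A small sketch tabulating the numbers quoted in this section:

```python
# Per-link unidirectional bandwidth in GB/s, as quoted in this article
GENERATIONS = {
    "NUMAlink 2": 0.8,
    "NUMAlink 3": 1.6,
    "NUMAlink 4": 3.2,
    "NUMAlink 5": 7.5,
}

for name, per_link in GENERATIONS.items():
    # Peak bandwidth is the sum of the two unidirectional links
    print(f"{name}: 2 x {per_link} GB/s = {2 * per_link} GB/s peak")
```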


References


See Also

  • Infiniband
  • QuickRing
  • Quadrics (QsNet)
  • High Performance Switch