
Choosing the Right Interconnect for Your Cluster Environment

High-performance computing (HPC) brings to mind large supercomputers with the fastest of everything. Society has taught us that more is better, so having the fastest of all parts of a cluster must be good. Therefore, creating an effective cluster should be as simple as picking the fastest servers and the fastest interconnect. If that's the case, why are there so many choices? If you answered money, you would be only partially correct.  

After all, look at TOP500's list of the 500 most powerful computer systems. Among these are the most expensive clusters on the planet, yet more than 40 percent of them use Gigabit Ethernet (GbE) as the interconnect. Why would purchasers with such large amounts of money choose such an interconnect when there are networks with 32 times the bandwidth?

The key is balance. All parts of the cluster must work in concert to do the work as fast as possible. Given a finite amount of money, optimizing the performance per dollar calls for some carefully considered tradeoffs of communication, compute and storage resources. This lack of one size fits all in the cluster marketplace is why IBM offers many servers and interconnects. These systems are designed to meet the requirements of applications ranging from small servers interconnected via low bandwidth, high-latency networks to large symmetric multiprocessing (SMP) servers with interprocessor cache coherency to large clusters of SMPs connected with a high-performance, low-latency communication fabric. Different architectures are available to fit different needs.

The Tortoise vs. The Hare

Looking at the TOP500 list, one could argue that interconnects like GbE are only used on the "slow" machines while the top machines use proprietary high-speed links. In general this is true; the average power of a GbE-connected cluster is just under 2 teraflops (TFLOPS), while clusters with InfiniBand average 3.5 TFLOPS, and the "proprietary" category blazes in at an average of 21.6 TFLOPS. But averages can be deceiving. Although it's true that the fastest machine on the list with GbE comes in at a mere No. 53, Myrinet, with a 2.2 TFLOP average just slightly higher than GbE's, is the interconnect on machine No. 5. Clearly the tradeoffs can be complex.

So why would you choose GbE? Availability and cost are large drivers. GbE, along with its slower cousins--10 Mb and 100 Mb Ethernet--is an industry-standard interconnect that's widely available. Network interfaces are often built into servers or are available from many vendors, as are switches. GbE, which has a relatively low bandwidth (125 MB/sec per direction) and high latency (typically on the order of 50 microseconds), has been used for general communications among servers where time spent in computation is large compared with time spent in communications, and where the processes running on different servers communicate infrequently.
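The bandwidth and latency figures above can be combined into a simple model of message transfer time: a fixed latency cost plus the time to push the bytes through the link. The sketch below uses the GbE numbers from the article (125 MB/sec, roughly 50 microseconds of latency); the model itself is a standard first-order approximation, not something specific to any vendor's hardware.

```python
# First-order model of point-to-point message time on GbE.
# Figures are from the article: ~125 MB/sec per direction,
# ~50 microseconds of latency for a typical GbE path.

def transfer_time_us(msg_bytes, latency_us=50.0, bandwidth_mb_s=125.0):
    """Estimated one-way time: fixed latency plus serialization time."""
    return latency_us + (msg_bytes / (bandwidth_mb_s * 1e6)) * 1e6

for size in (1_000, 100_000, 10_000_000):
    print(f"{size:>10} bytes over GbE: {transfer_time_us(size):,.0f} us")
```

For small messages the 50-microsecond latency dominates (a 1 KB message spends far more time waiting than transmitting), while for large messages bandwidth dominates. This is why latency, not bandwidth, is often the deciding factor for tightly coupled parallel applications.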

Ethernet is also the least expensive of the widely used technologies, and GbE is believed to be the most widely used interconnect in the high-performance server market due to its low price. So for many embarrassingly parallel applications (e.g., low to no communication between servers, such as the seti@home project), relying on GbE allows money that would've been spent on a faster interconnect to instead buy more computing power. The net result is that more work gets done in the cluster per dollar spent. Other examples include capacity computing, where many independent jobs are run on the cluster at once, as opposed to running one large-scale job across the entire cluster. If there isn't a need for increased performance on a single application, but rather for focus on aggregate throughput, this technique can lead to better efficiency (since programs rarely scale linearly with increased cluster size) and reduce the networking requirements.
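The performance-per-dollar argument above can be made concrete with a back-of-the-envelope comparison: for an embarrassingly parallel workload, aggregate throughput scales with node count, so a cheaper interconnect buys more nodes and more total work. All prices and per-node performance figures below are hypothetical assumptions chosen for illustration; they are not figures from the article.

```python
# Hypothetical performance-per-dollar comparison for an embarrassingly
# parallel workload, where throughput scales with node count and the
# interconnect is barely used. All dollar amounts and GFLOPS figures
# are assumed for illustration only.

BUDGET = 500_000            # total cluster budget (assumed)
NODE_PRICE = 4_000          # per server (assumed)
GBE_PER_NODE = 100          # built-in GbE port plus switch share (assumed)
FAST_PER_NODE = 1_600       # high-speed adapter plus switch share (assumed)
GFLOPS_PER_NODE = 10        # per-node compute (assumed)

def nodes_for(budget, node_price, net_price_per_node):
    """How many nodes the budget buys once networking is included."""
    return budget // (node_price + net_price_per_node)

for name, net in (("GbE", GBE_PER_NODE), ("high-speed", FAST_PER_NODE)):
    n = nodes_for(BUDGET, NODE_PRICE, net)
    print(f"{name:>10}: {n} nodes, ~{n * GFLOPS_PER_NODE} GFLOPS aggregate")
```

Under these assumed prices the GbE cluster ends up with roughly a third more nodes, and therefore a third more aggregate throughput, than the same budget spent with the faster interconnect. The calculation flips, of course, the moment a single job must scale across nodes and communication time starts to dominate.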

Ethernet also has the advantage of reasonably assured future upgrades. 10 GbE is the logical successor to GbE, offering potentially 10 times the bandwidth. Latency for GbE and 10 GbE will be similar, although future improvements in Ethernet implementations are expected to bring this latency down to 10 microseconds (usec) for general server implementations. 10 GbE is relatively new to the marketplace and currently is mostly used as a switch-to-switch backbone in GbE networks. Right now, it's expensive to implement as a server-to-server network, and its primary reliance on TCP/IP as a communications protocol produces high CPU utilization per transmitted byte, which may not be acceptable in many environments until further improvements are made in the implementation of the protocol.


Andrew Wack is a System p Cluster Test Architect. Andrew can be reached at

Robert Davis is involved in HPC Competitive Analysis for IBM. Robert can be reached at



