
How Summit Became the World’s No. 1 Supercomputer

The U.S. Department of Energy’s Oak Ridge National Laboratory unveiled Summit as the world’s most powerful and smartest scientific supercomputer on June 8. (Photo: Oak Ridge National Laboratory/Carlos Jones)

In 2014, the Collaboration of Oak Ridge, Argonne and Livermore (CORAL), a partnership of three U.S. national labs, was established, and the U.S. Department of Energy awarded $325 million to build two top scientific supercomputers. One of those supercomputers, Summit, is now in use at Oak Ridge National Laboratory (ORNL) in Tennessee.

It’s a big moment for ORNL. TOP500 named Summit the world’s top supercomputer in June (bit.ly/2LqadSP). Let’s look back at the four-year journey to No. 1.

ORNL hoped to drive open scientific study and discovery with a powerful, state-of-the-art system. The lab’s 2-year-old supercomputer, Titan, was second on TOP500 at the time, but the lab wanted a system that could better address the most complex questions (bit.ly/2siy2Uz).

Efficient Translation

TOP500 ranks supercomputers by the number of floating-point operations per second (FLOPS) they achieve while running the Linpack benchmark. That rate can be influenced by physical hardware characteristics, the choice of machine language and OS, and the compiler’s ability to optimize.
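As a rough illustration of how such a rate is computed: Linpack solves a dense n-by-n system of linear equations by LU factorization, which takes about (2/3)n^3 floating-point operations, so the rate is that operation count divided by the run time. The sketch below assumes only this leading-order term (the official benchmark’s exact count also includes lower-order terms), and the function name linpack_gflops and the sample values are hypothetical.

```c
/* Hypothetical helper: estimate a Linpack-style rate in GFLOP/s.
   Assumes the leading-order LU factorization cost of (2/3) * n^3
   floating-point operations; lower-order terms are ignored. */
double linpack_gflops(double n, double seconds)
{
    double flops = (2.0 / 3.0) * n * n * n;  /* approximate operation count */
    return flops / seconds / 1e9;            /* operations per second, in billions */
}
```

For example, a system of size n = 3000 solved in 1.0 second works out to roughly 18 GFLOP/s; halving the time doubles the rate, which is why compiler optimizations that shorten run time translate directly into a higher ranking.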

The compiler’s ability to optimize is important for achieving fast programs because compilers translate a program written in a “reads-like-English” programming language (e.g., Fortran, C++) into a machine language that computers can understand (see Figure 1).

Compilers are translators. Just like a translator of human languages (e.g., Chinese to English), a compiler decides which words to use in the translation, and those choices determine how efficient the translation is. If a Chinese-to-English translator tries to translate the Chinese phrase “他效率很高” (“He is efficient”) but lacks a good grasp of English, the result might be “He has a high level of efficiency”: understandable and correct, but longer than necessary. Translators must have a good grasp of both the source and target languages to be efficient. The same is true for compilers; for example, they need a good grasp of C++ and the PowerPC* machine language to create fast programs.

A Superior Compiler

You might not think that efficient translation by the compiler would be a big factor in Summit’s No. 1 ranking, but one “translation” of the CORAL LULESH benchmark (using IBM’s XL compiler), which offloads work to the GPUs, runs about 3x faster than a CPU-only translation (see note 1). In the better-translated version, the innovative hardware in the Summit system was used to its fullest potential, taking advantage of both the CPUs and the GPUs (see Figure 2).

Making use of the GPU, or “offloading” work to it, isn’t a trivial task for any compiler writer. IBM had to teach XL two new programming models, CUDA and OpenMP, as well as new machine language vocabulary for the POWER9* CPU and NVIDIA GPU. C, C++ and Fortran applications running on Summit can offload computation to the GPU through OpenMP directives in the source code or through CUDA code. The compiler understands these constructs and uses them to carry out the offload.
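To make the directive approach concrete, here is a minimal sketch of OpenMP offloading in C, assuming a compiler that supports OpenMP 4.5 target constructs. The function saxpy_offload is a hypothetical example and is not taken from LULESH; it only shows how a single directive marks a loop for the GPU and tells the compiler which arrays to copy to and from the device.

```c
#include <stddef.h>

/* Hypothetical kernel: y = a * x + y, offloaded with OpenMP target directives.
   map(to: ...) copies x to the device before the loop; map(tofrom: ...) copies
   y to the device and back afterward. If no GPU is available, or the code is
   compiled without offload support, the loop runs on the host CPU instead. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Per note 1, XL enables OpenMP with -qsmp=omp and GPU offloading with -qoffload; other compilers use different options. Because the same source also builds and runs correctly on the CPU alone, directives let developers port existing code incrementally rather than rewriting it.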

The Summit system makes compilers from several vendors available, but IBM’s XL C/C++ and XL Fortran compilers are so well regarded for their speed and reliability that they’re the only vendor compilers loaded by default (bit.ly/2nArBdi).

Over the last four years, the ORNL team in Tennessee has worked closely with the IBM XL compiler teams in Toronto and Shanghai in a continuous cycle: giving feedback, hearing proposed solutions, receiving new compiler features in incremental deliveries (more than 25 compiler beta refreshes in total) and testing them. At project milestones, the two teams met in readiness workshops to ensure the functional and performance targets were on track.

In addition to the compiler team, ORNL has been working with the IBM POWER9* hardware team. Summit is made up of rows and rows of server cabinets, with over 4,000 servers in total. Under the hood of each server, you’ll find two IBM POWER9 CPUs and six NVIDIA Volta GPUs, interconnected by high-speed NVLink connections. NVLink relieves the data-transfer bottleneck that offloading computation from the CPU to the GPU would otherwise create.

Reaching the Summit

It’s been a long journey, but Summit made it to No. 1 and now new scientific discoveries await! John Kelly, senior vice president, Cognitive Solutions and IBM Research, remarks, “Summit can run simulations and models that will help us advance cancer research, understand genetic factors that contribute to opioid addiction, discover stronger, more energy-efficient materials and better understand supernovas to explore the origins of the universe.”

1. All performance measurements were run on a POWER9 AC922 containing one POWER9 CPU and four NVIDIA Volta GPUs. The comparison is between an unmodified version of LULESH compiled with -Ofast -qarch=pwr9 -qtune=pwr9 -qsmp=omp (CPU-only, parallelized) and a modified version of LULESH (with additional OpenMP pragmas marking which parts of the code to offload to the GPU) compiled with -Ofast -qarch=pwr9 -qtune=pwr9 -qsmp=omp -qoffload (CPU and GPU). LULESH’s specification calls for reporting performance as a figure of merit (FOM) in zones per second (Z/S), where larger is better. CPU-only: 6565.0961 Z/S; CPU and GPU: 19273.966 Z/S, 2.94x faster.
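The 2.94x figure follows directly from the two FOM measurements above, since a larger FOM means faster execution:

```latex
\text{speedup} = \frac{\mathrm{FOM}_{\text{CPU+GPU}}}{\mathrm{FOM}_{\text{CPU}}}
= \frac{19273.966\ \text{Z/S}}{6565.0961\ \text{Z/S}} \approx 2.94
```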

Si Yuan Zhang is the offering manager for XL Power Compilers, IBM China Systems Lab.

IBM Systems Magazine