Lessons Learned Building the World’s Top Supercomputers

Summit and Sierra
Illustration by Andy Potts

The IBM-built Summit and Sierra supercomputers are, respectively, No. 1 and No. 2 in the world, according to the most recent TOP500 list.

While capturing the two top spots on the TOP500 is a remarkable achievement in and of itself for any provider, even more remarkable is the fact that the same POWER9* technology fueling these U.S. Department of Energy supercomputers for scientific research and discovery can be put to use by enterprises of all sizes to solve business challenges.

“Our POWER9 servers leverage the same technology fueling these tremendous engines of insight,” says Bob Picciano, senior vice president, IBM Cognitive Systems. “This means our clients can take advantage of commercially available, cost-effective and scalable solutions for both mission-critical and emerging workloads like artificial intelligence (AI) and deep learning.”

The power of these supercomputers is already being demonstrated by scientists who tapped into POWER9 processor-based Summit to explore complex computational challenges at unprecedented speeds. The prestigious Gordon Bell Prize, one of the top annual honors in supercomputing, recently went to two teams that tapped into Summit’s powerful capabilities to advance scientific discovery. One team developed a genomics algorithm capable of using mixed-precision math to attain exascale speeds—the fastest science application ever reported. The other Gordon Bell Prize-winning team achieved the fastest deep learning algorithm ever reported and is exploring AI to predict extreme weather patterns from climate simulations.

Powering the Enterprise

Although Summit and Sierra were built with scientific workloads in mind, they’re proof that the features enabled by IBM’s POWER9 processor—and the AI, cloud and security capabilities that come with it—can easily be extended to a broad category of use cases, from finance to manufacturing to research, across industries.

“When we started architecting Summit and Sierra for the Department of Energy, we used the tenets we put into the POWER9 processor-based architecture, including agility in the cloud, accelerators and AI, and very tight security,” Picciano notes. “This isn’t some bespoke architecture consisting of massive amounts of dense CPUs. Summit and Sierra are built out of IBM’s AC922 servers—and the AC stands for ‘accelerated compute,’ by the way. We can scale from a single system to a rack, to a cluster the size of Summit. That means you can solve problems of all sizes and grow as your requirements and workloads grow.”

Picciano emphasizes the “accelerated compute” that’s the foundation of the AC922 was built in partnership with leading OpenPOWER members, including NVIDIA, Mellanox and Red Hat. This new type of design is a marriage of traditional processors, with extensive advancements to assist new types of workloads, coupled with specialized silicon, most commonly in the form of GPUs. What’s more, with over 400 PB of storage running on more than 125 racks of IBM Elastic Storage Servers, Summit and Sierra maintain a continuous flow of data, so that complex AI workloads can run at peak performance, with fast access to all the data required. These advanced systems are designed, in fact, to accelerate the type of math and information processing required to power new AI algorithms.

Purpose-Built Infrastructure

Efficiencies are found when resources are shared between the CPU and the GPU, a capability that has been architected into POWER9. This is the sweet spot when running open-industry machine learning frameworks like TensorFlow, PyTorch, Caffe or Keras.

“We’re the only system with a unique—and exclusive—capability that exists within AC922 nodes called NVLink that couples CPUs to NVIDIA’s GPUs,” says Picciano. Without NVLink, systems are handicapped when communicating between the accelerator and the rest of the computer. He continues: “NVLink and OpenCAPI accelerate performance bandwidth up to 10x faster than PCI-Express 3.0 that is available in x86 systems, and we are also the first to offer the latest PCIe Gen4 I/O—capabilities that double the bandwidth versus POWER8* processors.” For comparison, NVLink 2.0 offers 150 GBps between the POWER9 chip and each GPU, whereas a PCIe Gen3 x16 slot tops out at 16 GBps.
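To put those two bandwidth figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the 150 GBps and 16 GBps numbers quoted above; the function names and the 16 GB batch size are hypothetical, chosen purely for illustration, and the calculation assumes an idealized link running at its full quoted rate with no protocol overhead.

```python
# Link bandwidths as quoted in the article, in gigabytes per second.
NVLINK2_GBPS = 150.0        # POWER9 chip to each GPU over NVLink 2.0
PCIE_GEN3_X16_GBPS = 16.0   # one PCIe Gen3 x16 slot


def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Idealized time to move size_gb gigabytes over a link.

    Assumes the link sustains its full quoted bandwidth, which real
    systems rarely do; treat the result as a lower bound.
    """
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gbps


def bandwidth_ratio(fast_gbps: float, slow_gbps: float) -> float:
    """How many times more data the faster link moves per second."""
    return fast_gbps / slow_gbps


if __name__ == "__main__":
    batch_gb = 16.0  # hypothetical amount of training data sent to one GPU
    ratio = bandwidth_ratio(NVLINK2_GBPS, PCIE_GEN3_X16_GBPS)
    print(f"NVLink 2.0 vs PCIe Gen3 x16: {ratio:.2f}x")
    print(f"PCIe Gen3 x16: {transfer_seconds(batch_gb, PCIE_GEN3_X16_GBPS):.3f} s")
    print(f"NVLink 2.0:    {transfer_seconds(batch_gb, NVLINK2_GBPS):.3f} s")
```

The ratio works out to roughly 9.4x, which is consistent with the “up to 10x” figure Picciano cites.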

As companies start their journey to AI, this purpose-built infrastructure is important, because accelerated compute is a key to tackling AI workloads. “When we talk about ‘purpose-built infrastructure,’ we are referring to enterprise-class systems that are built, from the ground up, to handle data-intensive workloads better than any other systems,” he notes.

The Insight Economy

Because IT value now resides in the ease and speed with which users can derive insight from data (what Picciano calls the era of “the insight economy”), purpose-built infrastructure can help advance breakthrough capabilities such as machine learning and deep learning, which are increasingly creating tangible business value for clients across industries.

“We already have clients, from financial institutions to oil and gas companies, who are tapping into the POWER9 AC922 servers and PowerAI software to accelerate deep learning training times and explore problem-solving for previously unsolvable problems, from preventing financial fraud to detecting potential oil pipeline leaks before they happen,” he says.

PowerAI is an enterprise software distribution of enhanced open-source AI frameworks that enable faster training times to improve data scientist productivity and make AI adoption easier. “In our POWER9 systems, hardware and software are combined for optimal performance, and clients are just beginning to reap the full benefits,” Picciano says.

One important benefit is scalability. Clients whose businesses are transforming can start from as small as one node and scale up to thousands of nodes. Having software that is easy to use and consumable by a wide range of business users is another important benefit of these POWER9 systems. For example, business analysts can tap into image and video with a simple point-and-click interface—using PowerAI Vision software running on POWER9—to train a deep learning model. These systems are helping to deliver faster insights and scale those across the enterprise. “We are essentially bringing deep learning to the masses,” says Picciano.

Building a Cloud-Ready Data Center

With last year’s rollout of the entire family of POWER9 processor-based scale-up and scale-out servers, clients can gain the benefits of POWER9 performance across a range of OSes and workloads.

“Consolidation opportunities exist in our POWER9 scale-up servers such as the E980, because these systems now pack up to 192 cores and 64 terabytes of memory, in addition to the ability to host up to 1,000 virtual machines. And no less important, they can run battle-tested AIX*, Linux* and IBM i workloads all within the same environment—and with the security of workload isolation,” Picciano remarks. “That’s just breathtaking when you think about the impact on the efficiency of labor and application and security management.”

This type of purpose-built infrastructure was designed not only to crush AI workloads, but also to facilitate cloud deployments. In fact, the POWER9 architecture delivers the agility, scalability, performance and efficiency needed for nearly every cloud workload—whether that’s private, hybrid or public.

In addition to the numerous performance enhancements for the POWER9 processor, security has been improved—with double the number of crypto engines (24 on POWER9 versus 12 on POWER8*) embedded on the chip. Crypto engines accelerate applications that need cryptographic functions. Executing these functions in hardware reduces software overhead. As a result, actions that are required for both cloud-native and enterprise workloads such as encryption, decryption and authentication can execute more quickly.

New Roads for Innovation

Picciano offers this advice for clients who are considering migrating to POWER9 processor-based servers: “Overall, I recommend thinking more about insights, and less about specific workloads,” he says. “Focus on the problems you need to solve—whether it’s about customer loyalty, or new opportunities, or longstanding threats. What we’ve learned from building the world’s top supercomputers is that the right infrastructure choices, in the end, can help you find answers to your most pressing business challenges.”

For Picciano, taking that approach advances organizations well down the road to putting IT in the driver’s seat to maximize business opportunity and minimize risk, today and in the future. “At IBM, we call it ‘putting smart to work’ to make your data center cutting edge. And in today’s dynamic marketplace, that’s the key to innovation and competitive advantage.”
