Keeping IBM Machine Learning on z/OS Next to the Data Enhances Analytics
Take advantage of the proximity of the data for faster processing with IBM Machine Learning for z/OS on the mainframe.
By Jim Utsler
07/05/2017
Machine learning isn’t some sort of sci-fi-y attempt to make mechanical overlords. Rather, it’s a method by which users can train computers to recognize and even predict patterns in data and, using machine-learning models, make better decisions based on historical, recent and up-to-the-minute information, no matter the format.
Machine learning represents a sharp shift from traditional data-harvesting models, with dynamic data modeling and cognitive processing taking the place of data warehousing, a comparatively clunky way to derive value from data. This is especially true now that data has become a continuous stream of vital information that, if modeled properly, can lead to previously undiscovered insight. And this is what makes the IBM z Systems* environment a perfect machine learning platform.
“You’re taking advantage of the proximity of the data and allowing for faster processing based on IBM mainframe technology,” says Nick Sardino, program director, IBM z Systems Offering Management. “And from a business-value perspective, there are hidden patterns of understanding you may not have visibility to when using other methods. That’s really the big story when it comes to machine learning on z Systems.”
Some organizations have pushed this type of analytics computing out to the cloud, as it’s relatively easy to use and doesn’t require huge capital outlays. Others pursuing machine learning, continuous intelligence and cognitive processing, however, prefer an on-site platform that performs data analysis faster and provides a stronger competitive advantage.
“Last year, IBM announced a cloud-based IBM Watson* Machine Learning environment, and people were rightly very excited about it. There are definitely benefits to it,” Sardino notes. “But others have expressed an interest in having an analytics platform that sits right next to their data.”
And that’s the z Systems platform and IBM z* Analytics, along with a number of related machine learning and analytics tools. The mainframe is able to perform complex machine learning and cognitive processing across multiple data sources more quickly and efficiently than other platforms.
“Some of the more sophisticated analytical operations benefit from z Systems processor technology including Java* enhancements that benefit the analytics engine, the IBM z/OS* Platform for Apache Spark, as well as the large amount of supported memory, increased I/O bandwidth and a very rich caching structure,” Sardino says. “This is especially true of the IBM z13*.”
And unlike cloud resources, an on-premises mainframe sits directly alongside critical data-storage resources. This helps allay concerns some may have regarding machine learning in the cloud, including how to transfer data, how to ensure cloud-based data is current, how secure the data is and how to avoid potential network latency issues that may hamper model retraining.
“By not moving data off the mainframe, you’re actually preserving security and governance of the data. When you’re not making, moving and sharing copies of it, you can prevent breaches or compromised data,” Sardino says. “It’s also about the value of the data on the mainframe, the value of running analytics on that data and the proximity to up-to-the-minute transactional data.
“The proximity to the data benefits users in a couple of ways: the ability to call a model-based scoring routine from the transaction itself and still meet tight service-level agreements, and the ability to retrain on incoming data when the model accuracy falls below the desired threshold.”
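The retraining trigger Sardino describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the 0.9 threshold and the 100-prediction window are hypothetical and are not taken from IBM Machine Learning for z/OS.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks recent prediction accuracy and flags when retraining is due.

    Hypothetical helper for illustration; not part of any IBM product API.
    """

    def __init__(self, threshold=0.9, window=100):
        self.threshold = threshold           # desired minimum accuracy (assumed value)
        self.results = deque(maxlen=window)  # keeps only the most recent outcomes

    def record(self, predicted, actual):
        # Store True when the model's prediction matched the observed outcome.
        self.results.append(predicted == actual)

    def accuracy(self):
        # With no observations yet, report perfect accuracy so nothing triggers.
        if not self.results:
            return 1.0
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        # Only judge the model once a full window of outcomes is available.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.accuracy() < self.threshold
```

In use, each scored transaction would call `record()` once the true outcome is known, and a scheduler would retrain on incoming data whenever `needs_retraining()` returns `True`.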
A Holistic Approach
This is particularly desirable if most of an organization’s data runs through a z Systems server environment. Because everything’s on one platform, users can combine DB2* for z/OS and non-relational data and conduct continuous data processing to gather cognitive insights within their own operational environment. These users can also take advantage of technology such as IBM DB2 Analytics Accelerator, which reduces latency from both data-in-place and incoming transactional data sources.
“The purpose of DB2 Analytics Accelerator for z/OS is to transparently speed up complex queries that may previously have run for days. You can now get query results back 2,000 or more times faster (ibm.co/2qIs3sj),” Sardino says. “Additionally, this tool isn’t limited to data residing in DB2 for z/OS. Many organizations have data spread across multiple databases, such as Oracle, IMS* and MongoDB—and even unstructured databases—which can be combined, with Spark support for z/OS, to turn the mainframe into a holistic machine-learning environment.”
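To put the quoted speedup in perspective, here is a back-of-the-envelope calculation. The two-day baseline is a hypothetical input, and the 2,000x factor is simply the figure cited above applied uniformly, which real queries may not match.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def accelerated_runtime_seconds(baseline_days, speedup=2000):
    """Runtime in seconds after applying a uniform speedup factor."""
    return baseline_days * SECONDS_PER_DAY / speedup


# A query that previously ran for two days (hypothetical baseline):
two_day_query = accelerated_runtime_seconds(2)
```

Under those assumptions, a two-day query would finish in roughly a minute and a half.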
With DB2 for z/OS acting as the foundation, IBM Machine Learning for z/OS and IBM DB2 Analytics Accelerator create an agile, secure mainframe-based continuous intelligence, cognitive processing environment that benefits data scientists, especially when coupled with other IBM z Analytics solutions such as the IBM DB2 Query Management Facility for z/OS and the IBM DB2 Analytics Accelerator Loader for z/OS.
Additionally, machine learning models are accessed by applications on and off the platform via RESTful APIs, which makes it easy for application developers to access model predictions. This aligns with IBM’s cloud-integration strategy on the platform, further allowing organizations to leverage and extend the core business assets and data on the mainframe to the whole organization and easily integrate cognitive modeling into new and existing business applications.
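As a rough idea of what calling a model over REST might look like from an application, the sketch below builds (but does not send) a scoring request using Python’s standard library. The URL path, payload shape, model ID and bearer-token header are all assumptions made for illustration; the article does not describe the actual API, so consult the product documentation for the real interface.

```python
import json
from urllib import request


def build_scoring_request(base_url, model_id, features, token):
    """Build a POST request asking a deployed model for a prediction.

    The endpoint layout and JSON body are hypothetical placeholders.
    """
    url = f"{base_url}/models/{model_id}/score"   # hypothetical path
    body = json.dumps({"features": features}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",   # assumed auth scheme
        },
    )


# Example: score one transaction against a hypothetical fraud model.
req = build_scoring_request(
    "https://mlz.example.com/api", "fraud-v1", {"amount": 125.5}, "TOKEN"
)
# Sending it would be: request.urlopen(req), omitted because the host is fictional.
```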
The true enabler, however, is the z Systems server itself. From a hardware perspective, IBM continues to drive up single-thread performance on every generation of the system, giving the system the fastest microprocessor in the market, and its robust caching structure speeds up access to data (see “I/O Improvements for Machine Learning”). Memory has been more than tripled on the z13 to 10 TB in comparison to the IBM zEnterprise* EC12. This allows for the use of more memory solely for cognitive processes, and I/O bandwidth has been improved to allow for quicker storage reads and writes.
Clients who are using or considering an on-premises z Systems mainframe for machine learning are doing so for speed. This includes system training and receiving continuous intelligence and modeling updates.
As Sardino notes, “Data in the processor cache is an order of magnitude faster than data in main memory, and data in main memory is an order of magnitude faster than flash. Flash is an order of magnitude faster than spinning disk and spinning disk is an order of magnitude faster than tape. So you always want to keep as much of the data as possible as close to the processor as possible.”
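Sardino’s rule of thumb can be turned into a small model. The sketch below takes “an order of magnitude” literally as a factor of 10 between adjacent tiers, which is a simplifying assumption rather than a measured figure, and the access mixes in the usage note are hypothetical.

```python
# Storage tiers from fastest to slowest, in the order given in the quote.
TIERS = ["processor cache", "main memory", "flash", "spinning disk", "tape"]


def relative_cost(tier, factor=10):
    """Access cost of a tier relative to processor cache (cache = 1)."""
    return factor ** TIERS.index(tier)


def blended_cost(mix, factor=10):
    """Average relative cost for a workload.

    `mix` maps tier name -> fraction of accesses served from that tier.
    """
    return sum(share * relative_cost(tier, factor) for tier, share in mix.items())
```

For example, a workload served 90 percent from cache and 10 percent from main memory has a blended cost of 1.9, while shifting that 10 percent to spinning disk raises it to 100.9. That gap is the arithmetic behind keeping data as close to the processor as possible.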
Machine learning represents in many ways a new computing paradigm: a data-centric approach to discovering new insight with little human intervention. Whether it takes place in the cloud or on premises, it provides many benefits.
Running machine learning, including data capture, modeling and cognitive processing, in a local z Systems environment with IBM z Analytics leads to faster results than running it remotely. Additionally, data can be captured across database platforms and melded into a single view of the truth that can then lead to improved decision-making and, ultimately, increased competitiveness.
“For every client, for different workloads or applications, different models will be needed, and as the models become increasingly integrated into operations and grow, you need systems that can scale to accommodate for this,” Sardino notes. “At the same time, you also need to make data scientists more productive, because they’re going to be responsible for building and maintaining more models to make better business predictions. The z Systems platform allows for the training and retraining of machine learning models, continuous intelligence gathering and cognitive processing in a looped environment that has many, many benefits.”
- IBM z Systems is well suited for machine learning, as it’s located as near to the data being analyzed as possible. With the z Systems information and machine learning modeling capabilities, new information can be discovered and acted upon.
- The mainframe is able to perform complex machine learning and cognitive processing across multiple data sources more quickly and efficiently than other platforms.
- Because information is all in one place, processing can be done quicker and more securely, leading to faster results and improved decision-making.
I/O Improvements for Machine Learning
On the IBM z13* system, data proximity has been bumped up across all cache levels. For example, L1 and L2 cache is per core and there are eight cores on a central processor (CP) chip. Technical specs include:
96 KB I-cache
128 KB D-cache
2 MB + 2 MB eDRAM split private L2 cache
L3 cache is per CP chip, and there are six CP chips in a drawer. Its specs include:
On-chip 64 MB eDRAM L3 Cache
Shared by all cores
L4 cache is per system controller (SC) chip, and there are two SC chips in a drawer. Specs include:
eDRAM Shared L4 Cache
480 MB per SC chip (non-inclusive), with a 224 MB L3 NIC directory
2 SC chips = 960 MB L4 per z13 drawer
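The per-core and per-chip figures above can be rolled up into per-drawer totals. The sketch below takes the sidebar’s numbers at face value and reads the L2 entry as 2 MB + 2 MB per core; the number of cores actually available to a given configuration may be lower than the physical count used here.

```python
# Figures copied from the sidebar above.
CORES_PER_CP_CHIP = 8
CP_CHIPS_PER_DRAWER = 6
SC_CHIPS_PER_DRAWER = 2

L1_KB_PER_CORE = 96 + 128   # I-cache + D-cache
L2_MB_PER_CORE = 2 + 2      # split private L2 (assumed reading of the L2 entry)
L3_MB_PER_CP_CHIP = 64
L4_MB_PER_SC_CHIP = 480


def drawer_cache_totals():
    """Aggregate cache capacity for one z13 drawer, per cache level."""
    cores = CORES_PER_CP_CHIP * CP_CHIPS_PER_DRAWER
    return {
        "cores": cores,
        "l1_kb": cores * L1_KB_PER_CORE,
        "l2_mb": cores * L2_MB_PER_CORE,
        "l3_mb": CP_CHIPS_PER_DRAWER * L3_MB_PER_CP_CHIP,
        "l4_mb": SC_CHIPS_PER_DRAWER * L4_MB_PER_SC_CHIP,
    }
```

The L4 total of 960 MB matches the figure stated in the sidebar, and the same roll-up gives 384 MB of L3 per drawer.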
Jim Utsler, IBM Systems magazine senior writer, has been writing for IBM since the mid-1990s.