The Benefits of Analytics and Machine Learning on IBM Z

Enterprises have vast amounts of high-value, sensitive data on IBM Z*. With recent advances in machine learning technology, it’s the right time to extract key insights from this data and monetize it. Given its sensitive nature, it only makes sense to leverage that data in place. Enterprises can also minimize costly data movement and maintain a high level of governance, encryption and resilience in a security-rich environment.

IBM Machine Learning for z/OS* transforms the platform into a cognitive learning system through continuous feedback, which simplifies model retraining. Models can improve as they’re exposed to more data, and human intelligence is augmented to help organizations optimize recommendations and decisions. This means that instead of looking into the past to generate reports, businesses can predict what is likely to happen in the future. Those predictions support personalized client interactions, risk reduction, fraud detection, cross-sell and upsell, customer categorization, inventory optimization and more, all of which can increase enterprise revenue and help obtain an edge over the competition.

Traditional machine learning requires significant development, deployment, management and human intervention. The IBM approach focuses on quick model development, continuous auditing and proactive notification, and easy management. Information is processed securely and in place. This gives clients the ability to drive better business results while identifying and minimizing risks.

Selecting an Analytics and Machine Learning Platform

With strong security, low total cost of ownership (TCO), competitive performance and established governance mechanisms, IBM Z can be an effective keystone in an enterprise analytics solution. Yet more often than not, organizations consider moving data from a system of record to an off-platform analytics environment in the belief that costs will be lower. Deploying analytics and machine learning on IBM Z instead can lead to the following benefits:

1. Data Gravity
Many organizations maintain vast amounts of high-value, sensitive data on IBM Z. IBM recognizes the tremendous benefits that data gravity can bring to enterprises when analytical workloads are moved to where the data resides, including reduced cost, shortened time to value and minimized security exposures. As such, running analytics and machine learning on IBM Z is the natural choice. Data analysts and scientists can easily explore current and trustworthy data on IBM Z in a secure manner.

The same thing can’t be said for off-platform analytics, as data quality problems are often introduced when data is replicated or in motion. Currency suffers as well: a replicated copy can already be out of date by the time it is analyzed.

2. Industry-Leading Security
In an increasingly intricate world of regulatory requirements and external threats, the security of client data and mission-critical workloads is paramount. Not surprisingly, security and compliance are two of the biggest concerns for many organizations today. IBM Z is a highly secure system, and the latest IBM z14* continues to enhance an already robust system with pervasive encryption, taking advantage of features such as the Central Processor Assist for Cryptographic Functions (CPACF) and the Crypto Express 6S cards for FIPS 140-2 Level 4 certified encryption key management.

Running analytics and machine learning with data on IBM Z is a safe way to meet today’s stringent compliance and security needs. The same thing can’t be said for distributed platforms, which, by nature, increase the risk of security exposure and information leakage by maintaining multiple copies of data across any number of servers.

3. IBM Z Resiliency
The implications of downtime can be considerable. Planned and unplanned system outages can negatively impact both customer loyalty and an organization’s bottom line. IBM Z provides the highest levels of reliability, availability and security of any server platform on the market, as cited in the recent independent “ITIC 2017 Global Server Hardware and Server OS Reliability Survey,” which polled 750 organizations worldwide (ibm.co/2VaW0kv).

A Cost Comparison

Cost is a major factor when contemplating any IT investment, and analytics is no exception. When deciding where analytical workloads, including machine learning, should be deployed, confusion often occurs about which cost elements should be considered.

In 2015, IBM published a Redbooks publication titled “Reducing Data Movement Costs in z Systems Environments” (ibm.co/2lAcBeC). The publication highlighted the costs associated with a daily extract, transform and load (ETL) of 1 TB of data over a period of four years. The analysis highlighted the high cost of data movement to an off-platform analytics environment due to significant CPU overhead of ETL processes.

As technology never stands still, we wanted to revisit the comparison and conduct our own calculations. As an alternative to ETL, we also wanted to evaluate change data capture (CDC) as the main method for data replication, with ETL only being used for the initial transfer and load of data, and then being scheduled so as not to impact the IBM Z software Monthly License Charge. The numbers and projections used in this analysis are estimates and leverage tools used by the IBM IT Economics Practice.

Our starting point was IBM Open Data Analytics for z/OS. We assumed a medium-sized analytics configuration on an IBM z14 consisting of 10 IBM z Integrated Information Processor specialty engines and 512 GB of memory. Assuming pervasive encryption was enabled on the z14, we deducted an overhead of 2.6 percent before calculating the equivalent number of x86 cores needed for our off-platform analytics environment. We assumed an average 45 percent utilization for the x86 servers, including a 10 percent overhead for x86 platform encryption, which yielded a requirement of 223 cores or five 48-way x86 servers.

However, as IBM Z provides mission-critical reliability by design, we included an additional x86 server (n+1) to account for the failure of a single server in our x86 off-platform analytics cluster, bringing the total x86 server count to six and the core count to 288. We didn’t assume that the entire x86 cluster would fail. For comparison purposes, we assumed a commercial open-source vendor support offering for our off-platform analytics requirement.
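The server count above follows from simple rounding arithmetic. The short sketch below reproduces it from the figures quoted in the text (223 required cores, 48-way servers, one spare); the function name and structure are illustrative only and are not part of the IBM IT Economics tooling.

```python
import math


def servers_needed(required_cores: int, cores_per_server: int, spares: int = 1):
    """Return (base servers, servers including spares, total cores).

    Hypothetical helper: rounds the core requirement up to whole servers,
    then adds spare servers for n+1 style redundancy.
    """
    base = math.ceil(required_cores / cores_per_server)
    total = base + spares
    return base, total, total * cores_per_server


# Figures taken from the article: 223 cores on 48-way x86 servers, n+1.
base, total, cores = servers_needed(223, 48)
print(base, total, cores)  # 5 6 288
```

Rounding 223 cores up to whole 48-way servers gives five servers, and the single spare brings the cluster to six servers and 288 cores, matching the numbers used in the comparison.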

To facilitate data replication between IBM Z and our off-platform analytics environment, we assumed that IBM InfoSphere* Data Replication for Db2* for z/OS would be installed on Z and that IBM InfoSphere Data Replication, IBM InfoSphere DataStage and IBM Db2 Enterprise Server Edition would be deployed in an n-tiered server architecture, each on separate x86 24-way servers with Db2 LUW being fully redundant.

Using the metrics published in a whitepaper titled “IBM InfoSphere Data Replication’s Change Data Capture Version 10.2 (DB2 for z/OS) Performance Comparison to Version 6.5” (ibm.co/2tAFlbt), we estimated that a transfer of 2 TB a day, sustained at 0.18 Gbps, would result in 133 MIPS usage on Z and require six x86 cores on the target IBM InfoSphere Data Replication server.
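As a quick sanity check on the sustained rate, the daily volume can be converted to a bit rate. The sketch below assumes decimal terabytes (10^12 bytes) and a transfer spread evenly over 24 hours; both are assumptions made for illustration and are not stated in the whitepaper reference above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_gbps(tb_per_day: float) -> float:
    """Average bit rate in Gbps for a given daily volume.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s.
    """
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


rate = sustained_gbps(2)
print(f"{rate:.3f} Gbps")  # about 0.185 Gbps
```

Under these assumptions, 2 TB a day works out to roughly 0.185 Gbps, which is in line with the 0.18 Gbps quoted in the text.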

CDC interrogates Db2 for z/OS log files to detect changes rather than querying the database directly. As a result, minimal processing impact occurs on the actual database compared to the traditional approach of ETL. IBM InfoSphere Data Replication for z/OS was colocated with a typical IBM Z software stack consisting of CICS*, MQ* and Db2.
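The idea of log-based change capture can be illustrated with a toy example. The sketch below is not how IBM InfoSphere Data Replication is implemented; the record layout, field names and `apply_changes` function are hypothetical, chosen only to show a replica being updated from an ordered change log without querying the source tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    lsn: int                     # position of the record in the log
    op: str                      # "insert", "update" or "delete"
    key: str
    value: Optional[str] = None  # None for deletes


def apply_changes(replica: dict, log: list, last_lsn: int) -> int:
    """Apply records newer than last_lsn to the replica.

    Assumes the log is ordered by lsn. Returns the new log position so the
    next run can resume where this one stopped.
    """
    for rec in log:
        if rec.lsn <= last_lsn:
            continue  # already replicated on an earlier run
        if rec.op == "delete":
            replica.pop(rec.key, None)
        else:
            replica[rec.key] = rec.value
        last_lsn = rec.lsn
    return last_lsn


log = [
    LogRecord(1, "insert", "acct-1", "100"),
    LogRecord(2, "update", "acct-1", "250"),
    LogRecord(3, "insert", "acct-2", "75"),
    LogRecord(4, "delete", "acct-2"),
]
replica = {}
position = apply_changes(replica, log, last_lsn=0)
print(replica, position)  # {'acct-1': '250'} 4
```

Because the replica is driven entirely by the log and a saved position, the source tables are never read during replication, which is the property the paragraph above attributes to CDC.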

A Closer Look: TCO and Security

Given that IBM Z delivers unparalleled security, it was necessary to consider security for our off-platform analytics environment to achieve an accurate comparison.

To best approximate IBM Z pervasive encryption, which leverages the CPACF, standard on every core, and the new Crypto Express 6S hardware security module (HSM) found on IBM z14, we assumed a commercial off-the-shelf transparent encryption agent offering for each of our x86 servers along with a fully redundant commercial Data Security Manager (DSM) with an embedded HSM.

In addition, to approximate the RACF and security server components found in z/OS, we assumed a deployment of IBM Security Identity and Access Assurance Enterprise Edition. This required four additional 24-way x86 servers to accommodate various components and was licensed for 400 users accordingly.

A total of 13,694 FTE hours were included for on-platform analytics, as opposed to 33,858 FTE hours for off-platform analytics. Off-platform analytics attracted significantly more labor overhead due to the effort required to architect, install and configure multiple software components across multiple servers. We didn’t include any direct labor for data engineers or data scientists for either case, as we considered that these headcounts would be dictated by the business and, in theory, would be very similar.

As shown in Figure 1 on page 16, our estimated five-year cost for z Analytics is $2.9 million compared to $6.5 million for off-platform analytics. This amounts to a cost avoidance of over $3.5 million for z Analytics. Put another way, the off-platform environment costs roughly 124 percent more than z Analytics.
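The headline figures can be rechecked from the two rounded five-year totals. The sketch below uses only the $2.9 million and $6.5 million values quoted in the text; the percentage is expressed relative to the z Analytics cost, which is an interpretation of how the 124 percent figure was derived.

```python
# Five-year cost estimates from the article, in millions of US dollars.
z_analytics_cost = 2.9
off_platform_cost = 6.5

# Absolute difference and difference relative to the z Analytics cost.
avoidance = off_platform_cost - z_analytics_cost
avoidance_pct = avoidance / z_analytics_cost * 100

print(f"Cost avoidance: ${avoidance:.1f} million ({avoidance_pct:.0f} percent)")
```

With the rounded inputs, the difference comes to about $3.6 million, or roughly 124 percent of the z Analytics cost, consistent with the figures in the text.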

Off-platform analytics attracts significant costs for data replication ($3 million) and security ($1.6 million). On IBM Z, these costs are either not required or are already an integral part of the platform.

When evaluating z Analytics against off-platform analytics, it’s important to include the cost of data replication and security in addition to any analytical software. IBM Z is the only platform that offers pervasive encryption and delivers a robust security model that provides access control and auditing functionality built into the OS.

A Multiplatform Machine Learning Strategy

IBM offers a multiplatform machine learning strategy consisting of IBM Watson* Machine Learning, IBM Data Science Experience and IBM Machine Learning for z/OS. This capability is built on open-source data science frameworks such as Apache Spark, Anaconda, H2O and more. In offering IBM Machine Learning for z/OS and IBM Data Science Experience Local for IBM Cloud* Private, which will soon be available for Linux* (IFLs or LinuxONE*), IBM has eliminated the need to move data off platform. As such, IBM Z simply becomes another node in an organization’s analytical arsenal, albeit a very capable one.

Running analytics and machine learning on IBM Z has found a niche among organizations that want to maintain high levels of control over their data while exploiting sub-millisecond predictions as part of an established transaction processing system such as CICS.

James M. Roca is the chief optimization and IT strategy leader at IBM IT Economics Practice. 
Srirama Krishnakumar is a senior IT business management consultant for the IBM IT Economics Team.