Spark Takes Data Analysis to the Next Level

July 19, 2016

I must admit that, until very recently, I'd never heard of Spark. But since it was such a big deal at this spring's IDUG NA conference (an entire day was devoted to it), I figured I should take some time to learn about it.

So I attended IBMer George Wang's presentation (E10 - Benefits of Apache Spark on Z Systems), and it was eye-opening. Going in, I assumed Spark performed some type of analytics. I quickly learned otherwise. George's slides used these bullet points to answer the question, "What is Spark?":
  • An Apache Foundation open source project; not a product
  • An in-memory compute engine that works with data; not a data store
  • Enables highly iterative analysis on large volumes of data at scale
  • Unified environment for data scientists, developers and data engineers
  • Radically simplifies the process of developing intelligent apps fueled by data

I chatted with George afterward, and he told me he'd coauthored an IBM Redpaper that goes into greater detail about Spark. I especially like this paragraph, headed "What is Apache Spark?"

One of the key aspects of Spark that has attracted a growing following of adopters and contributors is its strength as a unification of the programming interfaces for analytics. Spark is not only about data access, it is about the framework that is offered in terms of analytic programming context.

Another thing I found helpful is the use case provided in figure 2-2, which gave me a clearer picture of the architecture and of how Spark can be used. The use case shows a CICS banking transaction making a RESTful call to a Spark application that uses DB2, IMS and Twitter data to analyze and qualify the candidate for a promotion offer.
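
The data flow in that use case can be sketched in a few lines of plain Python. This is not Spark code and not anything taken from the Redpaper: the customer IDs, the sample records, the function name `qualifies_for_offer` and the qualifying rule are all made up for illustration. The point is only to show the shape of the idea, which is to pull records for one customer from several sources and apply a single rule across them.

```python
# Hypothetical stand-ins for the three sources named in the use case.
# In the real scenario a Spark application would read these from
# DB2, IMS and Twitter; here they are just in-memory dictionaries.
db2_accounts = {"C001": {"balance": 12500.0}, "C002": {"balance": 300.0}}
ims_transactions = {"C001": [120.0, 75.5, 410.0], "C002": [15.0]}
twitter_mentions = {"C001": ["love the new app"], "C002": []}


def qualifies_for_offer(customer_id, min_balance=1000.0, min_transactions=2):
    """Combine the three sources for one customer and apply a made-up rule."""
    account = db2_accounts.get(customer_id)
    if account is None:
        return False  # unknown customer: no offer
    transactions = ims_transactions.get(customer_id, [])
    mentions = twitter_mentions.get(customer_id, [])
    # Illustrative rule only: enough balance, enough activity,
    # and at least one social media mention.
    return (
        account["balance"] >= min_balance
        and len(transactions) >= min_transactions
        and len(mentions) > 0
    )


if __name__ == "__main__":
    for cid in ("C001", "C002", "C999"):
        print(cid, qualifies_for_offer(cid))
```

In the architecture the figure describes, the call to a function like this would arrive as a RESTful request from the CICS transaction, and the lookups would be Spark reads against the live data sources rather than dictionary accesses.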

Clearly, Spark is not a slow, batch-processing type of analytic system, but a data-analysis tool that works within mission-critical applications that host data from multiple systems and even multiple platforms. Spark helps companies make quick business decisions.

As a DB2 DBA, I really want to understand how data is being accessed and used by the Spark application. This is covered in detail in chapter 4. The statements made in the “DB2 for z/OS” section are critical. I'd sum them up this way:
  • Apache Spark promises to be a “game-changer” for big data by providing a unified analytics platform and is emerging as a de facto “analytics operating system.”
  • Because it is able to support a wide variety of structured and unstructured data sources, Spark is positioned to be the enterprise-wide analytics engine.
  • Big data is not all about unstructured data. In fact, most real-world problems are solved using some form of structured or semi-structured data. Because DB2 for z/OS is the market leader for enterprise-structured data, an integration of Spark with DB2 is an obvious next step in the evolution of big data.

The world on z/OS is changing at lightning speed, and Spark is yet another example of this. If you haven't taken a look at Spark, you should, and soon.
