
Replicate Data Easily With InfoSphere Classic Change Data Capture for z/OS


The complexity of today's information infrastructures is staggering. The number of data sources, the volume of data in them, the platforms on which they run and, even more importantly, the rate at which all of these elements are changing can stress even the best-managed information environments. Information integration and big data capabilities have grown from nascent collections of single-function tools into robust, integrated platforms that empower organizations to address these continuously evolving environments.

Consider, however, that each new application brings not only its own data, but also new ways to use existing information in the enterprise. The challenge we face is not just about bringing new sources of information into our decision-making and business operations; it is also about finding better ways to leverage what we already have. How can we bring the data managed by the System z applications that run our business into today's world of business analytics and big data?

Fundamentally, there are two approaches, and each brings value. The choice is which approach is best suited to the need at hand; there is no silver-bullet, one-size-fits-all way to integrate our information with the business.

Access in Place

One approach is to leverage the information where it exists. For a relational database like DB2 for z/OS, this is a matter of using SQL and, as technology evolves, we can anticipate support for additional non-SQL access techniques.

For non-relational z/OS data, we turn to data virtualization solutions. These enable the same standard SQL and ultimately standardized non-SQL data access techniques to integrate the IMS, VSAM and even sequential data sources that abound.
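To make this concrete, accessing non-relational z/OS data in place through a virtualization layer typically looks like ordinary SQL to the consuming application. The following is a minimal sketch in Java using JDBC; the driver URL and the POLICY table mapped over a VSAM file are hypothetical placeholders, not a specific product configuration.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;

  public class AccessInPlaceExample {
      public static void main(String[] args) throws Exception {
          // Hypothetical JDBC URL for a data virtualization server that maps
          // a VSAM file (or an IMS database) to a relational view named POLICY.
          String url = "jdbc:example://zserver.example.com:9087/VIRTUALDB";

          try (Connection conn = DriverManager.getConnection(url, "user", "password");
               PreparedStatement stmt = conn.prepareStatement(
                       "SELECT POLICY_ID, HOLDER_NAME, PREMIUM FROM POLICY WHERE REGION = ?")) {
              stmt.setString(1, "EAST");
              try (ResultSet rs = stmt.executeQuery()) {
                  while (rs.next()) {
                      // The virtualization layer translates each fetch into native
                      // VSAM/IMS access on System z at query time.
                      System.out.printf("%s %s %.2f%n",
                              rs.getString("POLICY_ID"),
                              rs.getString("HOLDER_NAME"),
                              rs.getBigDecimal("PREMIUM"));
                  }
              }
          }
      }
  }

The point of the sketch is that the consumer sees relational access while every query is still serviced on System z at the moment it runs, which is what the assumptions below are about.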

This approach makes assumptions that aren't always appropriate, especially for non-relational data sources. These assumptions include:

  • The structure of this operational data is suited to the new use of the data. This is not always the case. Non-relational data often needs to be repackaged for new uses.
  • System z resources are available when the user needs the data. Access technologies require System z resources every time a user pulls the data. This may create resource contention during peak operational workload processing.
  • Each use benefits from its own pull of the source data. Individual users and/or applications each pulling data from System z increase the overhead. In many situations, one "pull" of the data may be better suited to the requirements, minimizing the impact on the source environment.

Create a Fit-for-Business Copy

A second method is to create a copy of all or some portion of the source data on a platform and in a data environment focused on the business user's needs. The information can be repackaged along the way or can be reconstituted in an environment better suited to the challenge at hand.

Non-relational System z data may be repackaged and stored in a relational database (DB2 for z/OS to keep it "on platform," or DB2 or another RDBMS for "off platform" copies) so that it can be integrated directly with SQL-driven initiatives. All System z data, including DB2 for z/OS data, may be reconstituted into a Hadoop Distributed File System model and stored in a Hadoop environment such as IBM InfoSphere BigInsights for use with MapReduce, Hive and other new big data access models. And as the need and methods for using information expand, the targets for System z data replicas will only grow.
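To illustrate what "repackaging" means in practice, the sketch below flattens a hierarchical record (for example, an IMS parent segment with a repeating child segment) into denormalized relational rows suitable for an RDBMS copy. It is a minimal illustration only; the segment layout, field names and target table are hypothetical, and a real deployment would use the replication product's mapping facilities rather than hand-written code.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.PreparedStatement;
  import java.util.List;

  public class RepackageExample {
      // Hypothetical flattened view of an IMS hierarchy: one CUSTOMER parent
      // segment with repeating ORDER child segments.
      record Order(String orderId, double amount) {}
      record Customer(String customerId, String name, List<Order> orders) {}

      static void copyToRelational(Customer c, Connection target) throws Exception {
          // Each parent/child pair becomes one denormalized row in the copy,
          // so SQL-driven tools can query it directly.
          try (PreparedStatement ins = target.prepareStatement(
                  "INSERT INTO CUSTOMER_ORDERS (CUSTOMER_ID, NAME, ORDER_ID, AMOUNT) VALUES (?, ?, ?, ?)")) {
              for (Order o : c.orders()) {
                  ins.setString(1, c.customerId());
                  ins.setString(2, c.name());
                  ins.setString(3, o.orderId());
                  ins.setDouble(4, o.amount());
                  ins.addBatch();
              }
              ins.executeBatch();
          }
      }

      public static void main(String[] args) throws Exception {
          Customer c = new Customer("C100", "ACME CORP",
                  List.of(new Order("O-1", 250.00), new Order("O-2", 75.50)));
          try (Connection target = DriverManager.getConnection("jdbc:example://target/COPYDB")) {
              copyToRelational(c, target);
          }
      }
  }

The same flattened structure could just as easily be written as delimited files or Avro records into HDFS for a Hadoop target; only the output side of the transformation changes.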

The key assumptions here are that:

  • Copying and repackaging will increase the value of the data. Copies are designed to bring the data closer to the end user and by implication increase the usability and therefore the value of the data.
  • The copy can be synchronized with sufficient frequency to ensure that latency requirements are met. This could mean that the copy must be up-to-the-second accurate, up-to-the-day accurate or only up-to-the-week accurate. The key is that the use case dictates the latency, not any restrictions or limitations built into the technology used to copy the data.
  • The approach is sufficiently flexible to adapt to the changing information environment. While System z transactional data sources change to address evolving operational requirements, the target environments that hold the copies are changing at an increasingly rapid pace. Where an RDBMS copy may have sufficed in the past, Hadoop, NoSQL and other data environments seem to appear with every new publication that you open. Some will stick, some won't, but each organization needs the ability to decide for itself what works.

Data Synchronization Using Replication

Once the decision is made to create and manage a copy of the source data, careful consideration should be given to how to accomplish the initial copying and ongoing synchronization of the replica with the source.

Change Capture

The single most important consideration when it comes to replicating System z data is that capturing data changes is just as important as reading the source data. This will typically reduce the volume of data moved, as once the initial copy is created, only the changed data structures are updated. It also eliminates the dependence on a batch window as changes can be captured as they are made. This dramatically alters the profile of the required bandwidth as changes flow on a continuous basis rather than in one big push. Finally, it also implies no downtime for the source data environment. Applications can continuously update the source data without regard to the change capture processing that is feeding the replica or replicas.
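For illustration, once changes are being captured continuously, applying them to a replica amounts to translating each change record into an insert, update or delete against the target. The sketch below assumes a hypothetical ChangeRecord type delivered by the capture component; it is not a specific product API, just a minimal view of the apply side.

  import java.sql.Connection;
  import java.sql.PreparedStatement;

  public class ApplyChangesExample {
      enum Op { INSERT, UPDATE, DELETE }

      // Hypothetical change record produced by the capture component.
      record ChangeRecord(Op op, String key, String value) {}

      // Apply one captured change to the replica table. Only changed rows are
      // touched, so bandwidth is consumed as a steady trickle rather than in
      // one big nightly push.
      static void apply(ChangeRecord c, Connection target) throws Exception {
          String sql = switch (c.op()) {
              case INSERT -> "INSERT INTO REPLICA (KEY_COL, VALUE_COL) VALUES (?, ?)";
              case UPDATE -> "UPDATE REPLICA SET VALUE_COL = ? WHERE KEY_COL = ?";
              case DELETE -> "DELETE FROM REPLICA WHERE KEY_COL = ?";
          };
          try (PreparedStatement stmt = target.prepareStatement(sql)) {
              switch (c.op()) {
                  case INSERT -> { stmt.setString(1, c.key()); stmt.setString(2, c.value()); }
                  case UPDATE -> { stmt.setString(1, c.value()); stmt.setString(2, c.key()); }
                  case DELETE -> stmt.setString(1, c.key());
              }
              stmt.executeUpdate();
          }
      }
  }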

Log-Based Capture

The second critical requirement is that the changes be captured from a log rather than an application interrupt or exit. This isolates the capture from the applications that are updating the source data.

There are a number of benefits to this approach, including:

  • Eliminating hand coding for detecting data changes
  • Removing data event capture overhead from the transaction path
  • Providing a single integration point for events initiated by multiple applications
  • Making data integration independent of the structure or flow of applications
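To make the isolation point concrete, a log-based capture component is essentially a separate reader that polls the database recovery log and publishes change records downstream, entirely outside the updating applications. The sketch below is purely schematic: LogReader, ChangePublisher and LogEntry are hypothetical stand-ins for the product's internal log-read and publish services, not real interfaces.

  public class LogCaptureLoopExample {
      // Hypothetical abstractions over the database recovery log and the
      // downstream transport; not actual product interfaces.
      interface LogReader { LogEntry readNext(long position); }
      interface ChangePublisher { void publish(LogEntry entry); }
      record LogEntry(long position, String table, String operation, byte[] data) {}

      // The capture loop runs outside the applications that write the log, so
      // no exit or interrupt is inserted into their transaction path.
      static void run(LogReader log, ChangePublisher out, long startPosition) throws InterruptedException {
          long position = startPosition;
          while (!Thread.currentThread().isInterrupted()) {
              LogEntry entry = log.readNext(position);
              if (entry == null) {
                  Thread.sleep(100);      // nothing new yet; poll the log again shortly
                  continue;
              }
              out.publish(entry);         // single integration point for all updating applications
              position = entry.position() + 1;
          }
      }
  }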

Recoverability

We all know that things happen. Data replication must be able to deal with all of the foibles of our technology environments, as well as with the planned outages that occur regularly. These situations include:

  • Communication breakdown between the source and the target environments
  • Unavailable target data environment(s)
  • Changed data volumes that outstrip infrastructure capacity
  • Planned outages

The key here is that the source and the target environments shake hands and maintain restart points for when things go wrong. Data may be delayed, but it should never be lost.
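To illustrate the restart-point idea, the sketch below persists the last successfully applied log position (a "bookmark") in the same target transaction as the applied change, so that after a communication failure or a planned outage the apply process can resume exactly where it left off. The table and method names are hypothetical; real replication products manage their restart points internally.

  import java.sql.Connection;
  import java.sql.PreparedStatement;

  public class RestartPointExample {
      // Persist the applied change and its source log position in one target
      // transaction, so the bookmark never gets ahead of (or behind) the data.
      static void applyWithBookmark(Connection target, String changeSql, long logPosition) throws Exception {
          boolean oldAutoCommit = target.getAutoCommit();
          target.setAutoCommit(false);
          try (PreparedStatement change = target.prepareStatement(changeSql);
               PreparedStatement bookmark = target.prepareStatement(
                       "UPDATE REPLICATION_BOOKMARK SET LOG_POSITION = ? WHERE SUBSCRIPTION = 'DEMO'")) {
              change.executeUpdate();
              bookmark.setLong(1, logPosition);
              bookmark.executeUpdate();
              target.commit();               // data and restart point move together
          } catch (Exception e) {
              target.rollback();             // the change is delayed, not lost; retry from the old bookmark
              throw e;
          } finally {
              target.setAutoCommit(oldAutoCommit);
          }
      }
  }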

Karen Durward is an IBM InfoSphere software product manager specializing in System z data integration.




