Bookmark and Share

Recent Posts

The Evolution of Compression: Db2 9

November 28, 2017

Last week I discussed many of the data compression benefits delivered with Db2 8. That release included perhaps the most significant development in data compression in the past decade: since Version 8, Db2 has handled data compression using hardware, not software.

Db2 9 NFM introduced index compression, which further lowered costs and improved scalability. Index compression didn't require a compression dictionary; this meant that newly created indexes could compress their contents immediately. Compressed index pages were stored on disk in their compressed format (physical 4 KB index page on disk) and expanded when read from disk into 8 KB, 16 KB or 32 KB pages. To activate compression for an index, it needed to be defined in an 8 KB, 16 KB or 32 KB buffer pool.

Index compression was activated by using the COMPRESS YES keyword with either the CREATE INDEX or ALTER INDEX statement. Issuing ALTER INDEX COMPRESS YES placed the index in REBUILD pending status.

In contrast to data (table) compression, index compression occurred for the entire index space, so it couldn't be activated at the index partition level.

Before altering an index with COMPRESS YES, DSN1COMP could be used to estimate the benefit of compressing the index. Another reason to use DSN1COMP was to determine what buffer pool size the index should be allocated to. The data was stored compressed on disk in 4K pages, and during GETPAGE processing, the data was uncompressed into an 8K, 16K or 32K buffer pool.

DBAs needed to recognize that, when using the COPY utility, the data would be stored in an uncompressed format. Thus, the output data set would exceed the size of the source index.

Db2 9 continued to bring greater efficiency to compression. Data compression (table) used the Ziv-Lempel compression algorithm, which required a dictionary. A Ziv-Lempel hardware instruction called CMPSC provided improved compression performance.

Because compressed data was read into a Db2 buffer pool in compressed form, Db2 could do I/O directly in and out of the buffer pool. When a row of data was fetched from a page, it was decompressed then passed to the application. The CPU cost of data compression wasn't significant for typical online transaction processing (OLTP) work, which randomly fetches one row for each GETPAGE request. However, the cost of data compression for sequential applications (for example, table scans, sequential inserts and utilities) could be significant. Therefore it was recommended to benchmark critical applications with any increase in CPU over the savings in disk storage.

Because data compression requires a dictionary, inserted rows cannot be compressed until the dictionary is built for a table (or table partition). Only the Load and Reorg utilities were capable of building the compression dictionary, and this led to some overhead. To minimize it, the utilities provided the KEEPDICTIONARY keyword, which allowed reuse of an old dictionary. Of course, if the nature of the data had changed, reusing the old dictionary could result in sub-optimal compression.

Again, in contrast to data compression, index compression didn't use a dictionary. Another advantage of this is that it allowed Db2 to compress newly inserted index keys immediately without waiting for LOAD or REORG to be run. In fact, many applications would use SQL to load a database rather than LOAD. In addition, with index compression, the REORG INDEX utility never had to be run for building a dictionary.

An additional area for compression is the Db2 logs. Prior to Db2 9 for z/OS, all disk archive log reads were done using basic direct access method (BDAM). Although BDAM allowed for rapid access to the target CI, it did not provide support for striped data sets or DFSMS DASD compression (extended format data sets). Because active logs supported extended format data sets and could be striped, it took longer to write an archive log copy than the active log that the archive data is coming from. If this situation continued on a busy system for a sufficient period of time, logging could be suspended while waiting for the archive log processing to catch up.

With Db2 9 for z/OS NFM, the archive log data sets could be defined as extended format data sets, allowing DASD archive logs to be striped and compressed. With striped archive logs, Db2 could now match the performance of striped active logs, thus ensuring that as logging volumes increase, log archiving could scale at the same rate.

Some good resources on Db2 9 for z/OS:
Next week, I’ll continue this recap of the compression evolution with a look at Db2 Version 10.

Posted November 28, 2017 | Permalink

comments powered by Disqus