Statistics Feedback in DB2 11 for z/OS
One of the more popular topics—if not the most popular topic—in DB2 seems to be query performance. Careers have been built on the single task of improving query performance. How could anyone not be concerned with improving query performance and reliability, reducing a query’s elapsed time while also reducing its CPU consumption?
Query performance seems to pop up in conversations dealing with OLTP to batch to data warehousing. Regardless of where a query might run, a single poorly performing query can consume excessive resources, cause delays to other processes, make deadlines a challenge and possibly grind critical business functions to a halt. No one wants to wait for the query to complete. Then, when you assume that query doesn’t run once per day but rather hundreds, thousands or even tens of thousands of times per day, it becomes easy to see how a small query performance problem can get out of hand.
One simple, often recommended, yet frequently overlooked process that can have a significant impact on query performance is the DB2 statistics associated with the objects (e.g., tables, indexes) that a query might need to use (or not have the opportunity to use if the statistics are way off). A straightforward solution to gathering accurate statistics is through the use of the DB2 RUNSTATS utility or the STATISTICS options on the LOAD, REBUILD and REORG. But how do you know if you are gathering the correct statistics or even gathering statistics?
One method is to query the STATSTIME column in the appropriate DB2 catalog table to determine the last time the RUNSTATS utility or the LOAD, REBUILD or REORG utilities with the STATISTICS options were run. If the statistics have been updated by one of the previous methods, the STATSTIME column will reflect the date and time that the update occurred. If statistics have not been updated for an object, STATSTIME will reflect the default value 0001-01-01-00.00.00.000000. Unlike some of the columns containing actual statistics values (e.g. CARDF, COLCARDF, NPAGESF, etc), STATSTIME is never set to –1 when statistics have not been gathered. For partitioned table spaces, if SYSTABLES, SYSINDEXES, or SYSCOLUMNS have default statistics, be sure to check if all partitions have statistics collected, or ensure STATROLL=YES.
Besides the usual suspects (SYSIBM.SYSTABLES, SYSIBM.SYSINDEXES, SYSIBM.SYSTABLESPACE), Figure 1 has a complete list of the 28 DB2 catalog tables that use the STATSTIME column.
There is also a method available to correct your statistics in cases where you are not satisfied with the current access path determined by the optimizer or want to verify what statistics are necessary for the optimizer to have its best chance at making an optimal access path choice. To do so, create the RUNSTATS control cards necessary to gather adequate statistics on a query-by-query basis using the Statistics Advisor function available in the no-charge product IBM Data Studio. (This functionality is also available in the InfoSphere Optim Query Workload Tuner product.)
Two common approaches are that:
- As you develop your SQL, run Statistics Advisor against the SQL to determine what RUNSTATS control cards are necessary to gather the statistics for that particular SQL statement.
- You can use the normal process for gathering statistics, run your SQL statements and address specific SQL statements with performance issues by running Statistics Advisor against those SQL statements.
Before DB2 11 for z/OS* solutions, Optim Query Workload Tuner had workload statistics advisor that works on all currently supported versions of DB2.
Improvements in DB2 11
Enter DB2 11 for z/OS: Instead of being reactive and constantly checking all of your critical objects to see if you are missing statistics that could have a negative impact on your SQL optimization or waiting for a SQL statement to misbehave, you can let the optimizer validate your statistics. At BIND time for static and PREPARE for dynamic, the DB2 optimizer can now verify that statistics exist and that those statistics are valid (not conflicting) for the objects used by that SQL statement. You don’t have to do a thing to take advantage of this new DB2 11 functionality—you get it all by default. Of course, like most things in DB2, there are a few spots that allow you to put your own twist on how this all works for your DB2 installation.
Let’s first discuss what you get and then we’ll take a look at what’s available to tailor feedback collection to your installation.
The DB2 catalog table SYSIBM.SYSSTATFEEDBACK has been added in DB2 11 for z/OS, creating a mechanism to verify if statistics were collected and, if collected, were the correct statistics collected. Just about everything you need to know about the accuracy of your statistics is contained in one place, a place that can be easily checked via a simple SQL query.
SYSIBM.SYSSTATFEEDBACK contains descriptive information about your installation defined user tables, indexes and columns along with identifying the database and table space associated to the tables and indexes, including when a row last updated.
For the columns identified, SYSIBM.SYSSTATFEEDBACK contains whether a single column of multiple columns were used. When NUMCOLUMNS=1, COLNAME specifies the column that could be used to run RUNSTATS with the COLUMN (colname) keyword. For the instances where NUMCOLUMNS > 1, COLGROUPCOLNO defines the set of columns associated with the recommended statistics. This is analogous to the COLGROUP keyword in RUNSTATS. A new row is created in SYSIBM.SYSSTATFEEDBACK for each different table, index or column the optimizer examines. A sample output, using SPUFI, of the catalog table SYSIBM.SYSSTATFEEDBACK, along with the very simple SQL statement I used to create that output, is seen in Figure 2.
The two columns of most interest are TYPE and REASON. The column TYPE identifies what type of statistics should be collected for the objects identified on that row and the column REASON is the why the statistic was identified for the objects on that row. Together they tell you the how and why to correct the statistics issue.
For a detailed description of TYPE and REASON, how they should be interpreted and how the issues can be resolved, read chapter 35 of DB2 11 Managing Performance (SC19-4060) or at the IBM Knowledge Center in the section titled “Identifying missing or conflicting statistics.”
Remember, SYSIBM.SYSSTATFEEDBACK only contains rows for objects with missing or inconsistent statistics. This table is also self-cleaning. If an issue that is noted in SYSIBM.SYSSTATFEEDBACK is resolved (for example, you run RUNSTATS to gather the missing stats), the row describing that issue is removed from SYSIBM.SYSSTATFEEDBACK. Any rows left in SYSIBM.SYSSTATFEEDBACK are rows representing objects that still have existing issues.
There are five possible values that could be reflected in the TYPE column (see Figure 3) and nine values that could show up in the REASON column (see Figure 4).
You can get similar information to the above from the EXPLAIN table DSN_STAT_FEEDBACK. All of the columns mentioned here are available in the EXPLAIN table DSN_STAT_FEEDBACK. For example, REASON and TYPE carry the exact same meanings in DSN_STAT_FEEDBACK as they do in SYSIBM.SYSSTATFEEDBACK.
The DSN_STAT_FEEDBACK EXPLAIN table is updated only by EXPLAIN and is never modified by RUNSTATS, so you have to clean it yourself. The big difference between DB2 adding rows to DSN_STAT_FEEDBACK and adding rows to SYSIBM.SYSSTATFEEDBACK is that the EXPLAIN process cannot be disabled, although one could choose to not create the DSN_STAT_FEEDBACK explain table, thus preventing the information from being written. The subsystem parameter STATFDBK_SCOPE, described in the following paragraph, has no affect on EXPLAIN’s processing.
The DSN_STAT_FEEDBACK table also contains the standard EXPLAIN columns QUERYNO, APPLNAME, PROGNAME, COLLID, GROUP_MEMBER and EXPLAIN_TIME, enabling the statistics feedback information to be associated to the other EXPLAIN tables for a total view of a SQL statements potential access path and performance inhibitors. Unlike the DB2 catalog table SYSSTATFEEDBACK, the EXPLAIN table DSN_STAT_FEEDBACK allows the statistics recommendation to be tied to a particular SQL statement.
Making the Feedback Available
How the feedback is gathered and makes it to its DB2 catalog table is pretty cool. The optimizer is responsible for determining if statistics are present and, if so, that they make sense. If there are no stats for the optimizer to use or the statistics that are listed have no conflicts, that information is written to memory. The subsystem parameter STATFDBK_SCOPE determines the scope of the optimizer’s statistics validation or whether validation is to take place at all. The subsystem parameter STATSINT on the DSN6SPRM macro—REAL TIME STATS field on the DSNTIPO installation panel—is the internal used to externalize the in-memory feedback to the DB2 catalog table SYSIBM.SYSSTATFEEDBACK. If the feedback is needed between intervals, there’s even a DB2 command that can be issued.
To tie this all into my real job, how does this all affect the IBM DB2 Analytics Accelerator for z/OS? Any query that goes through bind/prepare could result in feedback. This would include queries that end up getting routed to the DB2 Accelerator. The recommended statistics in those cases can help ensure proper differentiation between long-running queries that should be off-loaded and short running queries that should still run on DB2 for z/OS.
Like what you just read? To receive technical tips and articles directly in your inbox twice per month, sign up for the EXTRA e-newsletter here.
comments powered by