INFOTEL, The web-to-database company

Search: Print Site map Contact French version English version American version

News Letter Reorganizing DB2 databases: why, how and when?

Reorganizing DB2 databases: why, how and when?

Databases have become the universal means of storing information that is accessed in real time.

In a perfect world, databases should function all the time without needing any major maintenance such as a reorganization. However maintenance remains necessary in order to eliminate performance degradation, even though it interrupts online services.

Why reorganize?

The purpose of reorganizing is twofold:

  • related data should be close together;
  • free space should be evenly distributed.

DB2 databases lose the above properties as they are updated because data is logically ordered, as opposed to randomly ordered, and as a result of the uneven distribution of updates and the lack of a local reorganization process such as a CA split in a VSAM KSDS.

The reasons for these apparent shortcomings are in the nature of SQL’s logic (ordered) and concurrent accesses to the data by many users, in short the very essence of DB2.

A well organized tablespace is one in which, when following the primary key sequence, page re-reads are not needed and normal page sequence is not substantially altered, and in which there is enough space available locally that the insertion of a new row will not change the sequence of pages.

A well organized index is one in which, when following the key sequence for leaf pages, no major jump occurs, in which leaf pages are as close as possible to the non leaf pages that point to them, and in which there is enough space available locally that a leaf page split would keep the new page close to the non leaf page that points to it.

As a tablespace or an index gets filled, local free space is used up and new data is scattered farther and farther away.

Because the space maps that indicate whether free space is available are usually in the DB2 buffers, there is no major extra cost for inserting new rows in a full area of the tablespace. Also, a direct request via a unique key will not be adversely affected by the disorganization of the tablespace proper. However, a request for a list of rows between two values of the primary key may end up requiring a read for each retrieved row.

A disorganized index will cause an extra arm motion for each direct request to a row, and because of the buffering of DB2, the reads to the two last index levels may be the only reads to the index. This may almost double the I/O time on the index and increase the overall I/O time for the request by close to 50%. Complex DB2 requests that generate a comparison of index entries are similarly affected.

Assuming the tablespace and its indexes have been reorganized, the life of the tablespace goes as follows:

  1. Initial grace period: the freespace is being used, performance is good;
  2. The average duration of every SQL operation increases in proportion to the number of added rows and keys;
  3. Contention develops as new rows and keys are created in the same area;
  4. The duration of SQL operations selecting a number of rows increases to a point where it is no longer acceptable.

Conditions 3 and 4 are clearly undesirable, but depending on the application they may never occur. If they don’t, one can possibly live with a disorganized tablespace where the added cost of an access will not exceed 50%.

How to reorganize?

A tablespace that does not use sequential prefetch and that contains free space, has little need for reorganization of anything but its indexes, which will usually require less overall time and can be split into several batch windows if necessary.

Reorganizing a tablespace will also reorganize its indexes.

Version 5 of DB2 has an online reorganization utility which reads the existing tablespace or index, builds a reorganized copy, applies to the copy all the updates that have occurred on the original object during the process, then switches to the new copy.

Because of the complexity of this process and the extensive use of the logging mechanism, the online reorganization uses far more DB2 and non DB2 resources than a standard reorganization. Therefore its use should be limited to large objects (usually tablespaces) that cannot be reorganized within the largest batch window.

If the batch window is too small to accommodate all the objects, here are some ways to make it roomier:

  • Define enough free pages via the freepage parameter especially for indexes. This creates an initial grace period, thereby less reorganizations.
  • Use a fast(er) reorganization utility as more jobs will fit in the same batch window.
  • Some vendor utilities allow read only access during most of the reorganization process, extending the window to a read only time frame.
  • Some are able to detect hot spots and concentrate freespace in the vicinity of those areas, further extending the grace period so less reorganizations are needed.

When to reorganize?

Not reorganizing an object has a (daily) cost, that of the extra I/O time and buffer space it requires from DB2, without mentioning the negative impact on response times and the extra disk space that may be used.

Reorganizing an object also has a cost, that of the CPU and I/O resources the utility uses plus, for offline reorganizations, the use of the valuable batch window that must be shared with others and for online reorganizations, the use of the same DB2 resources (I/O, buffers, log) that are shared with online transactions.

The strategy is therefore based on cost analysis, and can be summed up as:

Reorganize each object with the frequency that will keep the overall use of the system to a minimum.

A good strategy may increase the throughput of DB2 (and related tasks) by 50%.

Offline strategy

The cost of not reorganizing a given object is proportional to its disorganization ratio (D).

The disorganization ratio is essentially the proportion of rows or keys far from their preferred location. Adjusting the time scale if necessary to have a constant activity on the database, it can be written as a linear function of time. It can be measured by Runstats or by sampling the contents of the object.

The cost of an offline reorganization can be estimated in terms of its most valuable resource: the batch window. Therefore this cost can be expressed in terms of elapsed time, knowing that how many reorganizations can run in parallel depends on the power of the machine.

The elapsed time of the utility is essentially proportional to the size of the object and to a lesser degree, it depends on its disorganization ratio.

When a batch window is opened one needs only compute the gain G for each object, that is the next day’s cost of not reorganizing, and the cost C, that is the elapsed time of the reorganization.

The strategy consists of maximizing the gain within each batch window. That can be accomplished by reorganizing objects in decreasing G/C sequence, except for large ones where some more sophistication is required.

Online strategy

The questions are:

  • For each object that could not be reorganized offline, when to reorganize it online?
  • At what level of disorganization should a particular object be reorganized?

One can show that the average daily cost in terms of DB2 resources of not reorganizing an object over a period of time P is proportional to the disorganization ratio (D) at the end of P, itself approximately a linear function of P:

C1 = a.P - b

The cost of reorganizing online as far as DB2 resources are concerned depends essentially on the object and the average rate of updates (U) during the utility run. The daily cost of reorganizing at the expiration of an interval of P days can be written as:

C2 = E(U) / P

For a given U the value of P that minimizes the overall cost C1 + C2 is:

PU = sqrt(E(U)) / sqrt(a)

The strategy consists of analyzing DB2 activity to evaluate U. Then compute PU. If the elapsed time since the last reorganization exceeds PU, reorganize the object.

Global strategy

A global strategy must take into account not only what type of reorganization is more suitable for each object, but also what functions are best performed within the batch window.

For instance, is it best to reorganize a given tablespace or to perform an image copy of another? Or is it worth performing an image copy of a tablespace when it is almost due for a reorganization that will require an image copy anyway?

The general idea is to assign a gain / cost ratio to each action that can be performed within the batch window on the basis of a fast analysis of all the involved objects (without executing Runstats: too slow) and to determine which combination of actions provides the highest gain.

The choice of objects cannot be made and just submitted to MVS or OPC without regard for a changing environment because unexpected delays, slower run times, or other problems make it impossible to run a planned group of utilities without monitoring. This monitoring must be done to insure success in the task and to prevent the task from taking more time than is available.

All the objects that could not be reorganized offline and may stand to gain from an online reorganization in the proper circumstances are kept in a list and they are reorganized online when the environment is suitable.

Conclusion

While the basic strategy for the best use of utilities in the DB2 environment may appear simple enough, the data to produce cost and gain computations may be difficult to obtain. Some values are within the contents of the DB2 objects and must be extracted quickly, others relate to the application profile and must be extracted over time, others have to do with the performance of various utilities and have to be derived from benchmarks.

Infotel has been working on this problem for some years and has developed not only the mathematical tools to optimize the use of utilities, but also the methods and tools to efficiently extract all the required data from the DB2 objects and applications.

The results of this research have been integrated into a fully automated utility monitor called MASTER-UTIL.




© Infotel 2008. Last update January 2, 2008
http://www.infotelcorp.com - Phone : + (1)727 343 5958