2009/12/02

Data generator

Yesterday I had fallen to new issue with bad performance of "reusable code". It was some piece of code seemed to be very elegant from programmer point of view.
Complicated select statement contained the some kind of "data generator" as inline view like following:

select add_months(:p_begin_date, level - 1) dt
from dual
connect by level < months_between(:p_end_date, :p_begin_date) + 2

This subquery aimed to generate a list of month between two dates which are specified as bind variables.
As a result, the complex query was failed with "ORA-01652: unable to extend temp segment by string in tablespace string" approximately at 13 step of plan (see below)

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------------------
|* 1 | FILTER | | 1 | | 0 |00:00:00.01 | 0 | | | |
| 2 | NESTED LOOPS OUTER | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
| 3 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 4 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 0 | 740K| 740K| |
| 5 | VIEW | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
| 6 | SORT GROUP BY | | 1 | 1 | 0 |00:00:00.01 | 0 | 73728 | 73728 | |
|* 7 | HASH JOIN OUTER | | 1 | 1 | 0 |00:00:00.01 | 0 | 742K| 742K| |
| 8 | VIEW | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
| 9 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
| 10 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 11 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 0 | 723K| 723K| |
|* 12 | HASH JOIN | | 1 | 1 | 0 |00:00:00.01 | 0 | 776K| 776K| |
|* 13 | HASH JOIN | | 1 | 524 | 301M|04:06:15.61 | 257 | 82M| 5277K| 143M (0)|
| 14 | MERGE JOIN CARTESIAN | | 1 | 39178 | 1781K|00:00:00.04 | 161 | | | |
| 15 | VIEW | | 1 | 1 | 48 |00:00:00.01 | 0 | | | |
| 16 | CONNECT BY WITHOUT FILTERING| | 1 | | 48 |00:00:00.01 | 0 | | | |
| 17 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
| 18 | BUFFER SORT | | 48 | 39178 | 1781K|00:00:00.04 | 161 | 1753K| 641K| 1558K (0)|
| 19 | INDEX FAST FULL SCAN | XIF_TCCP_SET1 | 1 | 39178 | 37124 |00:00:00.01 | 161 | | | |
| 20 | TABLE ACCESS FULL | TMP_TAR_DATES_DAY | 1 | 41996 | 16274 |00:00:00.07 | 96 | | | |
| 21 | INLIST ITERATOR | | 0 | | 0 |00:00:00.01 | 0 | | | |
| 22 | TABLE ACCESS BY INDEX ROWID | TMP_CALC_PROWS | 0 | 92152 | 0 |00:00:00.01 | 0 | | | |
|* 23 | INDEX RANGE SCAN | XIF_TCPR_SET2 | 0 | 1750 | 0 |00:00:00.01 | 0 | | | |
|* 24 | TABLE ACCESS FULL | TMP_PTO_CALC | 0 | 19983 | 0 |00:00:00.01 | 0 | | | |
|* 25 | TABLE ACCESS BY INDEX ROWID | TMP_TAR_DATES | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 26 | INDEX RANGE SCAN | XIF_TTD_TMP_SET1 | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
| 27 | TABLE ACCESS BY INDEX ROWID | TMP_PRT_CALC_PARAMS | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 28 | INDEX RANGE SCAN | XIF_TPCP_SET1 | 0 | 449 | 0 |00:00:00.01 | 0 | | | |
|* 29 | TABLE ACCESS FULL | TMP_CALC_IDX_ROWS | 0 | 2166 | 0 |00:00:00.01 | 0 | | | |
|* 30 | TABLE ACCESS FULL | TMP_PTO_CALC | 0 | 19983 | 0 |00:00:00.01 | 0 | | | |
|* 31 | TABLE ACCESS BY INDEX ROWID | ERRD | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 32 | INDEX RANGE SCAN | XIF1661_ERRD | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
| 33 | TABLE ACCESS BY INDEX ROWID | MPMNT | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
|* 34 | INDEX UNIQUE SCAN | XPK_PMONTH | 0 | 1 | 0 |00:00:00.01 | 0 | | | |
--------------------------------------------------------------------------------------------------------------------------------------------------------

The main cause of problem is very bad cardinality estimating for "data generator".

SQL> explain plan for
2 select add_months(:p_begin_date, level - 1) dt
3 from dual
4 connect by level < months_between(:p_end_date, :p_begin_date) + 2;

Explained.

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------
Plan hash value: 1236776825

-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
|* 1 | CONNECT BY WITHOUT FILTERING| | | | |
| 2 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------

CBO will decide to assume the single row cardinality for such subquery without any hint. But it is wrong for most combinations of values which are in use.
My recommendation was to add the CARDINALITY hint with enough high value, 100 for example. It will cover most of cases at this time and in future, when the distance will rise between P_BEGIN_DATE and P_END_DATE.