2009/01/01

The use of statistically incorrect assumptions when using this feature can lead to incorrect or undesirable results

В описании sample_clause для select statement имеется такое предупреждение:


Caution:

The use of statistically incorrect assumptions when using this feature can lead to incorrect or undesirable results.

Чем это может "грозить"?
Всем известный dbms_stats широко использует sample_clause при вычислении статистических показателей. При использовании параметров сбора статистики по умолчанию возможны аномалии:

SQL> set serveroutput on
SQL> begin
2 dbms_output.put_line(DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT'));
3 end;
4 /
DBMS_STATS.AUTO_SAMPLE_SIZE

PL/SQL procedure successfully completed.

SQL> exec DBMS_STATS.DELETE_TABLE_STATS('CS','WK_REQUEST_PARS')

PL/SQL procedure successfully completed.


Вот например, собранные в таких условиях некоторые показатели:

SQL> select COLUMN_NAME,NUM_DISTINCT,NUM_NULLS from ALL_TAB_COL_STATISTICS where
2 owner='CS' and table_name='WK_REQUEST_PARS' and COLUMN_NAME='WOP_NUMID';

COLUMN_NAME NUM_DISTINCT NUM_NULLS
------------------------------ ------------ ----------
WOP_NUMID 1633 0

Как выяснилось, такое значение для NUM_DISTINCT - крайне нереалистично. И соответственно, план одного из популярных запросов далек от совершенства:

------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |
------------------------------------------------------------------------------------------------------------------------------
|* 1 | TABLE ACCESS BY INDEX ROWID | WK_REQUEST_PARS | 1 | 1 | 0 |00:00:55.75 | 99643 | 11607 |
| 2 | NESTED LOOPS | | 1 | 1 | 63144 |00:00:30.33 | 92490 | 6769 |
| 3 | NESTED LOOPS | | 1 | 1 | 31572 |00:00:08.30 | 29242 | 2455 |
| 4 | MERGE JOIN CARTESIAN | | 1 | 1 | 6 |00:00:00.01 | 12 | 0 |
| 5 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.01 | 7 | 0 |
| 6 | TABLE ACCESS BY INDEX ROWID| OORG | 1 | 1 | 1 |00:00:00.01 | 2 | 0 |
|* 7 | INDEX UNIQUE SCAN | XPK_ORG_ID | 1 | 1 | 1 |00:00:00.01 | 1 | 0 |
|* 8 | TABLE ACCESS BY INDEX ROWID| DOCF_DV | 1 | 1 | 1 |00:00:00.01 | 5 | 0 |
|* 9 | INDEX RANGE SCAN | XI_DIC_DIDI | 1 | 6 | 6 |00:00:00.01 | 3 | 0 |
| 10 | BUFFER SORT | | 1 | 1 | 6 |00:00:00.01 | 5 | 0 |
|* 11 | TABLE ACCESS BY INDEX ROWID| DOCF_DV | 1 | 1 | 6 |00:00:00.01 | 5 | 0 |
|* 12 | INDEX RANGE SCAN | XI_DIC_DIDI | 1 | 6 | 6 |00:00:00.01 | 2 | 0 |
|* 13 | TABLE ACCESS BY INDEX ROWID | WK_REQUESTS | 6 | 29 | 31572 |00:00:08.25 | 29230 | 2455 |
|* 14 | INDEX RANGE SCAN | XIF1122_WK_REQUESTS | 6 | 3428 | 191K|00:00:01.02 | 388 | 62 |
|* 15 | INDEX RANGE SCAN | IDX_WOP_WOR | 31572 | 8 | 31571 |00:00:21.83 | 63248 | 4314 |
------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(("WK_REQUEST_PARS"."WOP_NUMID"='6453768762' AND "WK_REQUEST_PARS"."WOP_ST"='R' AND
"WK_REQUEST_PARS"."WOP_IS_SPECIAL"='T'))

7 - access("ORG_ID"=768590)
8 - filter("DIC_VALUE"='R')
9 - access("DIC_DIDI"=12002)
11 - filter(("DIC_VALUE"='A' OR "DIC_VALUE"='C' OR "DIC_VALUE"='F' OR "DIC_VALUE"='H' OR "DIC_VALUE"='I' OR "DIC_VALUE"='K' OR
"DIC_VALUE"='P'))
12 - access("DIC_DIDI"=12001)
13 - filter("WK_REQUESTS"."WOR_REQ_TP"="DIC_VALUE")
14 - access("WK_REQUESTS"."WOR_ORG"=768590)
15 - access("WK_REQUEST_PARS"."WOP_WOR"="WK_REQUESTS"."WOR_ID")

Однако, если сделать так:

exec dbms_stats.gather_table_stats('CS','WK_REQUEST_PARS',cascade=>true,estimate_percent=>100)

то в результате:

SQL> select COLUMN_NAME,NUM_DISTINCT,NUM_NULLS from ALL_TAB_COL_STATISTICS where
2 owner='CS' and table_name='WK_REQUEST_PARS' and COLUMN_NAME='WOP_NUMID';

COLUMN_NAME NUM_DISTINCT NUM_NULLS
------------------------------ ------------ ----------
WOP_NUMID 628654 48

и план того же запроса:

----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------
| 1 | NESTED LOOPS | | 1 | 1 | 0 |00:00:00.01 | 14 |
| 2 | MERGE JOIN CARTESIAN | | 1 | 1 | 0 |00:00:00.01 | 14 |
| 3 | MERGE JOIN CARTESIAN | | 1 | 1 | 6 |00:00:00.01 | 10 |
| 4 | NESTED LOOPS | | 1 | 1 | 6 |00:00:00.01 | 7 |
| 5 | TABLE ACCESS BY INDEX ROWID| OORG | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 6 | INDEX UNIQUE SCAN | XPK_ORG_ID | 1 | 1 | 1 |00:00:00.01 | 1 |
|* 7 | TABLE ACCESS BY INDEX ROWID| DOCF_DV | 1 | 1 | 6 |00:00:00.01 | 5 |
|* 8 | INDEX RANGE SCAN | XI_DIC_DIDI | 1 | 6 | 6 |00:00:00.01 | 2 |
| 9 | BUFFER SORT | | 6 | 1 | 6 |00:00:00.01 | 3 |
|* 10 | TABLE ACCESS BY INDEX ROWID| DOCF_DV | 1 | 1 | 1 |00:00:00.01 | 3 |
|* 11 | INDEX RANGE SCAN | XI_DIC_DIDI | 1 | 6 | 6 |00:00:00.01 | 2 |
| 12 | BUFFER SORT | | 6 | 1 | 0 |00:00:00.01 | 4 |
|* 13 | TABLE ACCESS BY INDEX ROWID | WK_REQUEST_PARS | 1 | 1 | 0 |00:00:00.01 | 4 |
|* 14 | INDEX RANGE SCAN | IDX_WRP_NUMID | 1 | 22 | 1 |00:00:00.01 | 3 |
|* 15 | TABLE ACCESS BY INDEX ROWID | WK_REQUESTS | 0 | 1 | 0 |00:00:00.01 | 0 |
|* 16 | INDEX UNIQUE SCAN | PK_WK_REQUESTS | 0 | 1 | 0 |00:00:00.01 | 0 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

6 - access("ORG_ID"=768590)
7 - filter(("DIC_VALUE"='A' OR "DIC_VALUE"='C' OR "DIC_VALUE"='F' OR "DIC_VALUE"='H' OR "DIC_VALUE"='I' OR "DIC_VALUE"='K' OR
"DIC_VALUE"='P'))
8 - access("DIC_DIDI"=12001)
10 - filter("DIC_VALUE"='R')
11 - access("DIC_DIDI"=12002)
13 - filter(("WK_REQUEST_PARS"."WOP_ST"='R' AND "WK_REQUEST_PARS"."WOP_IS_SPECIAL"='T'))
14 - access("WK_REQUEST_PARS"."WOP_NUMID"='6453768762')
15 - filter(("WK_REQUESTS"."WOR_ORG"=768590 AND "WK_REQUESTS"."WOR_REQ_TP"="DIC_VALUE"))
16 - access("WK_REQUEST_PARS"."WOP_WOR"="WK_REQUESTS"."WOR_ID")

Разница разительна!
Как сообщил мне один из разработчиков, на начальных стадиях заполнения таблицы WK_REQUEST_PARS существовала ошибка в прикладном коде, в связи с которой возникло множество дубликатов для значения поля WOP_NUMID. Видимо это оказало фатальное влияние на работу sample_clause.

No comments: