Because of some problems with the Oracle statistics I did a small research.
I have a custom procedure of gathering stats that i call p_x_stats_table. It will gather stats with and estimation of X% of the records. With X corresponding a percentage witch reflect 100.000 records (or all records of table if amount of records inferiuor then 100.000).
With that procedure i have no problems, it’s more with the “standard” ones. (btw this has been tested with Oracle 10.2.0.3).
The real amount of records of TABLE_X:
COUNT(*)
———-
1507788
SQL>exec p_x_stats_table('TABLE_X');
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:04:09.00
NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE
---------- ----------- -----------
1505339 589 99804
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM AVG_COL_LEN
---------------------- ------------ ----------- ----------- --------------- -----------
PARTITION_KEY 1 1 99804 FREQUENCY 13
CONTRACT_REFERENCE 1082788 1 99804 NONE 20
TABLE_NAME 18 1 99804 NONE 9
CONTRACT_TYPE 192 10 99804 HEIGHT BALANCED 9
BS_TYPE 9 1 99804 NONE 4
BOOK_CODE 320 27 99804 HEIGHT BALANCED 10
TRADING 2 1 99804 NONE 2
GL_BOOK_TYPE 8 1 99804 NONE 9
START_DATE 6171 254 98753 HEIGHT BALANCED 8
Here instead of taking a pourcentage (based on 100.000 records) we take 5% and a block_sampling at TRUE (with p_x_stats_table block_sampling is always false).
SQL>exec DBMS_STATS.GATHER_TABLE_STATS(USER,'TABLE_X',NULL,5,TRUE,'FOR ALL COLUMNS SIZE SKEWONLY',3,'GLOBAL',TRUE);
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:02:21.14
NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE
---------- ----------- -----------
1540240 586 77012
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM AVG_COL_LEN
---------------------- ------------ ----------- ----------- --------------- -----------
PARTITION_KEY 1 1 77012 FREQUENCY 13
CONTRACT_REFERENCE 924372 1 77012 NONE 20
TABLE_NAME 10 1 77012 NONE 9
CONTRACT_TYPE 105 10 77012 HEIGHT BALANCED 9
BS_TYPE 5 1 77012 NONE 4
BOOK_CODE 137 28 77012 HEIGHT BALANCED 10
TRADING 2 1 77012 NONE 2
GL_BOOK_TYPE 7 1 77012 NONE 9
START_DATE 5646 254 76954 HEIGHT BALANCED 8
You can see that the num_distinct values are not so optimal (for example column table_name contains in reality 19 distinct values). Now let’s try with DBMS_STATS.AUTO_SAMPLE_SIZE (in my opinion only usable from Oracle 11G onwards, why? you’ll see). Block sampling stays at TRUE in this example.
SQL>exec DBMS_STATS.GATHER_TABLE_STATS(USER,'TABLE_X',NULL,DBMS_STATS.AUTO_SAMPLE_SIZE,TRUE,'FOR ALL COLUMNS SIZE SKEWONLY',3,'GLOBAL',TRUE);
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:13:46.92
NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE
---------- ----------- -----------
1507788 596 1507788
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM AVG_COL_LEN
---------------------- ------------ ----------- ----------- --------------- -----------
PARTITION_KEY 1 1 6840 FREQUENCY 13
CONTRACT_REFERENCE 1211381 254 1507788 HEIGHT BALANCED 20
TABLE_NAME 4 4 6840 FREQUENCY 10
CONTRACT_TYPE 33 33 6840 FREQUENCY 9
BS_TYPE 3 3 6840 FREQUENCY 4
BOOK_CODE 14 14 6840 FREQUENCY 10
TRADING 1 1 6840 FREQUENCY 2
GL_BOOK_TYPE 1 1 6840 FREQUENCY 9
START_DATE 1744 254 6840 HEIGHT BALANCED 8
WOOOOOW, check those results, this stinks! Bad histogram types, bad number of distinc values. No wonder that some of the production queries have sometimes long execution times. Now with some more research of the skewed data in my table i discover that the main reason of my problems might be the block_sample parameter. Let’s try now to put this parameter to FALSE. And did you noted how long it took to gather wrong stats!?!
sql> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'TABLE_X',NULL,DBMS_STATS.AUTO_SAMPLE_SIZE,FALSE,'FOR ALL COLUMNS SIZE SKEWONLY',3,'GLOBAL',TRUE);
PL/SQL PROCEDURE successfully completed.
Elapsed: 00:10:03.51
NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE
---------- ----------- -----------
1507788 590 1507788
COLUMN_NAME NUM_DISTINCT NUM_BUCKETS SAMPLE_SIZE HISTOGRAM AVG_COL_LEN
---------------------- ------------ ----------- ----------- --------------- -----------
PARTITION_KEY 1 1 5721 FREQUENCY 13
CONTRACT_REFERENCE 1211381 254 1507788 HEIGHT BALANCED 20
TABLE_NAME 16 16 5721 FREQUENCY 9
CONTRACT_TYPE 113 113 5721 FREQUENCY 9
BS_TYPE 9 9 5721 FREQUENCY 4
BOOK_CODE 141 141 5721 FREQUENCY 10
TRADING 2 2 5721 FREQUENCY 2
GL_BOOK_TYPE 8 8 5721 FREQUENCY 9
START_DATE 3021 254 5653 HEIGHT BALANCED 8
Not to bad, slightly faster then block_sample=TRUE (but still 250% slower then p_x_stats_table), much better results then with block_sample=TRUE. The choice of histograms type are slightly bizar. Certain columns don’t need histograms at all (that might explain why the stats took much longer).
So, if you guys are using Oracle 10R2 (or older) try to make yourself your own statistics program (with block_sample=FALSE if you have a table with skewed data and do a sample of around 100.000 records.