Using partitioning to colocate data for optimal multiple index access paths

Jun 29, 2007 tuning by Jeff

I came across a situation the other day where a client was accessing rows from a table via one of several different indexes and getting poor performance on all of them.

The table in question had about five million rows and about four indexes. For the process in question, the table was hitting each of the indexes on this single table at one point or another.

I noticed that the clustering factor for each of the indexes was quite poor – a value closer to the number of rows in the table rather than the number of blocks in the table. This was leading to a high number of logical IOs to get the rows needed on each query in the process, because the rows pointed to by the index were all in different data blocks, scattered far and wide. As I recalle, Jonathan gives a great explanation of how the clustering factor can affect the performance of a query in Chapter five of his CBO Fundamentals book and basically this was relevant to this scenario.

A fix for a poor clustering factor, is to sort the data in the table, in the order which matches that of the index in question. So, if INDEX_A is on columns A then B, you should order the data in the table by column A, then B…perhaps by removing it and then reinserting it all using an ORDER BY on the insert.

This of course, only works for one order – you can’t order data in multiple different ways within the same table, to suit all the indexes, so for one index you might be able to order the data perfectly but that might be terrible for another index which wants to order by say, column C, then D….or does it?

Well, in this particular case, the data required by each query hitting different indexes on this single table was all of a different type – by this I mean, the table itself was modelled from a Supertype Entity, implemented physically as a single table with a Supertype differentiator column (PRODUCT_TYPE in this case). Query one in the process wanted data for PRODUCT_TYPE “X”, and wanted to get this via Index IDX1 on COL1. Query two would then want data of PRODUCT_TYPE “Y” and get it via index IDX2 on COL2 etc…

I figured that if we split the table into partitions based on the PRODUCT_TYPE then we’d be able to order the data within each partition by the appropriate order for the index which would commonly be used to access data of that PRODUCT_TYPE.

It will take them a while to test and implement the solution and gain some empirical results, however a simple test script I knocked up seemed to indicate that the solution could work:

The script…


CLEAR BREAKS
CLEAR COLUMNS
COLUMN index_name HEADING "IndexName" FORMAT A10
COLUMN partition_name HEADING "PartitionName" FORMAT A10
COLUMN leaf_blocks HEADING "LeafBlocks" FORMAT 999
COLUMN distinct_keys HEADING "DistinctKeys" FORMAT 999,999
COLUMN num_rows HEADING "NumRows" FORMAT 999,999
COLUMN clustering_factor HEADING "ClusteringFactor" FORMAT 999,999
COLUMN clfpct HEADING "Clu FactPercent" FORMAT 999

DROP TABLE jeff_unpartitioned PURGE
/
CREATE TABLE jeff_unpartitioned
AS
SELECT l pk_col
, MOD(l,4) partitioning_key_col
, (dbms_random.value * 1000) index_col1
, (dbms_random.value * 1000) index_col2
, (dbms_random.value * 1000) index_col3
, (dbms_random.value * 1000) index_col4
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 100001)
ORDER BY l
/
CREATE UNIQUE INDEX ju_pk ON jeff_unpartitioned(pk_col)
/
CREATE INDEX ju_idx1 ON jeff_unpartitioned(index_col1)
/
CREATE INDEX ju_idx2 ON jeff_unpartitioned(index_col2)
/
CREATE INDEX ju_idx3 ON jeff_unpartitioned(index_col3)
/
CREATE INDEX ju_idx4 ON jeff_unpartitioned(index_col4)
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>’JEFF_UNPARTITIONED’);

SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
, (DECODE((t.num_rows – t.blocks),0,0,(i.clustering_factor) / (t.num_rows – t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = ‘JEFF_UNPARTITIONED’
AND t.owner = USER
/

DROP TABLE jeff_partitioned PURGE
/
CREATE TABLE jeff_partitioned
(pk_col number
,partitioning_key_col number
,index_col1 number
,index_col2 number
,index_col3 number
,index_col4 number
)
PARTITION BY LIST(partitioning_key_col)
(PARTITION p1 VALUES (1)
,PARTITION p2 VALUES (2)
,PARTITION p3 VALUES (3)
,PARTITION p4 VALUES (4)
)
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 1
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 3
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 2
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 4
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 3
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
, (dbms_random.value * 1000)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 5
/
COMMIT
/
INSERT /*+ APPEND */
INTO jeff_partitioned(pk_col,partitioning_key_col,index_col1,index_col2,index_col3,index_col4)
SELECT l
, 4
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, (dbms_random.value * 1000)
, ROUND(l,100)
FROM (SELECT level l FROM dual CONNECT BY LEVEL < 25001)
ORDER BY 6
/
COMMIT
/
CREATE UNIQUE INDEX jp_pk ON jeff_partitioned(pk_col,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx1 ON jeff_partitioned(index_col1,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx2 ON jeff_partitioned(index_col2,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx3 ON jeff_partitioned(index_col3,partitioning_key_col) LOCAL
/
CREATE INDEX jp_idx4 ON jeff_partitioned(index_col4,partitioning_key_col) LOCAL
/
exec DBMS_STATS.GATHER_TABLE_STATS (ownname=>USER,tabname=>’JEFF_PARTITIONED’);

SELECT i.index_name
, i.leaf_blocks
, i.distinct_keys
, i.clustering_factor
The tool that can help to get up and apply acheter pfizer viagra for your driver’s license. Does it give the user an automatic erection? Yes, there are side effects of tadalafil various things one can try for erectile dysfunction after prostate cancer radiation therapy, which can extend up until a year or more. This will give an erection to best viagra india about four to five hours with no sort of issue. They work safely to cure impotence and improve the levitra generic no prescription overall energy level. , (DECODE((t.num_rows – t.blocks),0,0,(i.clustering_factor) / (t.num_rows – t.blocks))) * 100 clfpct
FROM all_indexes i
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND t.table_name = ‘JEFF_PARTITIONED’
AND t.owner = USER
ORDER BY i.index_name
/

SELECT i.index_name
, ip.partition_name
, ip.leaf_blocks
, ip.distinct_keys
, ip.clustering_factor
, (DECODE((t.num_rows – t.blocks),0,0,(ip.clustering_factor) / (t.num_rows – t.blocks))) * 100 clfpct
FROM all_indexes i
, all_ind_partitions ip
, all_tables t
WHERE i.table_name = t.table_name
AND i.table_owner = t.owner
AND i.index_name = ip.index_name
AND i.owner = ip.index_owner
AND t.table_name = ‘JEFF_PARTITIONED’
AND t.owner = USER
ORDER BY i.index_nam e
, ip.partition_name
/

And now the results…


Table dropped.

Table created.

Index created.

PL/SQL procedure successfully completed.

Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
———- —— ——– ———- ——–
JU_IDX4 226 100,000 99,830 101
JU_IDX3 226 100,000 99,880 101
JU_IDX2 226 100,000 99,872 101
JU_IDX1 226 100,000 99,833 101
JU_PK 103 100,000 685 1

5 rows selected.

Table dropped.

Table created.

25000 rows created.

Commit complete.

25000 rows created.

Commit complete.

25000 rows created.

Commit complete.

25000 rows created.

Commit complete.

Index created.

PL/SQL procedure successfully completed.

Index Leaf Distinct Clustering Clu Fact
Name Blocks Keys Factor Percent
———- —— ——– ———- ——–
JP_IDX1 218 100,000 74,593 75
JP_IDX2 218 100,000 74,624 75
JP_IDX3 218 100,000 74,609 75
JP_IDX4 218 100,000 74,627 75
JP_PK 124 100,000 568 1

5 rows selected.

Index Partition Leaf Distinct Clustering Clu Fact
Name Name Blocks Keys Factor Percent
———- ———- —— ——– ———- ——–
JP_IDX1 P1 32 25,000 142 0
JP_IDX1 P2 62 25,000 24,826 25
JP_IDX1 P3 62 25,000 24,811 25
JP_IDX1 P4 62 25,000 24,814 25
JP_IDX2 P1 62 25,000 24,812 25
JP_IDX2 P2 32 25,000 142 0
JP_IDX2 P3 62 25,000 24,835 25
JP_IDX2 P4 62 25,000 24,835 25
JP_IDX3 P1 62 25,000 24,802 25
JP_IDX3 P2 62 25,000 24,826 25
JP_IDX3 P3 32 25,000 142 0
JP_IDX3 P4 62 25,000 24,839 25
JP_IDX4 P1 62 25,000 24,830 25
JP_IDX4 P2 62 25,000 24,832 25
JP_IDX4 P3 62 25,000 24,823 25
JP_IDX4 P4 32 25,000 142 0
JP_PK P1 31 25,000 142 0
JP_PK P2 31 25,000 142 0
JP_PK P3 31 25,000 142 0
JP_PK P4 31 25,000 142 0

20 rows selected.

From the results we can see that when the table is unpartitioned and, in this instance, ordered by the primary key, all four indexes have high clustering factors.

We could order the data by one of the columns in one of the indexes and make that index get a good (low) clustering factor, but the other indexes would still remain with high values – you can’t make them all have low clustering factors whilst the table is one big amorphous mass of data.

In the partitioned table we are able to order the data in the most efficient way for each partition, insofar as each partition maps to an access path via a specific index.

So, partition P1, where users would commonly access the data via index JP_IDX1, has a low clustering factor because we ordered the data by INDEX_COL1 on insert.

Partition P2 has the lowest clustering factor for index JP_IDX2 and P3 for JP_IDX3 etc…

Some caveats to bear in mind for this approach:

1. You should ensure that your queries do use the partitioning key column when accessing the table, otherwise you’ll end up scanning extra index partitions unnecessarily. You will also make life extremely difficult, if not impossible, for the optimizer because if we don’t identify that the query will only use one partition then the CBO will be forced to use the global stats on the index/table rather than the specific ones for a single partition…the global ones as you can see from the results still show a high clustering factor because it is an aggregate which is hiding the detail that one specific partition – the one we really want – has a low clustering factor.

2. If you want LOCAL indexes, you will need to add the partitioning key column to that index – if it is not already present of course. As Jonathan points out in the first comment on this post…this only applies if the index in question is unique and supports a PK/unique constraint. If the index does not support a PK/unique constraint then it can be LOCAL without the need to have the partitioning key column(s) in it.

3. If you are using GLOBAL indexes on the partitioned table then they will still have reasonably high clustering factors due to the partitions where the data is not ordered by the columns of that index, but the data itself will be ordered within the partition that we access via the partioning key column so we should get good performance.

4. If you want to use ordered data in a table then you need to factor that into the way you populate the table – if it’s a batch process then you can this as part of that process but if it’s built up over time, in an OLTP fashion then you’ll need periodic “rebuilds” of the table to take the data out and put it back in, reordered. This is something you would need to do whether it was partitioned of course.

I’m sure there are more caveats.

Your mileage may, of course, vary.

Month: June 2007

Using partitioning to colocate data for optimal multiple index access paths