
Using partitioned indexes with partitioned tables

I'm trying to understand the optimal way to construct composite local partitioned indexes for use with partitioned tables.

Here is my example table:

ADDRESS
id
street
city
state
tenant

The Address table is list-partitioned on the tenant column. Pretty much all of the queries will include the tenant column, so cross-partition searches are not really a concern here.

In the end, I want a query like select * from address where tenant = 'X' and street = 'Y' and city = 'Z' to perform as well as possible. To me, it seems the right approach is to first limit the search to the particular tenant (partition) and then use the local partitioned index.

Now, I believe that only one index can be used per table reference, so I want to make the composite local partitioned index that will be most useful. I envision the composite index containing street and city. So I have two questions:

  1. Should tenant have an index by itself?

  2. Should the tenant be part of the composite index?

Some explanation of why it should be one way or the other would be helpful, as I don't think I fully understand how partitions work with partitioned indexes.

create index address_city_street_idx on address(city, street) compress 1 local;

I believe that index is ideal for this query, given a table that is list-partitioned on TENANT:

select * from address where tenant = 'X' and street = 'Y' and city = 'Z' 

To answer questions 1 and 2: since TENANT is the partition key, it should not be in this index, and probably should not be in any index. That column is already used by partition pruning to select the relevant segment. That work is done at compile or parse time, and is virtually free.

The execution plans in the test case demonstrate that partition pruning is happening. The operation PARTITION LIST SINGLE, and the fact that the columns Pstart and Pstop list the number 3 instead of a variable like KEY, show that Oracle has already determined the partition before the query runs. Oracle instantly discards irrelevant TENANTs at compile time, so there's no need to worry about further reducing the TENANTs at run time with an index.
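To see the KEY case mentioned above, here is a minimal SQL*Plus sketch against the test table from the test case further down. It assumes the tenant is supplied as a bind variable (the name t is made up for illustration) rather than as a literal. In that situation the plan would be expected to still show PARTITION LIST SINGLE, but with KEY in Pstart and Pstop, because the partition is only resolved when the statement is executed.

```sql
-- Hypothetical bind variable; SQL*Plus syntax.
variable t varchar2(100)
exec :t := 'tenant3'

-- Same query as the test case, but the partition key comes from a bind.
explain plan for
select * from address
where tenant = :t and street = 'Fake Street 50' and city = 'City 50';

-- Compare Pstart/Pstop here with the literal version of the query.
select * from table(dbms_xplan.display);
```

Either way only one partition is touched, so the reasoning about leaving TENANT out of the index is unchanged.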


My index suggestion depends on a few assumptions about the data. Neither CITY nor STREET sounds like it would uniquely identify a row for a tenant. And STREET sounds much more selective than CITY. If a single CITY has multiple STREETs, then indexing them in that order (CITY first, then STREET) and using index compression can save a lot of space.

If the index is significantly smaller it may have fewer levels, which means a lookup would require slightly fewer I/Os. And if it's smaller, more of it could fit in the buffer cache, which might further improve performance.

But with a table this large, I have a feeling the BLEVEL (number of index levels) will be the same for both, and both indexes will be too large to use the cache effectively. That means there may not be any performance difference between (CITY,STREET) and (STREET,CITY). But with (CITY,STREET) and compression you may at least be able to save a large amount of space.
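One way to check those assumptions is to count distinct values inside a single partition. This is only a sketch: it uses the ADDRESS table and partition p3 from the test case below, and on real data you would substitute your own partition name.

```sql
-- How selective are CITY and STREET within one tenant's partition?
select count(*)               as total_rows,
       count(distinct city)   as cities,
       count(distinct street) as streets
from address partition (p3);
```

With the generated test data this should report 1,000,000 rows, 100 cities and 10,000 streets. The fewer distinct CITY values there are relative to the row count, the more the leading column repeats and the more compress 1 has to work with.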
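Since both indexes are local, the depth that matters for a single-tenant lookup is the depth of that tenant's index partition. A rough way to compare the two candidates per partition, assuming the index names from the test case below and that statistics have been gathered, is to query USER_IND_PARTITIONS:

```sql
-- Per-partition depth and size of each candidate index.
select index_name, partition_name, blevel, leaf_blocks, compression
from user_ind_partitions
where index_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX')
order by partition_name, index_name;
```

If BLEVEL is the same for both indexes in every partition, the remaining difference is mostly LEAF_BLOCKS, that is, space and cache footprint.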

Test Case

I assume you cannot simply create both indexes on production and try them out. In that case you'll want to create some tests first.

This test case does not strongly support my suggestion. It is merely a starting point for a more thorough test case. You'll need to create one with a larger amount of data and a more realistic data distribution.

--Create sample table.
create table address
(
    id number,
    street varchar2(100),
    city varchar2(100),
    state varchar2(100),
    tenant varchar2(100)
) partition by list (tenant)
(
    partition p1 values ('tenant1'),
    partition p2 values ('tenant2'),
    partition p3 values ('tenant3'),
    partition p4 values ('tenant4'),
    partition p5 values ('tenant5')
) nologging;

--Insert 5M rows.
--Note the assumptions about the selectivity of the street and city
--are critical to this issue.  Adjust the MOD as necessary.
begin
    for i in 1 .. 5 loop
        insert /*+ append */ into address
        select
            level,
            'Fake Street '||mod(level, 10000),
            'City '||mod(level, 100),
            'State',
            'tenant'||i
        from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/

--Table uses 282MB.
select sum(bytes)/1024/1024 mb from dba_segments where segment_name = 'ADDRESS' and owner = user;

--Create different indexes.
create index address_city_street_idx on address(city, street) compress 1 local;
create index address_street_city_idx on address(street, city) local;

--Gather statistics.
begin
    dbms_stats.gather_table_stats(user, 'ADDRESS');
end;
/

--Check execution plan.
--Oracle by default picks STREET,CITY over CITY,STREET.
--I'm not sure why.  And the cost difference is only 1, so I think things may be different with realistic data.
explain plan for select * from address where tenant = 'tenant3' and street = 'Fake Street 50' and city = 'City 50';
select * from table(dbms_xplan.display);

/*
Plan hash value: 2845844304

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                         |     1 |    44 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE                     |                         |     1 |    44 |     4   (0)| 00:00:01 |     3 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| ADDRESS                 |     1 |    44 |     4   (0)| 00:00:01 |     3 |     3 |
|*  3 |    INDEX RANGE SCAN                        | ADDRESS_STREET_CITY_IDX |     1 |       |     3   (0)| 00:00:01 |     3 |     3 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("STREET"='Fake Street 50' AND "CITY"='City 50')
*/

--Check execution plan of forced CITY,STREET index.
--I don't suggest using a hint in the real query, this is just to compare plans.
explain plan for select /*+ index(address address_city_street_idx) */ * from address where tenant = 'tenant3' and street = 'Fake Street 50' and city = 'City 50';
select * from table(dbms_xplan.display);

/*
Plan hash value: 1084849450

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                         |     1 |    44 |     5   (0)| 00:00:01 |       |       |
|   1 |  PARTITION LIST SINGLE                     |                         |     1 |    44 |     5   (0)| 00:00:01 |     3 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| ADDRESS                 |     1 |    44 |     5   (0)| 00:00:01 |     3 |     3 |
|*  3 |    INDEX RANGE SCAN                        | ADDRESS_CITY_STREET_IDX |     1 |       |     3   (0)| 00:00:01 |     3 |     3 |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CITY"='City 50' AND "STREET"='Fake Street 50')
*/

--Both indexes have BLEVEL=2.
select *
from dba_indexes
where index_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX');

--CITY,STREET = 160MB, STREET,CITY=200MB.
--You can see the difference already.  It may get larger with different data distribution.
--And it may get larger with more data, as it may compress better with more repetition.
select segment_name, sum(bytes)/1024/1024 mb
from dba_segments
where segment_name in ('ADDRESS_CITY_STREET_IDX', 'ADDRESS_STREET_CITY_IDX')
group by segment_name;

If the index is unique, you have to include TENANT to make it local. If it is not unique, do not include it, as it will not improve performance with LIST or RANGE partitioning. You can consider including it if the table is hash-partitioned with many distinct values in one partition.
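The uniqueness rule can be illustrated on the test table from the other answer. This is a sketch under assumptions: the index names are invented, and it relies on (TENANT, ID) being unique, which holds for the generated test data but may not hold for yours.

```sql
-- Rejected with ORA-14039: the partitioning column (TENANT) must be
-- part of the key of a unique local index.
-- create unique index address_id_uq on address(id) local;

-- Accepted: TENANT is in the key, so each index partition can enforce
-- uniqueness on its own.
create unique index address_tenant_id_uq on address(tenant, id) local;
```

If uniqueness on ID alone is required, the alternative is a global unique index on ID.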

UPD: However, it depends on what kind of partitioning you're using: "static" or "dynamic". "Static" is when all partitions are defined once in the create table statement and stay unchanged while the application is running. "Dynamic" is when the application adds or changes partitions (for example, a daily process that adds daily list partitions for all tables).

So you should avoid global indexes with "dynamic" partitioning: partition maintenance such as dropping, truncating, or splitting partitions marks a global index unusable unless the statement includes an UPDATE INDEXES clause. (Adding a new, empty list or range partition does not normally invalidate it.) For the "static" option it is OK to use a global index if you sometimes need to scan across all partitions.
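If a global index is needed despite regular partition maintenance, Oracle can be asked to maintain it as part of the DDL. The sketch below uses the test table from the other answer and a made-up global index name. It demonstrates the behaviour with a drop, which is the kind of operation that would otherwise leave a global index UNUSABLE.

```sql
-- Hypothetical global (non-partitioned) index on the partitioned table.
create index address_id_gidx on address(id);

-- Drop a populated partition and keep global indexes usable.
alter table address drop partition p5 update global indexes;

-- STATUS should still be VALID. Without the UPDATE clause it would be
-- UNUSABLE until the index is rebuilt.
select index_name, status
from user_indexes
where index_name = 'ADDRESS_ID_GIDX';
```

The trade-off is that the DDL itself takes longer, because the index entries for the dropped rows have to be handled as part of the statement.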
