简体   繁体   中英

MySQL - clarification on table partition / subpartition

I've a table with around 300 millions records in it (only for 60 days). The table structure is quite simple ..
It has 12 columns, which 4 of them are Indexed:

LogTime - DateTime
RegionID - Integer
ProvinceID - Integer
CityID - Integer

There is also a Unique Index:

UNIQUE INDEX UI_IDX (`LogTime`, `RegionID`, `ProvinceID`, `CityID`); 

The rest of fields are just values that mostly being fetched by query results.

The database-server is already tuned with good condition (~12GB for buffer-space and etc)

Well, For speeding up the queries I decided to partition the table on LogTime field. I achieved this goal by partitioning the table with PARTITION BY RANGE . Now I've 365 partitions for the table (1 partition for 1 day , 5 millions records for each partition).

The thing is we're not satisfied with query performance, and I was looking for a way to break-down the partitions to smaller pieces .. lets say partition a table on LogTime and RegionID ... I'm not sure if it is applicable in Sub-partition ...

For

select  b.CityName, a.val1, a.val2
    from  DataRepo a
    left join  City b on a.CityID = b.CityID
    where  (a.LogTime >= '2015-08-01 00:00:00'
       and  a.LogTime <= '2015-08-05 00:00:00' )
      and  a.RegionID = 1
      and  a.ProvinceID = 14;

The optimal index is

INDEX(RegionID, ProvinceID, -- in either order
      LogTime)    -- last

No PARTITIONing will help speed up the query above what that index can deliver.

(Unrelated) Did you deliberately want to check 4 days plus one second?

I like this design pattern:

    where  (a.LogTime >= '2015-08-01'
       and  a.LogTime <  '2015-08-01' + INTERVAL 4 DAY )

Oh... This index would be even faster, because it is "covering":

INDEX(RegionID, ProvinceID, -- in either order
      LogTime,    -- range
      CityID)     -- the rest of the fields used in the SELECT

With that, the query will be performed entirely (aside from the JOIN ) in the index, without touching the data. EXPLAIN will say "Using index".

(Another aside) Is Region a subset of Province? Or vice versa? If so, you don't need both columns in the Fact table, do you? (Shrinking the Fact table is another technique for improving performance, since you are doubtless I/O-bound.)

(More) Is Province INT ? That's 4 bytes. If there are under 255 Provinces, use TINYINT UNSIGNED , which is only 1 byte. That will save nearly 1GB in the table, plus another GB in the index I proposed.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM