Improving performance of updating large table with join

Currently I have a table with the following schema:

 mData | CREATE TABLE `mData` (
   `m1` mediumint(8) unsigned DEFAULT NULL,
   `m2` smallint(5) unsigned DEFAULT NULL,
   `m3` bigint(20) DEFAULT NULL,
   `m4` tinyint(4) DEFAULT NULL,
   `m5` date DEFAULT NULL,
   KEY `m_m1` (`m1`) USING HASH,
   KEY `m_date` (`m5`),
   KEY `m_m2` (`m2`),
   KEY `m_combined` (`m1`,`m2`,`m5`),
   KEY `m1_tradeday` (`m1`,`m5`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
 /*!50100 PARTITION BY RANGE ( YEAR(m5))
 SUBPARTITION BY HASH (MONTH(m5))
 (PARTITION p2013 VALUES LESS THAN (2014)
  (SUBPARTITION dec_2013 ENGINE = InnoDB,
   SUBPARTITION jan_2013 ENGINE = InnoDB,
   SUBPARTITION feb_2013 ENGINE = InnoDB,
   SUBPARTITION mar_2013 ENGINE = InnoDB,
   SUBPARTITION apr_2013 ENGINE = InnoDB,
   SUBPARTITION may_2013 ENGINE = InnoDB,
   SUBPARTITION jun_2013 ENGINE = InnoDB,
   SUBPARTITION jul_2013 ENGINE = InnoDB,
   SUBPARTITION aug_2013 ENGINE = InnoDB,
   SUBPARTITION sep_2013 ENGINE = InnoDB,
   SUBPARTITION oct_2013 ENGINE = InnoDB,
  SUBPARTITION nov_2013 ENGINE = InnoDB),
  PARTITION p2014 VALUES LESS THAN (2015)
  (SUBPARTITION dec_2014 ENGINE = InnoDB,
   SUBPARTITION jan_2014 ENGINE = InnoDB,
   SUBPARTITION feb_2014 ENGINE = InnoDB,
   SUBPARTITION mar_2014 ENGINE = InnoDB,
   SUBPARTITION apr_2014 ENGINE = InnoDB,
   SUBPARTITION may_2014 ENGINE = InnoDB,
   SUBPARTITION jun_2014 ENGINE = InnoDB,
   SUBPARTITION jul_2014 ENGINE = InnoDB,
   SUBPARTITION aug_2014 ENGINE = InnoDB,
   SUBPARTITION sep_2014 ENGINE = InnoDB,
   SUBPARTITION oct_2014 ENGINE = InnoDB,
   SUBPARTITION nov_2014 ENGINE = InnoDB),
  PARTITION p2015 VALUES LESS THAN (2016)
  (SUBPARTITION dec_2015 ENGINE = InnoDB,
   SUBPARTITION jan_2015 ENGINE = InnoDB,
   SUBPARTITION feb_2015 ENGINE = InnoDB,
   SUBPARTITION mar_2015 ENGINE = InnoDB,
   SUBPARTITION apr_2015 ENGINE = InnoDB,
   SUBPARTITION may_2015 ENGINE = InnoDB,
   SUBPARTITION jun_2015 ENGINE = InnoDB,
   SUBPARTITION jul_2015 ENGINE = InnoDB,
   SUBPARTITION aug_2015 ENGINE = InnoDB,
   SUBPARTITION sep_2015 ENGINE = InnoDB,
   SUBPARTITION oct_2015 ENGINE = InnoDB,
   SUBPARTITION nov_2015 ENGINE = InnoDB),
  PARTITION p2016 VALUES LESS THAN (2017)
  (SUBPARTITION dec_2016 ENGINE = InnoDB,
   SUBPARTITION jan_2016 ENGINE = InnoDB,
   SUBPARTITION feb_2016 ENGINE = InnoDB,
   SUBPARTITION mar_2016 ENGINE = InnoDB,
   SUBPARTITION apr_2016 ENGINE = InnoDB,
   SUBPARTITION may_2016 ENGINE = InnoDB,
   SUBPARTITION jun_2016 ENGINE = InnoDB,
   SUBPARTITION jul_2016 ENGINE = InnoDB,
   SUBPARTITION aug_2016 ENGINE = InnoDB,
   SUBPARTITION sep_2016 ENGINE = InnoDB,
   SUBPARTITION oct_2016 ENGINE = InnoDB,
   SUBPARTITION nov_2016 ENGINE = InnoDB),
  PARTITION pmax VALUES LESS THAN MAXVALUE
  (SUBPARTITION dec_max ENGINE = InnoDB,
   SUBPARTITION jan_max ENGINE = InnoDB,
   SUBPARTITION feb_max ENGINE = InnoDB,
   SUBPARTITION mar_max ENGINE = InnoDB,
   SUBPARTITION apr_max ENGINE = InnoDB,
   SUBPARTITION may_max ENGINE = InnoDB,
   SUBPARTITION jun_max ENGINE = InnoDB,
   SUBPARTITION jul_max ENGINE = InnoDB,
   SUBPARTITION aug_max ENGINE = InnoDB,
   SUBPARTITION sep_max ENGINE = InnoDB,
   SUBPARTITION oct_max ENGINE = InnoDB,
   SUBPARTITION nov_max ENGINE = InnoDB)) */ |

m1, m2, and m5 are indexed in this table; a unique/primary key is not applicable in my case.

As the data grows (100,000 new rows a day), the UPDATE statement is getting very slow.

I would like to know if there are any ways to improve the following statement.

update mData as a join (select * from mData
                        where m1 = 326 and m5 = '2015-07-06' ) as b
            on  a.m5 > b.m5 and a.m1 = b.m1
            and a.m2 = b.m2 and a.m3 = b.m3
    set a.m4 = 0;

I am quite sure that in a SELECT statement, if I replace mData as a with (select * from mData where m1 = 326) as a, the execution time drops considerably (from 5 sec to less than 1 sec).

However, it is not possible to do the same in an UPDATE statement.

Is there any solution for this, to speed up the update?

PS: the table is partitioned by year(m5) and subpartitioned by month(m5).

Here is the EXPLAIN PARTITIONS output for my join query; it is very messy, hope you don't mind. Adding and a.m5 > '2015-07-06' does improve performance: query time drops from 0.68 sec to 0.2 sec.

explain partitions (select * from (select * from mData where m1 = 326) as a join (select * from mData where m1 = 326 and m5= '2015-07-06') as b on  a.m5 > b.m5 and a.m1 = b.m1 and a.m2 = b.m2 and a.m3 = b.m3 and a.m5 > '2015-07-06');

+----+-------------+-------+---------------------+------+----------------------------+-------+---------+------+------+--------------------------------+
| id | select_type | table | partitions          | type | possible_keys              | key   | key_len | ref  | rows | Extra                          |
+----+-------------+-------+---------------------+------+----------------------------+-------+---------+------+------+--------------------------------+
|  1 | PRIMARY     |       | NULL                | ALL  | NULL                       | NULL  | NULL    | NULL |  358 |                                |
|  1 | PRIMARY     |       | NULL                | ALL  | NULL                       | NULL  | NULL    | NULL | 1073 | Using where; Using join buffer |
|  3 | DERIVED     | mData | p2015_jul_2015      | ref  | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8       |      |  357 | Using where                    |
|  2 | DERIVED     | mData | (all 60, see below) | ref  | m_m1,m_combined,m1_m5      | m_m1  | 4       |      | 1074 | Using where                    |
+----+-------------+-------+---------------------+------+----------------------------+-------+---------+------+------+--------------------------------+

partitions for the id 2 row:
p2013_dec_2013,p2013_jan_2013,p2013_feb_2013,p2013_mar_2013,p2013_apr_2013,p2013_may_2013,p2013_jun_2013,p2013_jul_2013,p2013_aug_2013,p2013_sep_2013,p2013_oct_2013,p2013_nov_2013,
p2014_dec_2014,p2014_jan_2014,p2014_feb_2014,p2014_mar_2014,p2014_apr_2014,p2014_may_2014,p2014_jun_2014,p2014_jul_2014,p2014_aug_2014,p2014_sep_2014,p2014_oct_2014,p2014_nov_2014,
p2015_dec_2015,p2015_jan_2015,p2015_feb_2015,p2015_mar_2015,p2015_apr_2015,p2015_may_2015,p2015_jun_2015,p2015_jul_2015,p2015_aug_2015,p2015_sep_2015,p2015_oct_2015,p2015_nov_2015,
p2016_dec_2016,p2016_jan_2016,p2016_feb_2016,p2016_mar_2016,p2016_apr_2016,p2016_may_2016,p2016_jun_2016,p2016_jul_2016,p2016_aug_2016,p2016_sep_2016,p2016_oct_2016,p2016_nov_2016,
pmax_dec_max,pmax_jan_max,pmax_feb_max,pmax_mar_max,pmax_apr_max,pmax_may_max,pmax_jun_max,pmax_jul_max,pmax_aug_max,pmax_sep_max,pmax_oct_max,pmax_nov_max

Below is the EXPLAIN output requested by "Rick James":

EXPLAIN PARTITIONS select * from mData where m1 = 326 and m5 = '2015-07-06';

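+----+-------------+-------+----------------+------+----------------------------+-------+---------+-------------+------+-------------+
| id | select_type | table | partitions     | type | possible_keys              | key   | key_len | ref         | rows | Extra       |
+----+-------------+-------+----------------+------+----------------------------+-------+---------+-------------+------+-------------+
|  1 | SIMPLE      | mData | p2015_jul_2015 | ref  | m_m1,m_m5,m_combined,m1_m5 | m1_m5 | 8       | const,const |  357 | Using where |
+----+-------------+-------+----------------+------+----------------------------+-------+---------+-------------+------+-------------+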

First I would use the fixed value for m5 in order to limit the partitions to consider. Maybe you should also add a dummy condition on year(m5) and month(m5). Then I would create a temp table for the subquery, with an index on m2 and m3. Then I'd use the fixed values for m1 and m5. But how many times is the query executed? 5 secs is not a terrible result.
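A rough sketch of the temp-table idea, assuming a scratch table named tmp_b and an index named idx_m2_m3 (both names are made up for illustration). DISTINCT looks safe here because the UPDATE only sets m4 to a constant, so duplicate (m2, m3) pairs would not change the outcome. Timings are untested against the real data.

```sql
-- Materialize the reference rows once (tmp_b is a hypothetical name).
CREATE TEMPORARY TABLE tmp_b AS
SELECT DISTINCT m2, m3
FROM mData
WHERE m1 = 326
  AND m5 = '2015-07-06';

-- Index the join columns so each candidate row in mData can be looked up.
ALTER TABLE tmp_b ADD INDEX idx_m2_m3 (m2, m3);

-- Use the fixed values for m1 and m5 directly on the table being updated.
UPDATE mData AS a
JOIN tmp_b AS b
  ON  a.m2 = b.m2
  AND a.m3 = b.m3
SET a.m4 = 0
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';

DROP TEMPORARY TABLE tmp_b;
```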

For starters, add INDEX(m1, m5). After I see SHOW CREATE TABLE mData; I may have other recommendations.

EDIT

Adding AND a.m5 > '2015-07-06' may get partition pruning to kick in. I don't have enough experience with UPDATE and SUBPARTITION to predict.
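For illustration, this is roughly what the original UPDATE looks like with the constants repeated on a. Older MySQL versions cannot EXPLAIN an UPDATE, so the sketch checks pruning on the equivalent SELECT instead. EXPLAIN PARTITIONS matches the syntax used in the question; on recent versions plain EXPLAIN already reports a partitions column. Whether and how far pruning applies here is something to confirm from that output.

```sql
-- Check which partitions the "a" side touches with the extra predicates.
EXPLAIN PARTITIONS
SELECT a.*
FROM mData AS a
JOIN (SELECT * FROM mData
      WHERE m1 = 326 AND m5 = '2015-07-06') AS b
  ON  a.m5 > b.m5 AND a.m1 = b.m1
  AND a.m2 = b.m2 AND a.m3 = b.m3
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';

-- Same predicates applied to the UPDATE itself.
UPDATE mData AS a
JOIN (SELECT * FROM mData
      WHERE m1 = 326 AND m5 = '2015-07-06') AS b
  ON  a.m5 > b.m5 AND a.m1 = b.m1
  AND a.m2 = b.m2 AND a.m3 = b.m3
SET a.m4 = 0
WHERE a.m1 = 326
  AND a.m5 > '2015-07-06';
```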

InnoDB should have an explicit PRIMARY KEY; without one it clusters the table on a hidden internal row ID. Would (m1, m2, m3, m5) work as a PK?
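One way to find out whether (m1, m2, m3, m5) could serve as a PK is to look for duplicates and NULLs first. This is only a sketch: the ALTER at the end is an assumption about how you would proceed, it rebuilds the table, and it should only be attempted if both checks come back clean.

```sql
-- Any rows returned here mean the combination is not unique.
SELECT m1, m2, m3, m5, COUNT(*) AS n
FROM mData
GROUP BY m1, m2, m3, m5
HAVING COUNT(*) > 1
LIMIT 10;

-- PRIMARY KEY columns cannot hold NULL; this count should be 0.
SELECT COUNT(*) AS rows_with_null
FROM mData
WHERE m1 IS NULL OR m2 IS NULL OR m3 IS NULL OR m5 IS NULL;

-- Only if both checks pass. m5 is part of the key, which a partitioned
-- table requires because it is used in the partitioning expression.
ALTER TABLE mData ADD PRIMARY KEY (m1, m2, m3, m5);
```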

USING HASH is ignored, since InnoDB does not implement it. It will be a BTree, which is nearly as good, anyway.

KEY `m_m1` (`m1`)

is redundant and can be dropped, since there are other indexes (two, actually) that start with m1.
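Dropping it would look something like this, using the index name from the SHOW CREATE TABLE output in the question:

```sql
-- Confirm the current index list before changing anything.
SHOW INDEX FROM mData;

-- m_combined (m1, m2, m5) and m1_tradeday (m1, m5) both start with m1,
-- so lookups on m1 alone can still use either of them.
ALTER TABLE mData DROP INDEX m_m1;
```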

Can't you do a JOIN instead of using a subquery? (That would avoid a tmp table.)
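A sketch of what that could look like: MySQL's multi-table UPDATE accepts a self-join as long as each reference has its own alias. Repeating the constants on a is optional for correctness, but it hands the optimizer literal values to work with on the side being updated. I have not timed this against the real table.

```sql
UPDATE mData AS a
JOIN mData AS b
  ON  a.m1 = b.m1
  AND a.m2 = b.m2
  AND a.m3 = b.m3
  AND a.m5 > b.m5
SET a.m4 = 0
WHERE b.m1 = 326
  AND b.m5 = '2015-07-06'
  -- implied by the join conditions, restated as literals for "a"
  AND a.m1 = 326
  AND a.m5 > '2015-07-06';
```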
