
Mysql Innodb Performance - How to minimise multicolumn index?

The table below contains 10 million rows:

CREATE TABLE Sample1 (
  c1 bigint(20) NOT NULL AUTO_INCREMENT,
  c2 varchar(45) NOT NULL,
  c3 tinyint(4) NOT NULL DEFAULT 0,
  c4 tinyint(4) NOT NULL DEFAULT 0,
  c5 varchar(45) DEFAULT NULL,
  time bigint(20) DEFAULT NULL,
  PRIMARY KEY (c1),
  KEY varchar_time_idx (c2, Time),
  KEY varchar_c3_time_idx (c2, c3, Time),
  KEY varchar_c4_time_idx (c2, c4, Time),
  KEY varchar_c3_c4_time_idx (c2, c3, c4, Time)
) ENGINE=InnoDB AUTO_INCREMENT=10093495 DEFAULT CHARSET=utf8;

Select
Four multi-column indexes were created to select rows with the conditions below in the WHERE clause:

1) c2 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' order by time limit 30;

2) c2 and c3 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c3 = int order by time limit 30;

3) c2 and c4 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c4 = int order by time limit 30;

4) c2 and c3 and c4 and time
ex: select c1, c5 from Sample1 where c2 = 'sometext' and c3 = int and c4 = int order by time limit 30;

To make the above selects faster, four multi-column indexes were created.
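(For reference, EXPLAIN can be used to check which index each query shape actually picks; the literal values below are just placeholders.)

EXPLAIN SELECT c1, c5
FROM Sample1
WHERE c2 = 'sometext' AND c3 = 1 AND c4 = 2
ORDER BY time
LIMIT 30;
-- The "key" column of the output should name one of the four composite
-- indexes (varchar_c3_c4_time_idx for this particular query shape).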

Cardinality-wise, c2, c3 and c4 are very low (e.g. out of one million rows, c2, c3 and c4 each have around 100 unique values).

The values are also not evenly distributed; each c2 group has an uneven number of rows (e.g. c2 = 1 contains 100000 rows, c2 = 2 contains 1500000 rows, and so on).

The time column (a timestamp in milliseconds) contains mostly unique values.

Selects are relatively infrequent (10 to 30 times an hour), but they need to be fast.

Insert
Inserts happen very frequently, but they are processed sequentially (one after another).

Update
All updates are based on c1 (the primary key). (Frequency: roughly 20% of inserts.)
update Sample1 set c3 = INT, c4 = INT, time = CurrentTimeInMilliSecond where c1 = INT

The table has 5 indexed fields (4 multi-column indexes). Because of this:
1) Inserts and updates on indexed fields become costlier
2) As the table keeps growing (it may reach up to 100 million rows), the index size also increases rapidly

Kindly suggest a good approach in MySQL to solve this use case.

Other Necessary Details
innodb_buffer_pool_size: 16106127360 (15 GB);
CPU cores: 32;
RAM: 32GB

Caution: TMI coming. I'm having to do some guessing; I can be more specific if you supply more details...

The 4 secondary keys you have are optimal for the 4 queries you listed.

Cardinality, contrary to a popular old wives' tale, has nothing to do with composite indexes and SELECT performance.

At 100M rows, the table (including indexes) will perhaps be 20GB. How much RAM do you have? What is the value of innodb_buffer_pool_size? Unless you have a tiny RAM, these probably won't matter.
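If you would rather measure than estimate, something along these lines reports the current data and index footprint and the pool setting (the schema name is a placeholder):

SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_db'   -- replace with your schema
  AND table_name   = 'Sample1';

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';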

Back to 'cardinality'.

Let's look at INDEX(c2, Time), where there are 100 distinct values for c2 and Time is essentially ever-increasing. Each new INSERT will put the new row in one of 100 spots -- the ends of each c2 clump. This implies 100 "hot spots", and it implies 100 blocks is (mostly) sufficient to take care of updating this one index. 100 blocks = 1.6MB of the buffer_pool -- hopefully a tiny fraction.

Meanwhile, the PRIMARY KEY is AUTO_INCREMENT, so there is one hot spot, and one block -- an even tinier fraction.

But... The other 3 secondary keys will have more hot spots (blocks), so they could be more important. Let's go to the worst one, (c2, c3, c4, Time). Tentatively, that would have 100*100*100 hot spots. But I think that will be more than there will be blocks in the entire index. (So, the math falls apart.) So that will be rather busy.

A digression for a moment... How many rows do you INSERT in a transaction? How many rows/second? What is the value of innodb_flush_log_at_trx_commit (flatc)? Well, let's simplify it down to either one row fully flushed at a time versus lots of rows flushed in a batch.

Back to the computations...

At one extreme: small buffer_pool, single-row transactions, flatc=1 and HDD: you will need a few IOPs per inserted row. I hope you don't need to insert more than 20 rows/second.

At the other extreme: large buffer pool, batching, flatc=2 and SSD: an average of less than 1 IOP per row. You can probably handle more than 1000 rows being inserted per second.
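If you need to move toward that second extreme, the usual levers are batching several rows per transaction and relaxing the flush setting. A rough sketch (the values are placeholders, and flatc=2 trades away up to about a second of durability on a crash):

-- Flush the redo log about once per second instead of on every commit:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Group the sequential inserts into one transaction (or one multi-row INSERT):
START TRANSACTION;
INSERT INTO Sample1 (c2, c3, c4, c5, time) VALUES
  ('sometext', 1, 2, 'payload', 1600000000001),
  ('sometext', 3, 4, 'payload', 1600000000002);
COMMIT;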

Normalizing c2 might cut the 20GB estimate roughly in half, thereby tweaking several of the computations above.
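A sketch of what normalizing c2 might look like (the lookup table and column names here are hypothetical):

CREATE TABLE C2Lookup (
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- ~100 distinct values fit easily
  c2 varchar(45) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY (c2)
) ENGINE=InnoDB;

-- Sample1 would then store c2_id SMALLINT UNSIGNED instead of the varchar(45),
-- and each of the four composite indexes would start with c2_id.

Since every secondary index entry repeats c2, replacing the utf8 varchar(45) with a 2-byte id is where most of that savings would come from.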

Back to the SELECTs -- are you really fetching 100K rows for a given c2? If you have more filtering, ORDERing, LIMITing, etc, please show them; it could make a big difference in this analysis.

Back to the title -- I don't yet see any useful way to change/minimize those indexes. They seem to be very useful for the SELECTs, and only minimally harmful to the INSERTs.

Oh, the UPDATEs. We need to see the WHERE clause on the UPDATEs before considering the ramifications there.

More (after several updates to the question)

PRIMARY KEY(c1) takes care of making the UPDATEs as fast as possible (aside from the need to eventually update the indexes).

SELECTs are very infrequent; my indexes make each run as fast as 'possible'.

A buffer_pool of 15GB says that the entire table and all its indexes will live in the pool (once it is warmed up) -- for the current 10M rows. At 100M rows, it may still be OK. I say this because the queries that are likely to cause churn are the SELECTs, but they all say AND Time > .... This implies a "working set" that is the "end" of the table. If you get to a billion rows, this paragraph needs revisiting.
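One way to watch whether that working set still fits in the pool (a sketch; what counts as "growing quickly" is a judgment call):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';          -- logical reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';  -- all logical read requests
-- If Innodb_buffer_pool_reads climbs noticeably relative to read_requests,
-- the working set is starting to spill out of the 15GB pool.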

MySQL should be able to handle a million INSERTs per day, even with the worst settings. So if you are not expecting to reach your 100M rows in less than 3 months, I don't think the INSERTs are a problem.
