How to improve an already optimized query that takes 18 seconds?

So I have a VPS with 512 MB of RAM and a MySQL table like this:

CREATE TABLE `table1` (
  `id` int(20) unsigned NOT NULL auto_increment,
  `ts` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `value1` char(31) collate utf8_unicode_ci default NULL,
  `value2` varchar(100) collate utf8_unicode_ci default NULL,
  `value3` varchar(100) collate utf8_unicode_ci default NULL,
  `value4` mediumtext collate utf8_unicode_ci,
  `type` varchar(30) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `type` (`type`),
  KEY `date` (`ts`)
) ENGINE=MyISAM AUTO_INCREMENT=469692 DEFAULT CHARSET=utf8
  COLLATE=utf8_unicode_ci

If I execute a query like this, it takes 2~18 seconds to complete:

SELECT `id`, `ts`, `value1`, `value2`, `value3` FROM table1 WHERE
`type` = 'something' ORDER BY `id` DESC limit 0,10; 

EXPLAIN SELECT tells me:

  select_type: SIMPLE
         type: ref
possible_keys: type
          key: type
      key_len: 92
          ref: const
         rows: 7291
        Extra: Using where; Using filesort

I thought the 'Using filesort' might be the problem, but it turns out that's not the case. If I remove the ORDER BY and the LIMIT, the query speed is the same (I turned off the query cache for testing with SET @@query_cache_type=0;).

mysql> EXPLAIN SELECT `id`,`ts`,`value1`,`value2`, `value3` 
       FROM table1 WHERE `type` = 'something'\G

  select_type: SIMPLE
         type: ref
possible_keys: type
          key: type
      key_len: 92
          ref: const
         rows: 7291
        Extra: Using where

I don't know if it matters, but the rows estimate is inaccurate:

SELECT COUNT(*) FROM table1 WHERE `type` = 'something';

This returns a count of 22.8k rows.

The query already seems optimized, and I don't know how I could improve it further. The whole table contains 370k rows and is about 4.6 GiB in size. Could it be that because the type changes randomly from row to row (rows of a given type are scattered across the whole table), it takes 2~18 seconds just to fetch the data from disk?

The funny thing is that when I use a type that has only a few hundred rows, those queries are slow too. For those types MySQL returns rows at only about 100 rows/sec!

|-------+----------+-----------|
| count | time (s) |  rows/sec |
|-------+----------+-----------|
| 22802 |     18.7 | 1219.3583 |
|    11 |      0.1 |      110. |
|   491 |      4.8 | 102.29167 |
|   705 |      5.6 | 125.89286 |
|   317 |      2.6 | 121.92308 |
|-------+----------+-----------|

Why is it so slow? Can I further optimize the query? Should I move the data to smaller tables?

I thought automatic partitioning would be a good idea: dynamically creating a new partition for every type. That is not possible, for many reasons, including that the maximum number of partitions is 1024 and there can be any number of types. I could also try application-level partitioning, creating a new table for every new type. I wouldn't want to do that, as it introduces great complexity. I don't know how I could keep a unique id across all rows in all tables. Also, if I reach multiple inserts per second, performance would drop significantly.

Thanks in advance.

You need a multi-column index for that query:

KEY `typeid` (`type`, `id`)
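
As a minimal sketch of how that key could be added to the existing table and then checked, assuming the table and column names from the question (the statement rebuilds the index data and may take a while on a 4.6 GiB MyISAM table):

```sql
-- Add the composite index suggested above to the existing table.
ALTER TABLE table1 ADD KEY `typeid` (`type`, `id`);

-- Re-run EXPLAIN on the original query. With (`type`, `id`) available,
-- one would expect the optimizer to read the rows for a type in id order
-- from the index, so "Using filesort" should no longer appear in Extra.
EXPLAIN SELECT `id`, `ts`, `value1`, `value2`, `value3`
FROM table1
WHERE `type` = 'something'
ORDER BY `id` DESC
LIMIT 0, 10;
```

Once this key exists, the single-column `type` key is a prefix of it and is probably redundant.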

Unfortunately, as you stated, the query is also slow without the ORDER BY, so the real cause is that the records are scattered around the disk and MySQL has to do a lot of seeks. Once cached, it should be quite fast (note: 22.8/370 * 4.6G = 283M, so if you run other activities/queries, those records won't stay in memory for long, or might not even fit).

Run iostat 1 to verify the I/O bottleneck. Loads of RAM could solve your problem. An SSD could also solve your problem. But RAM is cheaper ;)

If you are desperate to optimize, you can try to rearrange your table. First select every row of one type, in order, and write it to a new table, then add the other types to that table one by one. What I am suggesting is a kind of table defragmentation, but I don't have any experience with this.
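
One possible shortcut for the rearrangement described above, offered as a hedged sketch and not as the procedure the answer spells out: for MyISAM tables, MySQL's ALTER TABLE accepts an ORDER BY clause that rewrites the table with its rows stored in the given order. This assumes the table can stay write-blocked for the duration of the rebuild and that there is enough free disk space for a temporary copy of the table.

```sql
-- Rewrite table1 so that rows of the same type sit next to each other
-- on disk, ordered by id within each type. This rebuilds the whole table.
ALTER TABLE table1 ORDER BY `type`, `id`;
```

The order is not maintained for rows inserted or deleted afterwards, so the statement would need to be repeated from time to time to keep the benefit.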

There are many ways to improve a query. In your case, I see that your index must be quite large because of the indexed Unicode VARCHAR(30) column, which is responsible for key_len: 92. Here's what you can try: replace the big VARCHAR index with something much smaller. Keep the type column but remove its index, and create a new indexed column typeidx, which you can define as INT UNSIGNED (or SMALLINT if possible).

Create a table similar to this:

CREATE TABLE `typetable` (
  `typeidx` INT UNSIGNED NOT NULL auto_increment,
  `type` varchar(30) collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`typeidx`),
  UNIQUE KEY `type` (`type`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Fill it with the existing types:

INSERT INTO typetable (type) SELECT DISTINCT type FROM table1;
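
The answer describes the new typeidx column but does not show how to create it. A minimal sketch, assuming a default of 0 for rows that have not been mapped yet (the default value is an assumption, not something from the answer):

```sql
-- Add the numeric type reference to table1 before populating it.
-- On MyISAM this rewrites the whole table, so expect it to take a while.
ALTER TABLE table1
  ADD COLUMN `typeidx` INT UNSIGNED NOT NULL DEFAULT 0;
```

The index on the column is deliberately left out at this point, so that populating the column does not also have to maintain an index.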

Then you have to update table1.typeidx with something like:

UPDATE table1 t1 JOIN typetable tt USING (type)
   SET t1.typeidx = tt.typeidx
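
Once typeidx is populated, the index swap described earlier could look like the following sketch. It assumes the typeidx column already exists on table1, and the key name `typeidx` is only an illustrative choice:

```sql
-- Replace the wide VARCHAR index with the much smaller integer one.
-- Doing both changes in one ALTER avoids rebuilding the table twice.
ALTER TABLE table1
  DROP KEY `type`,
  ADD KEY `typeidx` (`typeidx`);
```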

Now your old query can become something like this:

SELECT `id`,`ts`,`value1`,`value2`, `value3` 
   FROM table1 WHERE `typeidx` = (SELECT typeidx FROM typetable WHERE type = 'something')

Of course, you'll also have to maintain typetable and insert new type values into it as they are created.

I found no better idea than to implement vertical partitioning. I made an identical table without the mediumtext column, copied the whole table over without this column, and the 18-second query now takes only 100 ms! The new table is only 55 MB.
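
The exact statements used are not given, so the following is only a sketch of how such a split could be done. The table name table1_narrow is hypothetical, and keeping value4 in the original table to be fetched by id is an assumption:

```sql
-- Create an empty copy of table1 (same columns and keys),
-- then drop the large mediumtext column from the copy.
CREATE TABLE table1_narrow LIKE table1;
ALTER TABLE table1_narrow DROP COLUMN `value4`;

-- Copy everything except value4, keeping the original ids and timestamps.
INSERT INTO table1_narrow (`id`, `ts`, `value1`, `value2`, `value3`, `type`)
SELECT `id`, `ts`, `value1`, `value2`, `value3`, `type`
FROM table1;

-- The listing query now reads only the narrow table.
SELECT `id`, `ts`, `value1`, `value2`, `value3`
FROM table1_narrow
WHERE `type` = 'something'
ORDER BY `id` DESC
LIMIT 0, 10;
```

With this layout, the application has to write each new row to both tables (or move value4 into a separate table keyed by id), and it reads value4 from the wide table only when a single row's full text is needed.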
