Very slow simple MySQL query with index

I have this table:

CREATE TABLE `messenger_contacts` (
  `number` varchar(15) NOT NULL,
  `has_telegram` tinyint(1) NOT NULL DEFAULT '0',
  `geo_state` int(11) NOT NULL DEFAULT '0',
  `geo_city` int(11) NOT NULL DEFAULT '0',
  `geo_postal` int(11) NOT NULL DEFAULT '0',
  `operator` tinyint(1) NOT NULL DEFAULT '0',
  `type` tinyint(1) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `messenger_contacts`
  ADD PRIMARY KEY (`number`),
  ADD KEY `geo_city` (`geo_city`),
  ADD KEY `geo_postal` (`geo_postal`),
  ADD KEY `type` (`type`),
  ADD KEY `type1` (`operator`),
  ADD KEY `has_telegram` (`has_telegram`),
  ADD KEY `geo_state` (`geo_state`);

with about 11 million records.

A simple count select on this table takes about 30 to 60 seconds to complete, which seems very high.

select count(number) from messenger_contacts where geo_state=1

I am not a database pro, so besides setting indexes I don't know what else I can do to make the query faster.

UPDATE:

OK, I made some changes to the column types and sizes:

CREATE TABLE IF NOT EXISTS `messenger_contacts` (
  `number` bigint(13) unsigned NOT NULL,
  `has_telegram` tinyint(1) NOT NULL DEFAULT '0' ,
  `geo_state` int(2) NOT NULL DEFAULT '0',
  `geo_city` int(4) NOT NULL DEFAULT '0',
  `geo_postal` int(10) NOT NULL DEFAULT '0',
  `operator` tinyint(1) NOT NULL DEFAULT '0' ,
  `type` tinyint(1) NOT NULL DEFAULT '0' ,
  PRIMARY KEY (`number`),
  KEY `has_telegram` (`has_telegram`,`geo_state`),
  KEY `geo_city` (`geo_city`),
  KEY `geo_postal` (`geo_postal`),
  KEY `type` (`type`),
  KEY `type1` (`operator`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now the query only takes 4 to 5 seconds, with both count(*) and count(number).

Thanks, everyone, for your help, even the guy who gave me a -1. This is good enough for now, considering that my server is low-end hardware and I will be caching the select count results.
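One way to cache the counts inside MySQL itself is a small summary table that is refreshed periodically. This is only a sketch: the table name messenger_contacts_state_counts and its column contact_count are hypothetical, not part of the original schema, and the refresh schedule is left to the application.

```sql
-- Hypothetical summary table: one row per geo_state with its precomputed count.
CREATE TABLE IF NOT EXISTS messenger_contacts_state_counts (
  geo_state INT NOT NULL,
  contact_count INT UNSIGNED NOT NULL,
  PRIMARY KEY (geo_state)
) ENGINE=InnoDB;

-- Refresh the cached counts (for example from a scheduled job).
REPLACE INTO messenger_contacts_state_counts (geo_state, contact_count)
SELECT geo_state, COUNT(*)
FROM messenger_contacts
GROUP BY geo_state;

-- Reading a cached count is a primary-key lookup instead of an index scan.
SELECT contact_count
FROM messenger_contacts_state_counts
WHERE geo_state = 1;
```

The cached values are only as fresh as the last refresh, which is usually acceptable for counts shown in a UI.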

Maybe

select count(geo_state) from messenger_contacts where geo_state=1

as it will give the same result but will not use the number column from the clustered index?

If this does not help, I would try changing the number column to an integer type (INT, or BIGINT if the values exceed the INT range), which should reduce the index size, or try increasing the amount of memory MySQL can use for caching indexes.

You did not change the datatypes. INT(11) == INT(2) == INT(100): each is a 4-byte signed integer. You probably want the 1-byte TINYINT UNSIGNED or the 2-byte SMALLINT UNSIGNED.
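A minimal sketch of actually shrinking the columns, assuming geo_state fits in 0..255 and geo_city in 0..65535 (the source does not state the real ranges, so check them first):

```sql
-- Check the real value ranges first; converting a column whose data is out of
-- range either fails or clips values, depending on the SQL mode.
SELECT MIN(geo_state), MAX(geo_state), MIN(geo_city), MAX(geo_city)
FROM messenger_contacts;

-- TINYINT UNSIGNED stores 0..255 in 1 byte; SMALLINT UNSIGNED stores 0..65535 in 2 bytes.
ALTER TABLE messenger_contacts
  MODIFY `geo_state` TINYINT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY `geo_city` SMALLINT UNSIGNED NOT NULL DEFAULT 0;
```

Smaller columns also make the secondary indexes on them smaller, so more of each index fits in memory.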

It is a waste to index "flags", which I assume type and has_telegram are. The optimizer will never use them because doing so would be less efficient than simply doing a table scan.
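Before dropping anything, it may be worth checking how selective the flag indexes are and whether the optimizer picks them. The following is a sketch; the EXPLAIN output depends on the data distribution, and the index names are the ones from the table definition in the question.

```sql
-- A Cardinality value close to 2 for a flag column means the index barely narrows the search.
SHOW INDEX FROM messenger_contacts;

-- See whether the optimizer chooses the index or a full table scan for a flag filter.
EXPLAIN SELECT * FROM messenger_contacts WHERE type = 1;

-- If the flag indexes turn out to be unused, they can be dropped.
ALTER TABLE messenger_contacts
  DROP INDEX `type`,
  DROP INDEX `has_telegram`;
```

Fewer indexes also mean less work on every INSERT and UPDATE.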

The standard coding pattern is:

select count(*)
    from messenger_contacts
    where geo_state=1

unless you need to skip NULLs, which is what COUNT(geo_state) implies.
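The difference only shows up on nullable columns. A small illustration, using a throwaway table count_demo that is not part of the question's schema:

```sql
-- Hypothetical demo table with a nullable column.
CREATE TEMPORARY TABLE count_demo (
  id INT NOT NULL PRIMARY KEY,
  val INT NULL
);

INSERT INTO count_demo (id, val) VALUES (1, 10), (2, NULL), (3, 30);

-- COUNT(*) counts rows; COUNT(val) counts only rows where val is not NULL.
SELECT COUNT(*) AS all_rows, COUNT(val) AS non_null_vals
FROM count_demo;
-- all_rows = 3, non_null_vals = 2
```

Since geo_state is declared NOT NULL in messenger_contacts, both forms return the same number there.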

Once you have an index on geo_state (or an index starting with geo_state), the query will scan that index (which is a separate BTree structure) from the first occurrence of geo_state=1 to the last, counting as it goes. That is, it will touch 1.1 million index entries. So, a few seconds is to be expected. Counting a 'rare' geo_state will run much faster.
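To see how much work the scan does for each state, the row distribution can be inspected directly. This is a sketch; the second statement is only needed if the table has no index that starts with geo_state, as in the updated definition in the question.

```sql
-- Rows per geo_state: the count for a state is roughly the number of index entries
-- the COUNT query has to walk for that state.
SELECT geo_state, COUNT(*) AS row_count
FROM messenger_contacts
GROUP BY geo_state
ORDER BY row_count DESC;

-- Add an index that starts with geo_state if none exists.
ALTER TABLE messenger_contacts ADD INDEX `geo_state` (`geo_state`);
```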

The reason for 30-60 seconds versus 4-5 seconds is very likely caching. The former had to read data from disk; the latter did not. Run the query twice to check.
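A rough way to check the caching explanation on an InnoDB table is to watch the buffer pool's disk-read counter around two consecutive runs. This is a sketch; on a busy server other queries also move the counter.

```sql
-- Memory InnoDB may use for caching data and index pages, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Number of reads that could not be served from the buffer pool and went to disk.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

SELECT COUNT(*) FROM messenger_contacts WHERE geo_state = 1;  -- first run, possibly cold
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

SELECT COUNT(*) FROM messenger_contacts WHERE geo_state = 1;  -- second run, likely warm
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```

If the counter jumps during the first run and barely moves during the second, the slow run was dominated by disk reads.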

Using the geo_state index will be faster for that query than using the PRIMARY KEY, unless there are caching differences.

INDEX(number, geo_state) is virtually useless for any of the SELECTs mentioned: geo_state should be first. With geo_state first, INDEX(geo_state, number) is an example of a "covering" index for the select count(number)... case.
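Whether the query is served from a covering index can be checked with EXPLAIN. Note that InnoDB secondary indexes implicitly store the primary key columns, so a plain index on geo_state already contains number. The expected output described in the comments is what one would typically see, not a guarantee.

```sql
EXPLAIN SELECT COUNT(number)
FROM messenger_contacts
WHERE geo_state = 1;
-- Look for: key = an index that starts with geo_state,
-- and "Using index" in the Extra column (the table rows themselves are not read).
```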

More on building indexes.
