MySQL indexing columns vs joining tables
I am trying to figure out the most efficient way to extract values from a database that has a structure similar to this:
table test:
int id (primary, auto increment)
varchar(50) stuff,
varchar(50) important_stuff;
where I need to do a query like
select * from test where important_stuff like 'prefix%';
The size of the entire table is approximately 10 million rows, but there are only about 500-1000 distinct values for important_stuff. My current solution is an index on important_stuff, but the performance is not satisfactory. Would it be better to create a separate table that maps each distinct important_stuff to an id, store that id in the 'test' table, and then do
select b.* from (select id from stuff_lookup where important_stuff like 'prefix%') a join test b on b.stuff_id = a.id;
or this:
select * from test where stuff_id in (select id from stuff_lookup where important_stuff like 'prefix%');
What is the best way to optimize things like that?
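For reference, here is a minimal sketch of the lookup-table layout I have in mind (stuff_lookup and stuff_id are the names used in the queries above; the table does not exist yet):

```sql
-- Proposed normalized layout (assumes MySQL/InnoDB):
-- stuff_lookup holds one row per distinct important_stuff value.
CREATE TABLE stuff_lookup (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    important_stuff VARCHAR(50) NOT NULL,
    UNIQUE KEY uq_important_stuff (important_stuff)
);

-- 'test' would then carry the small integer key instead of the string:
ALTER TABLE test
    ADD COLUMN stuff_id INT NOT NULL,
    ADD INDEX idx_stuff_id (stuff_id);
```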
I'm not a MySQL user, but I ran some tests on my local database. I added 10 million rows as you described, and the distinct values from the third column load quite fast. These are my results:
mysql> describe bigtable;
+-----------------+-------------+------+-----+---------+----------------+
| Field           | Type        | Null | Key | Default | Extra          |
+-----------------+-------------+------+-----+---------+----------------+
| id              | int(11)     | NO   | PRI | NULL    | auto_increment |
| stuff           | varchar(50) | NO   |     | NULL    |                |
| important_stuff | varchar(50) | NO   | MUL | NULL    |                |
+-----------------+-------------+------+-----+---------+----------------+
3 rows in set (0.03 sec)
mysql> select count(*) from bigtable;
+----------+
| count(*) |
+----------+
| 10000089 |
+----------+
1 row in set (2.87 sec)
mysql> select count(distinct important_stuff) from bigtable;
+---------------------------------+
| count(distinct important_stuff) |
+---------------------------------+
| 1000 |
+---------------------------------+
1 row in set (0.01 sec)
mysql> select distinct important_stuff from bigtable;
....
| is_987 |
| is_988 |
| is_989 |
| is_99 |
| is_990 |
| is_991 |
| is_992 |
| is_993 |
| is_994 |
| is_995 |
| is_996 |
| is_997 |
| is_998 |
| is_999 |
+-----------------+
1000 rows in set (0.15 sec)
An important detail: I refreshed the statistics on this table (before this operation, loading this data took ~10 seconds).
mysql> optimize table bigtable;
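Note that if it is only the statistics you want refreshed, ANALYZE TABLE re-samples the index statistics without rebuilding the table, which is much cheaper than OPTIMIZE TABLE (for InnoDB, OPTIMIZE is mapped to a full ALTER TABLE ... FORCE rebuild):

```sql
-- Lighter alternative: refresh index statistics only,
-- without rebuilding the table
ANALYZE TABLE bigtable;
```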
How big is innodb_buffer_pool_size? How much RAM is available? The former should be about 70% of the latter. You'll see in a minute why I bring up this setting.
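To check where you stand (a quick sketch; the 11G value is just an example for a machine with 16 GB of RAM):

```sql
-- Current buffer pool size, in bytes:
SELECT @@innodb_buffer_pool_size;

-- To change it, set this in my.cnf and restart
-- (since MySQL 5.7 it can also be changed dynamically):
-- [mysqld]
-- innodb_buffer_pool_size = 11G
```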
Of your 3 suggested SELECTs, the original one will work as well as the two complex ones. In some other cases, a complex formulation might work better.
INDEX(important_stuff) is the 'best' index for

select * from test where important_stuff like 'prefix%';
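You can confirm the index is actually being used with EXPLAIN (the key name depends on how your index was created):

```sql
EXPLAIN SELECT * FROM test WHERE important_stuff LIKE 'prefix%';
-- Expect type: range and key: important_stuff (or whatever the index
-- is named); type: ALL would mean a full table scan.
```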
Now, let's study how that query works with that index:

- Reach into the index's BTree at 'prefix%' and scan forward through the ~1000 matching entries; each index entry includes the row's PRIMARY KEY (id). (Effort: <= 10 disk hits)
- For each matching entry, reach into the data's BTree by id to fetch the row: roughly 1000 lookups. (Effort: up to ~1000 blocks)

Total Effort: ~1010 blocks (worst case).

A standard spinning disk can handle ~100 reads/second. So, we are looking at about 10 seconds.
Now, run the query again. Guess what: all those blocks are now in RAM (cached in the "buffer_pool", which is hopefully big enough for all of them). And it runs in less than 1 second.
OPTIMIZE TABLE was not necessary! It was not a statistics refresh, but rather caching, that sped up the query.