
MySQL: hash index vs. table join

Original post: 2016-11-30 06:57:50. Tags: mysql, sql

I have a fairly large MySQL table (more than 10 million rows, InnoDB engine). The table has a varchar(40) field that indicates each row's category, and there are fewer than 10 categories.

Now I have two choices:

1. Keep the field and put a hash index on it.
2. Move the field into a separate category table, and link the two tables with a category_id.
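
The two choices could be sketched as schemas roughly like this. The table names (items_v1, items_v2, categories), the id columns, and the index names are assumptions made for illustration; the question only specifies a varchar(40) category field and a category_id link. The index in option 1 is declared without a type, which on InnoDB gives an ordinary B-tree index.

```sql
-- Option 1 (hypothetical names): keep the category name in the big table
-- and index that column directly.
CREATE TABLE items_v1 (
  id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  category VARCHAR(40) NOT NULL,
  KEY idx_category (category)
) ENGINE=InnoDB;

-- Option 2 (hypothetical names): a small lookup table holds each name once,
-- and the big table stores only a compact integer key.
CREATE TABLE categories (
  category_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(40) NOT NULL,
  UNIQUE KEY uq_name (name)
) ENGINE=InnoDB;

CREATE TABLE items_v2 (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  category_id TINYINT UNSIGNED NOT NULL,
  KEY idx_category_id (category_id),
  CONSTRAINT fk_items_category
    FOREIGN KEY (category_id) REFERENCES categories (category_id)
) ENGINE=InnoDB;
```

In option 2 every secondary-index entry carries a 1-byte key instead of a string of up to 40 characters, so the index on the big table is considerably smaller.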

Which one performs better, and why, for these two operations:

1. Query for all categories. (I know a separate table could be faster, but is it really much faster, even compared to a hash index?)
2. Query for all rows in a specified category. (I assume the hash index should be faster, but I'm not sure, because someone told me the MySQL optimizer makes joins with a small table much faster.)
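
For concreteness, the two operations might look like the queries below under each design. This is only a sketch: it assumes a hypothetical table items_v1 with an indexed category column for the first choice, and hypothetical tables items_v2 (with an indexed category_id) and categories (category_id, name) for the second. The literal 'books' is a made-up category name.

```sql
-- Operation 1: list all categories.
-- Choice 1: has to be derived from the 10M-row table (or its index).
SELECT DISTINCT category FROM items_v1;
-- Choice 2: reads a table with fewer than 10 rows.
SELECT name FROM categories;

-- Operation 2: fetch all rows in one category.
-- Choice 1: filter directly on the indexed string column.
SELECT * FROM items_v1 WHERE category = 'books';
-- Choice 2: resolve the name to an id in the small table, then use the
-- integer index on the big table.
SELECT i.*
FROM items_v2 AS i
JOIN categories AS c ON c.category_id = i.category_id
WHERE c.name = 'books';
```

Running each statement under EXPLAIN on real data is the reliable way to see which access path the optimizer actually picks.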

EDIT: I almost never add new categories here.

2 Answers

You can define an index on your category column, and it will make queries for a specific category much faster (assuming the category you search for doesn't occur in a majority of rows). An index on a varchar works well for this.
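
A minimal sketch of that advice, assuming a hypothetical table named items with a category column (neither name comes from the question). The GROUP BY query is one way to check the caveat about a category covering most of the rows, and EXPLAIN shows whether the optimizer chooses the index for a given value.

```sql
-- Add a regular (B-tree) index on the existing varchar column.
ALTER TABLE items ADD INDEX idx_category (category);

-- See how the rows are spread across categories. If one category holds
-- most of the table, an index lookup for it helps little.
SELECT category, COUNT(*) AS row_count
FROM items
GROUP BY category
ORDER BY row_count DESC;

-- Inspect the plan for a lookup on one (made-up) category value.
EXPLAIN SELECT * FROM items WHERE category = 'books';
```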

The reason you might create a lookup table for the category name is that if you want to change a category name, you can do that by changing one row in the category lookup table, instead of potentially many thousands of rows in the main table.
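
The difference shows up in the rename statement itself. The sketch below assumes a hypothetical lookup table categories (category_id, name) and a hypothetical main table items with a category column; the category names are invented.

```sql
-- With a lookup table: one row changes, and the main table is untouched.
UPDATE categories
SET name = 'Books and Media'
WHERE name = 'books';

-- Without a lookup table: every matching row in the main table (and its
-- index entries) has to be rewritten.
UPDATE items
SET category = 'Books and Media'
WHERE category = 'books';
```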

By the way, your use of the phrase "hash index" is misplaced. InnoDB does not support hash indexes, only B-tree indexes and fulltext indexes.
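
One way to check this on your own server is sketched below. The table hash_demo is hypothetical. To my understanding, InnoDB accepts the USING HASH clause without an error but builds a B-tree index anyway, and the Index_type column of SHOW INDEX reports what was actually created; treat that as something to verify on your MySQL version rather than a guarantee.

```sql
-- Ask for a hash index on an InnoDB table (hypothetical demo table).
CREATE TABLE hash_demo (
  id       INT UNSIGNED NOT NULL PRIMARY KEY,
  category VARCHAR(40) NOT NULL,
  KEY idx_category (category) USING HASH
) ENGINE=InnoDB;

-- Check the Index_type column; for InnoDB it should show BTREE for
-- idx_category despite the USING HASH request.
SHOW INDEX FROM hash_demo;

-- Clean up the demo table.
DROP TABLE hash_demo;
```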

Consider that for any DB it is faster to compare a number (integer) than a string. I believe you will get the fastest result if you create a cross-reference (X-REF) table as you mentioned, which maps each category string to a numeric ID, store that ID in the records of the big table, and index that field.

As stated, you will gain performance by letting your DB compare 10M numbers instead of 10M strings.
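
Moving an existing table to this layout could go roughly as follows. This is a sketch under assumptions: the big table is called items and currently has a category varchar(40) column, and the new lookup table is called categories; none of these names come from the question. On a 10M-row table the UPDATE and the final ALTER are heavy operations, so in practice they would likely be run in batches or during a maintenance window.

```sql
-- 1. Create the lookup table and fill it with the distinct names.
CREATE TABLE categories (
  category_id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(40) NOT NULL,
  UNIQUE KEY uq_name (name)
) ENGINE=InnoDB;

INSERT INTO categories (name)
SELECT DISTINCT category FROM items;

-- 2. Add the integer column (nullable until it is populated).
ALTER TABLE items ADD COLUMN category_id TINYINT UNSIGNED NULL;

-- 3. Populate it by matching each row's string to its new id.
UPDATE items AS i
JOIN categories AS c ON c.name = i.category
SET i.category_id = c.category_id;

-- 4. Make the column mandatory, index it, and drop the old string column.
ALTER TABLE items
  MODIFY category_id TINYINT UNSIGNED NOT NULL,
  ADD INDEX idx_category_id (category_id),
  DROP COLUMN category;
```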

Also, as Bill Karwin suggests, this will allow you to change or add categories in the most flexible way.

Last, if you don't expect the total number of categories to grow above, say, 2000, you can even make the indexed field in the big table a two-byte integer.
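
In MySQL terms, a two-byte integer is SMALLINT. A sketch of the column choice, using a hypothetical table name and with the value ranges noted in comments:

```sql
-- Hypothetical main table keyed by a compact category id.
CREATE TABLE items_compact (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  -- SMALLINT UNSIGNED: 2 bytes, values 0 to 65535, ample for ~2000 categories.
  -- With fewer than 10 categories, TINYINT UNSIGNED (1 byte, 0 to 255)
  -- would also be enough.
  category_id SMALLINT UNSIGNED NOT NULL,
  KEY idx_category_id (category_id)
) ENGINE=InnoDB;
```

If the column is also used in a foreign key, the lookup table's key column needs the same integer type.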