Which index should I use on a binary datatype column in MySQL?
I am writing a simple tool to check for duplicate files (i.e. files having the same data). The mechanism is to generate a hash for each file using the SHA-512 algorithm and then store these hashes in a MySQL database. I store the hashes in a binary(64) unique not null column. Each row will hold a unique binary hash, which is used to check whether a file is a duplicate or not.
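As a rough sketch of the hashing step described above (an illustration only, not necessarily how the tool itself does it), the following Python computes the raw SHA-512 digest of a file. The function name sha512_digest and the chunk_size parameter are made up for this example. The raw digest is always 64 bytes, which is why it fits a binary(64) column exactly.

```python
import hashlib


def sha512_digest(path, chunk_size=1 << 20):
    """Return the raw 64-byte SHA-512 digest of the file at `path`.

    `sha512_digest` and `chunk_size` are hypothetical names for this sketch.
    """
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # digest() returns raw bytes (length 64); hexdigest() would return
    # 128 hex characters instead and would not fit binary(64).
    return h.digest()
```

Two files with identical contents produce identical digests, so a unique constraint on the column will reject the second insert.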
My questions are:
Can I use indexes on a binary column? My default table collation is latin1 (the default collation).
Which indexing mechanism should I use, B-Tree or Hash, to get high performance? I need to update or add 100 rows per second.
What other things should I take care of to get the best performance?
Can I use indexes on a binary column? My default table collation is latin1 (the default collation).
Yes, you can. Collation is only relevant for character datatypes, not binary datatypes (it defines how characters should be ordered). Also, be aware that latin1 is a character encoding, not a collation (its default collation in MySQL is latin1_swedish_ci).
Which indexing mechanism should I use, B-Tree or Hash, to get high performance? I need to update or add 100 rows per second.
Note that hash indexes are only available with the MEMORY and NDB storage engines, so you may not even have a choice (InnoDB and MyISAM tables use B-Tree indexes).
In any event, either would typically be able to meet your performance criteria. For this particular application, though, I see no benefit from using B-Tree (which is ordered), whereas Hash would give better performance for the exact-match lookups involved. Therefore, if you have the choice, you may as well use Hash.
See Comparison of B-Tree and Hash Indexes in the MySQL Reference Manual for more information.
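To make the ordered-versus-hash distinction concrete, here is a small Python analogy. It is not MySQL's index implementation: a set stands in for a hash index and a sorted list searched with bisect stands in for an ordered B-Tree, and all the names are invented for the illustration. Both can answer the "is this digest already stored?" question, but only the ordered structure can also serve range scans, which a duplicate check never needs.

```python
import bisect
import hashlib

# One hundred distinct 64-byte digests, kept sorted for the ordered stand-in.
digests = sorted(hashlib.sha512(bytes([i])).digest() for i in range(100))

hash_index = set(digests)   # stand-in for a hash index: equality lookups only
ordered_index = digests     # stand-in for a B-Tree: keys kept in sorted order


def in_hash_index(d):
    """Equality lookup via hashing (average constant time)."""
    return d in hash_index


def in_ordered_index(d):
    """Equality lookup via binary search (logarithmic time)."""
    i = bisect.bisect_left(ordered_index, d)
    return i < len(ordered_index) and ordered_index[i] == d


def range_scan(lo, hi):
    """All keys with lo <= key <= hi; possible only because keys are ordered."""
    start = bisect.bisect_left(ordered_index, lo)
    stop = bisect.bisect_right(ordered_index, hi)
    return ordered_index[start:stop]
```

Since the sort order of SHA-512 digests carries no meaning, the range capability buys nothing for this application, which is the reasoning behind preferring Hash where the storage engine offers it.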
What other things should I take care of to get the best performance?
Depends on your definition of "best performance" and your environment. In general, remember Knuth's maxim "premature optimisation is the root of all evil": that is, only optimise when you know that there will be a problem with the simplest approach.