Is it a correct way to index a TEXT column of a MySQL database?

I have a map from strings to integers. To store this map in a MySQL database I created the following table:

CREATE TABLE map(
  Argument TEXT NOT NULL,
  Image INTEGER NOT NULL
)

I chose the TEXT type for the argument because its length is unpredictable: currently the longest record has 2290 characters and the average length is 88 characters.

After running into performance problems I tried to add an index on the Argument column, but found that I must specify a length. To avoid this limitation I added a new integer column containing hash values (md5 or something else) of the Argument column values.

ALTER TABLE map ADD COLUMN ArgumentHash INTEGER;

And a combined index:

CREATE INDEX argument_index USING HASH ON map(ArgumentHash, Argument(80));

Since then the performance problems have disappeared. I'd like to ask whether this is a correct way to solve the problem.
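The question does not say how the hash is reduced to fit the INTEGER column. An MD5 digest is 128 bits, while a MySQL INTEGER is a signed 32-bit value, so some truncation is needed. The following is a minimal sketch in Python, assuming the hash is computed in the application; the function name `argument_hash` and the choice of keeping the first four digest bytes are illustrative assumptions, not the asker's actual method.

```python
import hashlib


def argument_hash(argument: str) -> int:
    """Reduce an MD5 digest to a value that fits a signed 32-bit INTEGER.

    Assumption: the text is hashed as UTF-8 and only the first 4 digest
    bytes are kept. The question does not specify either detail.
    """
    digest = hashlib.md5(argument.encode("utf-8")).digest()
    # Keep 4 bytes and clear the top bit so the result is in 0 .. 2**31 - 1.
    return int.from_bytes(digest[:4], "big") & 0x7FFFFFFF


if __name__ == "__main__":
    print(argument_hash("some long argument string"))
```

Because distinct strings can share a hash value, a lookup would still need to compare the text itself, for example `WHERE ArgumentHash = ? AND Argument = ?`.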

I don't think there is a "correct" way; it depends on what you are using the column for.

In my experience, it is unusual to have to (or want to) select on a large text column; the text is usually data retrieved by some other key (unless it is indexed in some other way, e.g. full text or Lucene, but that doesn't appear to be what you are doing).

If you do in fact need an exact match on a large field, then it may be more efficient to use the hash, as it will likely let you keep the index smaller. My guess is that if the index prefix you would need is larger than the hash (this depends on how close to the start of the TEXT the values generally differ), you should use the hash.
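One way to judge how close to the start the values differ is to measure how many distinct values survive truncation to a given prefix length, and compare that with a 32-bit hash. The sketch below does this in Python on a list of strings exported from the table. The function names are hypothetical, and CRC32 is used only as an example of a 32-bit hash; the sample data is made up.

```python
import zlib


def prefix_selectivity(values, length):
    """Fraction of distinct values that stay distinct when cut to `length` characters."""
    distinct = set(values)
    if not distinct:
        return 1.0
    return len({v[:length] for v in distinct}) / len(distinct)


def hash_selectivity(values):
    """Same ratio for a 32-bit hash of each value (CRC32, as an example only)."""
    distinct = set(values)
    if not distinct:
        return 1.0
    hashes = {zlib.crc32(v.encode("utf-8")) for v in distinct}
    return len(hashes) / len(distinct)


if __name__ == "__main__":
    # Made-up sample: values share a long common prefix and differ only at the end.
    sample = ["http://example.com/some/long/common/path/" + str(i) for i in range(1000)]
    for n in (4, 40, 80):
        print("prefix", n, prefix_selectivity(sample, n))
    print("crc32", hash_selectivity(sample))
```

A ratio close to 1.0 means the index entry almost always identifies a single row. If a short prefix already reaches that, the prefix index is probably enough; if only a long prefix does, the hash column keeps the index smaller.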

Your best bet is to try it and see. Profile both approaches with representative data and find out which one performs better.
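A rough sketch of such a comparison follows. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in so that it runs anywhere; it is not MySQL. SQLite has no prefix indexes, so an expression index on `substr(Argument, 1, 80)` plays that role here, and CRC32 is an assumed hash (in MySQL its unsigned 32-bit result would need an `INT UNSIGNED` column). Timings from SQLite say nothing about MySQL; only the shape of the experiment carries over, and the helper names are hypothetical.

```python
import sqlite3
import time
import zlib


def crc(text):
    """32-bit hash of the text (CRC32, chosen only for illustration)."""
    return zlib.crc32(text.encode("utf-8"))


def build(rows):
    """Create an in-memory table carrying both a prefix-style index and a hash index."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE map("
        "Argument TEXT NOT NULL, ArgumentHash INTEGER NOT NULL, Image INTEGER NOT NULL)"
    )
    con.executemany(
        "INSERT INTO map VALUES (?, ?, ?)",
        [(arg, crc(arg), image) for arg, image in rows],
    )
    # Stand-in for a MySQL prefix index such as Argument(80).
    con.execute("CREATE INDEX prefix_index ON map(substr(Argument, 1, 80))")
    con.execute("CREATE INDEX hash_index ON map(ArgumentHash)")
    return con


def by_prefix(con, arg):
    """Look up via the prefix expression, then confirm the full text."""
    row = con.execute(
        "SELECT Image FROM map "
        "WHERE substr(Argument, 1, 80) = substr(?, 1, 80) AND Argument = ?",
        (arg, arg),
    ).fetchone()
    return row[0] if row else None


def by_hash(con, arg):
    """Look up via the hash column, then confirm the full text to rule out collisions."""
    row = con.execute(
        "SELECT Image FROM map WHERE ArgumentHash = ? AND Argument = ?",
        (crc(arg), arg),
    ).fetchone()
    return row[0] if row else None


def time_lookups(lookup, con, args):
    """Return the elapsed seconds for running `lookup` once per argument."""
    start = time.perf_counter()
    for arg in args:
        lookup(con, arg)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Made-up data: a 140-character shared prefix, so an 80-character prefix cannot tell rows apart.
    rows = [("common/prefix/" * 10 + str(i), i) for i in range(5000)]
    con = build(rows)
    probes = [arg for arg, _ in rows[::25]]
    print("prefix:", time_lookups(by_prefix, con, probes))
    print("hash:  ", time_lookups(by_hash, con, probes))
```

To make the result meaningful, the made-up rows would be replaced with a representative export of the real data, and the same two query shapes would be run against the actual MySQL server.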
