
How to improve MySQL REGEXP search?

The questions are:

1. How can I improve the performance of SELECT queries in MySQL that use REGEXP?

The table looks like this:

create table `tweets`(
    `id` bigint auto_increment,
    `tweet` varchar(140),
    `time` datetime,
    primary key(`id`)
);

The following query takes about 0.35 seconds:

select tweet from tweets where tweet regexp '^[abcdef]{1,4}$';
  1. Will indexing tweet make it faster? If so, what type of index should I use?
  2. My table engine is InnoDB. Is there any other table engine that would be beneficial?

Your best bet is to reduce the set of rows before the regular expression is evaluated against them. Regular expressions are, for all intents and purposes, impossible to index for.
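As one hedged illustration of narrowing the rows first: if searches are usually limited to a time window (an assumption, since the question does not say so), an index on time lets MySQL discard most rows before REGEXP runs. The index name idx_tweets_time and the date range below are hypothetical.

```sql
-- Hypothetical index; the table in the question only has the primary key.
CREATE INDEX idx_tweets_time ON tweets (`time`);

-- The range on `time` can be resolved through the index; REGEXP is then
-- evaluated only on the rows that fall inside the window.
SELECT tweet
FROM tweets
WHERE `time` >= '2012-01-01 00:00:00'
  AND `time` <  '2012-02-01 00:00:00'
  AND tweet REGEXP '^[abcdef]{1,4}$';
```

Whether this helps depends on how small the window is relative to the table; running EXPLAIN on the query shows whether the optimizer actually chooses the index.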

If I had to come up with a way to do this, I would examine the patterns that are commonly searched for and mark them in some indexable way at insert time. For example, if you search with the ^[abcdef]{1,4}$ expression a lot, I'd add a boolean column first4AThruF and, in an insert/update trigger, set the column to true or false depending on whether the tweet matches the regular expression. If I indexed the first4AThruF column, and the column had enough selectivity, I could write the query:

select tweet from tweets where first4AThruF = true;

and this should be pretty zippy.
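The flag column and triggers described above could be set up roughly as follows. This is a minimal sketch, not a tested migration: the index name idx_first4AThruF and the trigger names are made up for illustration, and COALESCE is there only because tweet is nullable and REGEXP returns NULL for a NULL input.

```sql
-- Add the flag column and an index on it (names are hypothetical).
ALTER TABLE tweets
    ADD COLUMN first4AThruF BOOLEAN NOT NULL DEFAULT FALSE,
    ADD INDEX idx_first4AThruF (first4AThruF);

-- Keep the flag in sync on insert.
CREATE TRIGGER tweets_first4_bi BEFORE INSERT ON tweets
FOR EACH ROW
    SET NEW.first4AThruF = COALESCE(NEW.tweet REGEXP '^[abcdef]{1,4}$', FALSE);

-- Keep the flag in sync on update.
CREATE TRIGGER tweets_first4_bu BEFORE UPDATE ON tweets
FOR EACH ROW
    SET NEW.first4AThruF = COALESCE(NEW.tweet REGEXP '^[abcdef]{1,4}$', FALSE);

-- One-time backfill for rows that existed before the triggers.
UPDATE tweets
SET first4AThruF = COALESCE(tweet REGEXP '^[abcdef]{1,4}$', FALSE);
```

The regular expression is then paid for once per write instead of once per row per search. The trade-off is one column, one index, and one pair of triggers for each pattern you want to precompute.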

Other possibilities to consider are full-text queries or LIKE clauses, although in the case mentioned above I don't expect them to work well.

If the pattern you're searching for is anchored at the start of the string, you can use LIKE as a coarse first filter and then check again with REGEXP:

select tweet from tweets
where
    (
      tweet LIKE 'a%' OR
      tweet LIKE 'b%' OR
      tweet LIKE 'c%' OR
      tweet LIKE 'd%' OR
      tweet LIKE 'e%' OR
      tweet LIKE 'f%'
    )
    AND LENGTH(tweet) <= 4 -- try taking this line out too
    AND tweet regexp '^[abcdef]{1,4}$';

Despite being a little convoluted, this should be a lot faster.
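The prefix filter only pays off if MySQL can resolve it through an index on tweet, which the table in the question does not have. The sketch below assumes an ordinary B-tree index (the name idx_tweets_tweet is hypothetical) and replaces the OR list with a single range over first letters a through f. The range is meant as a superset of the matches under the usual collations, and REGEXP still does the exact check.

```sql
-- Hypothetical index so that prefix and range predicates on tweet
-- do not require a full table scan.
CREATE INDEX idx_tweets_tweet ON tweets (tweet);

-- EXPLAIN shows whether the optimizer uses the index for the range.
EXPLAIN
SELECT tweet
FROM tweets
WHERE tweet >= 'a' AND tweet < 'g'
  AND tweet REGEXP '^[abcdef]{1,4}$';
```

How much this helps depends on selectivity: if a large share of tweets start with one of those letters, the optimizer may still prefer scanning the whole table.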
