Using a hyphen in fulltext search with an InnoDB engine?
I have a FULLTEXT search in a table of part numbers. Some part numbers have hyphens.
The table engine is InnoDB using MySQL 5.6.
The problem I am having is that MySQL treats the hyphen (-) character as a word separator.
So I created a new MySQL charset collation in which the hyphen is treated as a letter.
I followed this tutorial: http://dev.mysql.com/doc/refman/5.0/en/full-text-adding-collation.html
I made a test table using the syntax at the bottom of that page, but I used the InnoDB engine. I searched for '----' and received "syntax error, unexpected '-'".
However, if I change the engine to MyISAM, I get the correct result.
How do I get this to work with the InnoDB engine?
It seems that with MySQL it's one step forward and two steps back.
Edit: I found this link for 5.6 ( http://dev.mysql.com/doc/refman/5.6/en/full-text-adding-collation.html ), which is the same tutorial using InnoDB as the engine.
But here's my test:
create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=InnoDB
Added a row that is just "----"
select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)
syntax error, unexpected '-'
Dropped the table and recreated it with MyISAM:
create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=MyISAM
Added a row that is just "----"
select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)
1 result
Edit 2: If it helps to see it visually, here are my two tests (screenshots not preserved).
The InnoDB FULLTEXT search is probably treating the hyphens as stop-words. So when it gets to the second hyphen, it expects a word, not another hyphen. This would explain the 'syntax error'.
MyISAM doesn't behave this way because the InnoDB implementation of FULLTEXT indexes is quite different; FULLTEXT support was only added to InnoDB in MySQL 5.6.
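One detail from the MySQL boolean full-text documentation may also be relevant here: in boolean mode a leading - is an operator (exclude the word), and InnoDB is documented to reject multiple operators on a single search word with a syntax error, while MyISAM ignores all but the operator adjacent to the word. The sketch below only illustrates that documented difference against the test table from the question; the search words bracket and widget are made up, and this is a possibly related factor rather than a confirmed diagnosis of the '----' case.

```sql
-- Assumes the `test` table from the question (FULLTEXT index on column `a`).

-- One leading operator per word is accepted on both engines:
-- require "bracket", exclude rows containing "widget".
SELECT * FROM test WHERE MATCH(a) AGAINST('+bracket -widget' IN BOOLEAN MODE);

-- Stacked operators on one word: documented to raise a syntax error on InnoDB,
-- whereas MyISAM keeps only the operator adjacent to the word.
SELECT * FROM test WHERE MATCH(a) AGAINST('++bracket' IN BOOLEAN MODE);
```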
What can you do about this? It seems you can influence the list of stop-words through a special table: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_ft_user_stopword_table . This could stop MySQL from treating hyphens as stop-words.
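A minimal sketch of how such a stopword table is usually wired up, for anyone who wants to test this idea. The database name test and the table name my_stopwords are assumptions; the FULLTEXT index has to be rebuilt after the variable changes for the new list to take effect. Whether this changes how hyphens are handled is not verified here.

```sql
-- Hypothetical names: database `test`, stopword table `my_stopwords`.
-- The stopword table must be InnoDB with a single VARCHAR column named `value`.
CREATE TABLE my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;

-- Leave the table empty to index every word, or insert your own stopwords.
-- Point InnoDB at the custom list, using the 'db_name/table_name' form.
SET GLOBAL innodb_ft_user_stopword_table = 'test/my_stopwords';

-- Rebuild the FULLTEXT index so it is built with the new stopword list.
-- An unnamed index on column `a` takes the column name as its index name.
ALTER TABLE test DROP INDEX a;
ALTER TABLE test ADD FULLTEXT INDEX (a);
```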
I encountered this exact issue recently. I had previously added a custom collation per the docs and was using MyISAM, and it was working fine. Then a few weeks ago I switched to InnoDB and things stopped working. I tried setting innodb_ft_enable_stopword to 0 to disable stopwords, but that did not help.
In the end I took a different approach, since InnoDB doesn't seem to follow the same rules as MyISAM when it comes to fulltext indexing. This is a bit hacky but works for my application:
Create a search column containing the data I need to search for. This column has a fulltext index and exists for the sole purpose of doing a fulltext search, which is still very fast on a table with millions of rows.
Replace - in my search column with an unused character that is considered a "word" character. See my question here regarding this: https://dba.stackexchange.com/questions/248607/which-characters-are-considered-word-characters . For example: Ω, œ, π, µ. These characters are probably not used in the data you need to search, but the parser will recognize them as searchable characters. In this case, replace - with Ω. Since I only need the row ID, it doesn't matter what the data in this column looks like to human eyes.
Revise my updates and inserts to keep the search column data and substitutions up to date. In my case this was easy since there is only one place in the application that updates this particular table. A couple of triggers could also be used to handle this:
CREATE TRIGGER update_search BEFORE UPDATE ON animals
FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
CREATE TRIGGER insert_search BEFORE INSERT ON animals
FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
Replace - in my search queries with Ω.
Voila. Here's a fiddle demonstrating this: https://www.db-fiddle.com/f/x1WZpZP6wcqbTTvTEFFXYc/0
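For reference, the setup and query side of the steps above might look roughly like this for the animals table used in the triggers. The id column, the VARCHAR(255) size, the ft_search index name, and the sample term 'black-bear' are assumptions for illustration. The column and the client connection both need a character set that can represent Ω (utf8mb4 is assumed here).

```sql
-- Hypothetical starting point: animals(id, animal_name) already exists.
ALTER TABLE animals
  ADD COLUMN search VARCHAR(255) CHARACTER SET utf8mb4;

-- Backfill existing rows, swapping hyphens for the placeholder character.
UPDATE animals SET search = REPLACE(animal_name, '-', 'Ω');

-- The fulltext index covers only the search column.
ALTER TABLE animals ADD FULLTEXT INDEX ft_search (search);

-- Apply the same substitution to the search term before matching.
SET @term = REPLACE('black-bear', '-', 'Ω');
SELECT id FROM animals WHERE MATCH(search) AGAINST(@term IN BOOLEAN MODE);
```

Doing the substitution in one place for both writes and queries keeps the two sides from drifting apart, which is the main risk with this kind of workaround.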
The above workaround might not be realistic for every application, but hopefully it's useful to someone. It would be great to have a real solution to this for InnoDB.