Using a hyphen in fulltext search with an InnoDB engine?

I have a FULLTEXT search on a table of part numbers. Some part numbers contain hyphens.

The table engine is InnoDB, on MySQL 5.6.

The problem is that MySQL treats the hyphen (-) character as a word separator.

So I created a new MySQL charset collation in which the hyphen is treated as a letter.

I followed this tutorial: http://dev.mysql.com/doc/refman/5.0/en/full-text-adding-collation.html

I made a test table using the syntax at the bottom of that page, except that I used the InnoDB engine. I searched for '----' and received "syntax error, unexpected '-'".

However, if I change the engine to MyISAM, I get the correct result.

How do I get this to work with the InnoDB engine?

It seems that with MySQL it's one step forward and two steps back.

Edit: I found the 5.6 version of the page ( http://dev.mysql.com/doc/refman/5.6/en/full-text-adding-collation.html ), which is the same tutorial using InnoDB as the engine.

But here's my test:

create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=InnoDB

Added a row that is just "----".

select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)

Result: syntax error, unexpected '-'

Drop the table and recreate it with MyISAM:

create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=MyISAM

Added a row that is just "----".

select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)

Result: 1 row

Edit 2: in case it helps to see it visually, here are my two tests:

[Image: MyISAM test]

[Image: InnoDB test]

The InnoDB FULLTEXT search is probably treating the hyphens as stop-words. So when it gets to the second hyphen, it expects a word, not another hyphen. This would explain the "syntax error".

MyISAM behaves differently because the InnoDB implementation of FULLTEXT indexes is quite different; FULLTEXT support was only added to InnoDB in MySQL 5.6.

What can you do about this? It seems you can influence the list of stop-words through a special table: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_ft_user_stopword_table . This could stop MySQL from treating hyphens as stop-words.

I encountered this exact issue recently. I had previously added a custom collation per the docs while using MyISAM, and it was working fine. Then a few weeks ago I switched to InnoDB and things stopped working. I tried:

  • Rebuilding my collation and A/B testing to make sure it was working
  • Disabling stopwords by setting innodb_ft_enable_stopword to 0
  • Rebuilding my fulltext table and index

In the end I took a different approach, since InnoDB doesn't seem to follow the same rules as MyISAM when it comes to fulltext indexing. This is a bit hacky but works for my application:

  1. Create a special search column containing the data I need to search for. This column has a fulltext index and exists for the sole purpose of doing a fulltext search, which is still very fast on a table with millions of rows.
  2. Search/replace every - in my search column with an unused character that is considered a "word" character. See my question about this here: https://dba.stackexchange.com/questions/248607/which-characters-are-considered-word-characters . Figuring out which characters count as word characters turned out not to be so easy, but here are a few that worked for me: Ω œ π µ . These characters are probably not used in the data you need to search, but the parser will recognize them as searchable characters. In my case I replace - with Ω . Since I only need the row ID, it doesn't matter what the data in this column looks like to human eyes.
  3. Revise my updates and inserts to keep the search column data and substitutions up to date. In my case this was easy, since there is only one place in the application that updates this particular table. A couple of triggers could also be used to handle this:

     CREATE TRIGGER update_search BEFORE UPDATE ON animals
     FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');

     CREATE TRIGGER insert_search BEFORE INSERT ON animals
     FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
  4. Replace - in my search queries with Ω .

Voila. Here's a fiddle demonstrating this: https://www.db-fiddle.com/f/x1WZpZP6wcqbTTvTEFFXYc/0
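
For the application side (step 4), a minimal Python sketch of the substitution might look like the following. It is only an illustration under assumptions: the function names are hypothetical, the animals table and search column are taken from the trigger example, the id column is assumed, and %s is the placeholder style accepted by common Python MySQL drivers such as PyMySQL.

```python
from typing import Tuple

# Stand-in character from step 2; assumed to be absent from the real data.
PLACEHOLDER = "Ω"


def to_search_form(text: str) -> str:
    """Replace every hyphen with the placeholder, mirroring the triggers."""
    return text.replace("-", PLACEHOLDER)


def build_search_query(term: str) -> Tuple[str, Tuple[str, ...]]:
    """Return a parameterized query plus its parameters (hypothetical helper).

    Table and column names follow the trigger example; `id` is an assumption.
    """
    sql = (
        "SELECT id FROM animals "
        "WHERE MATCH(search) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, (to_search_form(term),)


if __name__ == "__main__":
    sql, params = build_search_query("AB-123")
    print(sql)
    print(params)
```

Note that this replaces every hyphen, so a leading - intended as the boolean-mode exclusion operator would be substituted as well; an application that needs that operator would have to handle it separately.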

The above workaround might not be realistic for every application, but hopefully it's useful to someone. It would be great to have a real solution to this for InnoDB.
