MySQL Fulltext Search index - MATCH() / AGAINST() gives the same priority to differing results

I'm using FULLTEXT indexes to identify similar column content, and I noticed that the relevance scores are not what I expect.

In my table I store the names of video games. When I search for "Resident Evil 5", all Resident Evil games get the same score.

SELECT id, name,
    MATCH(name) AGAINST('Resident Evil 5' IN BOOLEAN MODE) AS score
FROM game
ORDER BY score DESC;

Output:

id  name                         score
--  ---------------------------  ------------------
7   Resident Evil Revelations 2  1.7317759990692139
36  Resident Evil Remastered     1.7317759990692139
39  Resident Evil 5              1.7317759990692139
2   The Evil Within              0.7758325934410095

In my case Resident Evil 5 should have the highest score, but MySQL assigns the same score to every game containing the words "Resident Evil". Is there any way to improve the scoring? I don't want to exclude the other Resident Evil games from the list, just give Resident Evil 5 a higher score.

The number 5 does not take part in the match, probably because it is shorter than ft_min_word_len; confirm that with:

show variables like 'ft%';

If I spell out the full word "Five", the term is at least ft_min_word_len characters long, and the query works as I think you expected. See this SQL Fiddle for an example.
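To make the length filter concrete, here is a rough Python model of it. It is only an illustration, not MySQL's actual parser: the function name indexed_words is hypothetical, words are split with a simple \w+ pattern, and stopwords and ft_max_word_len are deliberately ignored. With the default minimum of 4, the query "Resident Evil 5" shrinks to the same two words that every Resident Evil title contains, which is consistent with the identical scores.

```python
import re


def indexed_words(text: str, ft_min_word_len: int = 4) -> list[str]:
    """Simplified model of MyISAM FULLTEXT word filtering (hypothetical helper).

    Splits on non-word characters, lowercases, and drops every word shorter
    than ft_min_word_len. Stopwords and ft_max_word_len are NOT modeled.
    """
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= ft_min_word_len]


for query in ("Resident Evil 5", "Resident Evil Five"):
    print(query, "->", indexed_words(query))
```

Passing ft_min_word_len=1 to the same function keeps the "5" as well, which is what lowering the server variable is meant to achieve.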

First of all, look at the full-text settings of your MySQL server:

> SHOW VARIABLES LIKE 'ft%';

The output might look something like this:

Variable_name             Value           
------------------------  ----------------
ft_boolean_syntax         + -><()~*:""&|  
ft_max_word_len           84              
ft_min_word_len           4               
ft_query_expansion_limit  20              
ft_stopword_file          (built-in) 

The variable to look at is ft_min_word_len. As in this example, its default value is 4.

To make one-character words (like your number) searchable, set this variable in your option file (usually my.ini on Windows, my.cnf on Unix-like systems) by adding the following lines:

[mysqld]
ft_min_word_len=1
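As a quick sanity check that the value ended up in the [mysqld] group rather than, say, a [client] group, this Python sketch uses the standard configparser module to look the option up. The helper name mysqld_option is hypothetical, and the sketch assumes a simple option file without !include or !includedir directives, which configparser cannot parse.

```python
import configparser
from typing import Optional

# Example option file contents (the same lines as above).
OPTION_FILE = """
[mysqld]
ft_min_word_len=1
"""


def mysqld_option(text: str, name: str) -> Optional[str]:
    """Return the value of `name` in the [mysqld] group, or None if absent."""
    # interpolation=None: treat '%' literally.
    # strict=False: like MySQL, a repeated option overrides the earlier one.
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(text)
    if not parser.has_section("mysqld"):
        return None
    for key, value in parser.items("mysqld"):
        # MySQL treats '-' and '_' in option names as equivalent.
        if key.replace("-", "_") == name:
            return value
    return None


print(mysqld_option(OPTION_FILE, "ft_min_word_len"))
```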

Then restart the server and rebuild your FULLTEXT indexes:

REPAIR TABLE YourTable QUICK;

Keep in mind that this will increase the size of your full-text index quite significantly.
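To illustrate where the growth comes from, this rough Python count compares how many word entries the four titles from the question would produce at the default minimum of 4 and at 1. It reuses the same simplified \w+ split with no stopword list, and the helper indexed_entry_count is hypothetical; on a real table with many rows, and with MySQL's own stopword handling, the actual growth will differ.

```python
import re

# The four titles from the question's output.
TITLES = [
    "Resident Evil Revelations 2",
    "Resident Evil Remastered",
    "Resident Evil 5",
    "The Evil Within",
]


def indexed_entry_count(titles: list[str], ft_min_word_len: int) -> int:
    """Count word occurrences at least ft_min_word_len long (simplified model)."""
    return sum(
        1
        for title in titles
        for word in re.findall(r"\w+", title.lower())
        if len(word) >= ft_min_word_len
    )


for min_len in (4, 1):
    print(min_len, indexed_entry_count(TITLES, min_len))
```

Even on these four short titles the count rises from 10 to 13 entries, because every short token, including the digits, now gets indexed.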

This answer assumes you are using MyISAM as the table engine. If you are using InnoDB, the corresponding variable is innodb_ft_min_token_size (default 3, so a one-character token such as 5 is still skipped unless you lower it).
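For InnoDB tables the rebuild step differs: instead of REPAIR TABLE, the FULLTEXT index is dropped and re-created with ALTER TABLE. Below is a small Python sketch that builds those two statements. The helper names and the index name ft_name are hypothetical; game and name come from the question.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def rebuild_fulltext_sql(table: str, index: str, columns: list[str]) -> list[str]:
    """Statements that drop and re-create a FULLTEXT index (hypothetical helper)."""
    t, i = quote_ident(table), quote_ident(index)
    cols = ", ".join(quote_ident(c) for c in columns)
    return [
        f"ALTER TABLE {t} DROP INDEX {i};",
        f"ALTER TABLE {t} ADD FULLTEXT INDEX {i} ({cols});",
    ]


# 'ft_name' is a made-up index name; use the real one from SHOW INDEX FROM game.
for stmt in rebuild_fulltext_sql("game", "ft_name", ["name"]):
    print(stmt)
```

Run the generated statements only after setting innodb_ft_min_token_size in the option file and restarting the server, since the index is built with whatever token size is active at that moment.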

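Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.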

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM