MySQL: Special Search algorithm using MySQL relevance search

I'm trying to do a search in MySQL where the user has just one input field. The table looks like this:

ID          BIGINT
TITLE       TEXT
DESCRIPTION TEXT
FILENAME    TEXT
TAGS        TEXT
ACTIVE      TINYINT

Now if the user inputs just blah blubber, the search must check whether every word appears in the fields TITLE, DESCRIPTION, FILENAME or TAGS. The result itself should be ordered by relevance, i.e. by how often a search word appears in the record. I have this example data:

ID   | TITLE   | DESCRIPTION  | FILENAME | TAGS | ACTIVE
1    | blah    | blah         | bdsai    | bdha | 1
2    | blubber | blah         | blah     | adsb | 1
3    | blah    | dsabsadsab   | dnsa     | dsa  | 1

In this example, ID 2 must be at the top (2x blah, 1x blubber), then 1 (2x blah) and then 3 (1x blah). This process should be dynamic, so the user can also input more words and the relevance works the same way with one word or several.

Is it possible to do this in MySQL alone, or do I have to use some PHP? How exactly would this work?

Thank you very much for your help! Regards, Florian

EDIT: Here is the result after I tried the answer of Tom Mac:

I have four records which look like this:

ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
1   | s     | s           | s        | s                             | 1
2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1
3   | 0     | s           | s        | s                             | 1
4   | a     | a           | a        | a                             | 1

Now, if I search for the string s, I should get only the top three records, ordered by the relevance of s. This means the records should be ordered like this:

ID | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2  | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1        <== 8x s
1  | s     | s           | s        | s                             | 1        <== 4x s
3  | 0     | s           | s        | s                             | 1        <== 3x s

Now, I tried my query like this (the table's name is PAGES):

select t.*
  from (
        select
              match(title) against('*s*' in boolean mode)
            + match(description) against('*s*' in boolean mode)
            + match(filename) against('*s*' in boolean mode)
            + match(tags) against('*s*' in boolean mode)
            as matchrank,
              bb.*
          from pages bb) t
 where t.matchrank > 0
 order by t.matchrank desc

This query returns this:

matchRank | ID  | TITLE | DESCRIPTION | FILENAME | TAGS                          | ACTIVE
2         | 2   | 0     | fdsadf      | sdfs     | a,b,c,d,e,f,s,a,a,s,s,as,sada | 1

Is this because of the wildcards? I thought the string *s* would also match a value that is just s ...

This might help you out. It does kinda assume that your MySQL table uses the MyISAM engine though:

create table blubberBlah (id int unsigned not null primary key auto_increment,
title varchar(50) not null,
description varchar(50) not null,
filename varchar(50) not null,
tags varchar(50) not null,
active tinyint not null
) engine=MyISAM;

insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','blah','bdsai','bdha',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blubber','blah','blah','adsb',1);
insert into blubberBlah (title,description,filename,tags,active) 
values ('blah','dsabsadsab','dnsa','dsa',1);

select t.*
from
(
 select MATCH (title) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (description) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (fileName) AGAINST ('blubber blah' IN BOOLEAN MODE)
       +MATCH (tags) AGAINST ('blubber blah' IN BOOLEAN MODE) as matchRank,
       bb.*
from blubberBlah bb
) t
order by t.matchRank desc;

EDIT

Another assumption that this solution makes is that each word you're searching for is >= 4 characters long. If there is a possibility that a search word, i.e. 'blubber' or 'blah', will be only 1, 2 or 3 characters long, then you can always head to your my.cnf file and add ft_min_word_len=1 under the [mysqld] configuration options. Then restart MySQL (any FULLTEXT indexes that already exist have to be rebuilt before the new setting applies to them) and you should be good to go.

One final thing: if you are considering using this approach then you should add a FULLTEXT INDEX to each of the columns. Hence:

ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx1`(`title`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx2`(`description`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx3`(`filename`);
ALTER TABLE blubberBlah add fulltext index `blubberBlahFtIdx4`(`tags`);

You can find more details on BOOLEAN FULLTEXT searching in the MySQL Docs.
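The question asks for the search to be dynamic, while the query above hard-codes 'blubber blah'. As a rough sketch only, the same query could be assembled in application code for whatever the user types. Python is used here in place of the PHP mentioned in the question; the helper names `build_search_query` and `build_search_params` are made up for illustration, and the `%s` placeholders assume a MySQL driver that uses that parameter style.

```python
# Hypothetical helpers (not part of the answer above): build the summed
# MATCH ... AGAINST query once, and bind the user's search string separately.
COLUMNS = ("title", "description", "filename", "tags")  # columns from the question


def build_search_query(table, columns=COLUMNS):
    """Return SQL with one %s placeholder per column.

    The table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    score = "\n       + ".join(
        f"MATCH ({col}) AGAINST (%s IN BOOLEAN MODE)" for col in columns
    )
    return (
        "SELECT t.*\n"
        "FROM (\n"
        f"  SELECT {score} AS matchRank,\n"
        "         bb.*\n"
        f"  FROM {table} bb\n"
        ") t\n"
        "WHERE t.matchRank > 0\n"
        "ORDER BY t.matchRank DESC"
    )


def build_search_params(user_input, columns=COLUMNS):
    """Collapse extra whitespace and repeat the search string once per placeholder."""
    terms = " ".join(user_input.split())
    return [terms] * len(columns)


sql = build_search_query("blubberBlah")
params = build_search_params("  blubber   blah ")
print(sql)
print(params)
```

With a driver of that kind, `sql` and `params` would be handed to the cursor's execute call together, so the user's words never get concatenated into the SQL text.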

Rather than searching 'in boolean mode', use Match() Against() without that modifier (the default natural language mode) to determine a score for each column. Add those scores up to get the relevance.
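MATCH() scores are weighted relevance values, not plain occurrence counts, so they may not reproduce the exact "2x blah, 1x blubber" ordering the question describes. If literal counts are what matters, one option is to count in application code. The following is a minimal sketch under that assumption, again in Python rather than PHP; `relevance` and `search` are illustrative names, matching is by case-insensitive substring, and a record is kept if any search word occurs in it (which is what the question's expected output implies).

```python
FIELDS = ("TITLE", "DESCRIPTION", "FILENAME", "TAGS")


def relevance(record, terms, fields=FIELDS):
    """Sum the non-overlapping substring occurrences of every term in every field."""
    total = 0
    for field in fields:
        text = str(record[field]).lower()
        for term in terms:
            total += text.count(term.lower())
    return total


def search(records, user_input):
    """Return (ID, score) pairs for records with at least one hit, best first."""
    terms = user_input.split()
    scored = [(relevance(r, terms), r) for r in records]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [(r["ID"], score) for score, r in hits]


# Example data from the question.
rows = [
    {"ID": 1, "TITLE": "blah", "DESCRIPTION": "blah", "FILENAME": "bdsai", "TAGS": "bdha"},
    {"ID": 2, "TITLE": "blubber", "DESCRIPTION": "blah", "FILENAME": "blah", "TAGS": "adsb"},
    {"ID": 3, "TITLE": "blah", "DESCRIPTION": "dsabsadsab", "FILENAME": "dnsa", "TAGS": "dsa"},
]

print(search(rows, "blah blubber"))  # [(2, 3), (1, 2), (3, 1)]
```

This reads every candidate row into the application, so it only suits small tables or a result set that has already been narrowed down by the database.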
