
How to handle multiple search conditions and priorities in Full Text Search

Is it possible to reduce the number of executed queries in any way? The way I do it now works, but later I could end up with 30 queries, and that doesn't look right to me.

My script:

$string = 'new movie stars';
$words =  preg_split('/(\/|\s+)/', $string);
print_r($words);

Array ( [0] => new [1] => movie [2] => stars )

$sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +$words[1] +$words[2]' IN BOOLEAN MODE)";
$query_name = $this->db->query($sql);

// running total of rows found so far
$found = $query_name->num_rows;

if ($found < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +($words[1] $words[2])' IN BOOLEAN MODE)";
    $query_name_two = $this->db->query($sql);
    $found += $query_name_two->num_rows;
}

if ($found < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('$words[0] $words[1] $words[2]' IN BOOLEAN MODE)";
    $query_name_three = $this->db->query($sql);
}

Your code is open to SQL injection attacks. Even real_escape_string cannot secure it completely. Please learn to use prepared statements instead.
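As a minimal sketch of that advice, assuming `$this->db` is a plain `mysqli` connection (the question does not show which database layer it is), the first query can bind the whole search expression as a single parameter instead of interpolating it into the SQL. The helper names `buildAllWordsRequired()` and `searchAllWords()` are hypothetical.

```php
<?php
// Hypothetical helper: turns ['new', 'movie'] into "+new +movie",
// i.e. every word is required in BOOLEAN MODE.
function buildAllWordsRequired(array $words): string
{
    return implode(' ', array_map(fn (string $w): string => '+' . $w, $words));
}

// Hypothetical helper (assumes a mysqli connection): the search expression
// is sent as a bound parameter, so it is never spliced into the SQL text.
function searchAllWords(mysqli $db, array $words): mysqli_result|false
{
    $stmt = $db->prepare(
        'SELECT * FROM movie WHERE MATCH(name) AGAINST(? IN BOOLEAN MODE)'
    );
    $search = buildAllWordsRequired($words);
    $stmt->bind_param('s', $search);
    $stmt->execute();

    return $stmt->get_result(); // get_result() requires the mysqlnd driver
}
```

Binding protects the SQL statement itself, but MATCH() still parses the bound string as a boolean expression, so operators a user types (such as `-` or `*`) keep their meaning. That is one more reason to strip them before building the expression.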

Now, besides the above suggestion, there are two further fixes possible:

Fix #1: The PHP code you are using to tokenize the input string into words for FTS is insufficient. Some time back, I created a function that handles this requirement in a more robust manner. You may use the following instead:

/**
 * Method to take an input string and tokenize it into an array of words for Full Text Searching (FTS).
 * This method is used when an input string can be made up of multiple words (let's say, separated by space characters),
 * and we need to use different Boolean operators on each of the words. The tokenizing process is similar to extraction
 * of words by FTS parser in MySQL. The operators used for matching in Boolean condition are removed from the input $phrase.
 * As of MySQL 8, these characters are: +-><()~*:""&|
 * To get the current list, run: SHOW VARIABLES LIKE 'ft_boolean_syntax';
 * Afterwards, the modified string is split into individual words on whitespace, comma (,) and period (.) characters.
 * Details at: https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html
 * @param string $phrase Input statement/phrase consisting of words
 * @return array Tokenized words
 * @author Madhur, 2019
 */
function tokenizeStringIntoFTSWords(string $phrase) : array {
    $phrase_mod = trim(preg_replace('/[><()~*:"&|+-]/', '', trim($phrase)));
    // -1 means "no limit"; passing null here is deprecated as of PHP 8.1
    return preg_split('/[\s,.]/', $phrase_mod, -1, PREG_SPLIT_NO_EMPTY);
}
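To illustrate why the question's `preg_split('/(\/|\s+)/', ...)` is fragile, here is what it yields for slightly messier input once each token is prefixed with `+`, as the first query does. The wrapper names `questionTokens()` and `requireAll()` are hypothetical; the regular expression is the one from the question.

```php
<?php
// The question's tokenizer: split on "/" or runs of whitespace only.
function questionTokens(string $string): array
{
    return preg_split('/(\/|\s+)/', $string);
}

// Prefix every token with "+", as the question's first query does.
function requireAll(array $words): string
{
    return implode(' ', array_map(fn (string $w): string => '+' . $w, $words));
}

// A trailing space produces an empty token, so a bare "+" ends up in the expression.
echo requireAll(questionTokens('new movie ')), PHP_EOL;  // +new +movie +
// A user-typed operator survives next to the added "+" (two operators on one word).
echo requireAll(questionTokens('new -movie')), PHP_EOL; // +new +-movie
```

By contrast, `tokenizeStringIntoFTSWords()` returns `['new', 'movie']` for both inputs. The MySQL manual notes that InnoDB full-text search does not support multiple operators on a single search word, so tokens like `+-movie` are worth avoiding.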

Fix #2: It seems that you are trying to rank the search results by giving priority in the following order:

All three words > the first word AND either of the remaining two words > at least one of the three words.

However, as the Full-Text Search documentation explains, you can sort by relevance using MATCH(), because it also returns a relevance score.

When MATCH() is used in a WHERE clause, the rows returned are automatically sorted with the highest relevance first (unfortunately, this works only in NATURAL LANGUAGE mode, not in BOOLEAN mode). Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.

So, rows that contain all of the words generally already get a higher relevance than rows that contain only some of them. If you also need to give higher priority to the first word, simply apply the > operator to it. So all you need is the following single query:

-- Both ? placeholders receive the same expression: '>first_word second_word third_word ...'
SELECT * FROM movie
WHERE
  MATCH(name)
  AGAINST(? IN BOOLEAN MODE)
ORDER BY
  MATCH(name)
  AGAINST(? IN BOOLEAN MODE)
  DESC
LIMIT 20
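A minimal PHP sketch of that single query, again assuming a plain `mysqli` connection. It takes an array of words (for example, the output of `tokenizeStringIntoFTSWords()`), prefixes the first one with `>`, and binds the same expression to both `?` placeholders. The function names `buildRankedExpression()` and `searchMoviesRanked()` are hypothetical.

```php
<?php
// Hypothetical helper: ['new', 'movie', 'stars'] -> ">new movie stars".
// No "+" operators, so a row matches if it contains any of the words,
// and ">" increases the first word's contribution to the relevance score.
function buildRankedExpression(array $words): string
{
    $words = array_values($words); // make sure the first word is at index 0
    if ($words === []) {
        return '';
    }
    $words[0] = '>' . $words[0];

    return implode(' ', $words);
}

// Hypothetical helper (assumes mysqli): one query replaces the chain of fallbacks.
function searchMoviesRanked(mysqli $db, array $words, int $limit = 20): array
{
    $search = buildRankedExpression($words);
    if ($search === '') {
        return []; // nothing left to search for
    }

    $stmt = $db->prepare(
        'SELECT * FROM movie
         WHERE MATCH(name) AGAINST(? IN BOOLEAN MODE)
         ORDER BY MATCH(name) AGAINST(? IN BOOLEAN MODE) DESC
         LIMIT ?'
    );
    // The same expression is bound to both MATCH() placeholders.
    $stmt->bind_param('ssi', $search, $search, $limit);
    $stmt->execute();

    // get_result() requires the mysqlnd driver.
    return $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
}
```

Note that, unlike the question's second query, this expression does not require the first word; it only ranks rows containing it higher.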
