
Does MySQL's FULLTEXT search return the same results for MyISAM and InnoDB?

If you take a table and simply change the storage engine from MyISAM to InnoDB, will all WHERE MATCH (col1,col2,col3...) AGAINST (expr) queries return exactly the same results as under MyISAM? If not, what are the differences?

I mean just differences in terms of fulltext searches, nothing else. There are obviously other huge differences between these two storage engines.

There are actually some notable differences in the implementation of the MyISAM and InnoDB fulltext searches:

  • The MyISAM natural language search (but not the boolean mode) has a 50% threshold, while InnoDB doesn't, so very (very) common words are excluded from the MyISAM results. There is a remark in the manual about it:

    The 50% threshold can surprise you when you first try full-text searching to see how it works, and makes InnoDB tables more suited to experimentation with full-text searches. If you create a MyISAM table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results until the table contains more rows.

  • The MyISAM stopword list (a list of words that are not included in the fulltext index and thus cannot be found) is significantly longer than the (default) one used by InnoDB, so e.g. "everybody" or "unfortunately" can be found with InnoDB, but not with MyISAM. match against ('Mary Had a Little Lamb') will usually return a lot more results under InnoDB, as "had" is a stopword in MyISAM, but not in InnoDB.

  • MyISAM and InnoDB use different weight algorithms. MyISAM considers, e.g., the ratio of matching words to non-matching words in a row, so a long sentence that contains a word is less relevant than a short sentence with that word. Although this will only change the order of the otherwise identical result set, it often has a significant impact on the user experience and on whether the user regards two results as "the same", which is what you are asking about. This might also be particularly relevant because searches usually include a limit, e.g. order by score desc limit 10, which can thus yield completely different results.

  • InnoDB supports "" to match exact phrases (words in given order), while MyISAM (at least in natural language mode) doesn't. So if you use match against ('"Mary Had a Little Lamb"'), InnoDB will only return a row if it contains this exact sentence, while MyISAM will find every row that contains any of these words (apart from "had", as mentioned above, and "a", which is in both stopword lists).

  • Since you are using the natural language mode, deviations in the boolean search are probably not relevant for you, but to list at least one: the two engines differ in how they treat stop (or short) words in the search query. If you use match against ('+about +Mary' in boolean mode) ("about" is a stopword in both engines), InnoDB will try to find that word in the index although it cannot be in there, and thus return no results, while MyISAM will ignore that word and can return results that may not contain "about", only "Mary".
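The differences listed above are easiest to check against your own data. The following is only a minimal sketch, assuming MySQL 5.6 or later (needed for InnoDB FULLTEXT indexes). The table names ft_myisam and ft_innodb, the column body, and the sample rows are made up for illustration, and what each query returns depends on your data, stopword lists, and server settings.

```sql
-- Hypothetical test tables: same definition, different storage engine.
CREATE TABLE ft_myisam (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL,
  FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM;

CREATE TABLE ft_innodb (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL,
  FULLTEXT KEY ft_body (body)
) ENGINE=InnoDB;

-- Made-up sample rows. With this few rows the MyISAM 50% threshold can
-- hide matches in natural language mode, so load more rows for a real test.
INSERT INTO ft_myisam (body) VALUES
  ('Mary had a little lamb'),
  ('Everybody had a good time'),
  ('Unfortunately the lamb ran away'),
  ('A story about Mary'),
  ('Something else entirely');
INSERT INTO ft_innodb (id, body) SELECT id, body FROM ft_myisam;

-- 1) Natural language mode: compare which rows match and in what order.
SELECT id, body, MATCH (body) AGAINST ('Mary Had a Little Lamb') AS score
FROM ft_myisam
WHERE MATCH (body) AGAINST ('Mary Had a Little Lamb')
ORDER BY score DESC;

SELECT id, body, MATCH (body) AGAINST ('Mary Had a Little Lamb') AS score
FROM ft_innodb
WHERE MATCH (body) AGAINST ('Mary Had a Little Lamb')
ORDER BY score DESC;

-- 2) Quoted phrase in natural language mode (repeat against ft_myisam).
SELECT id, body
FROM ft_innodb
WHERE MATCH (body) AGAINST ('"Mary Had a Little Lamb"');

-- 3) Boolean mode with a required stopword (repeat against ft_myisam).
SELECT id, body
FROM ft_innodb
WHERE MATCH (body) AGAINST ('+about +Mary' IN BOOLEAN MODE);
```

Running each query against both tables and comparing the row sets and the score column side by side should show which of the points above actually matter for a given workload.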

Additionally, the default values for the minimum word length, ft_min_word_len for MyISAM (default 4) and innodb_ft_min_token_size for InnoDB (default 3), are different, so if you do not adjust them, the InnoDB index will contain (and find) more words. You might also want to adapt the stopword lists so that they match each other.
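To see which settings and stopwords are in effect before comparing the engines, something like the following could be used. This is a sketch, not a recommended configuration: the schema name test, the table name my_stopwords, and the sample stopwords are placeholders. Both minimum-length variables can only be set at server startup, so changing them means editing the server configuration, restarting, and rebuilding the FULLTEXT indexes.

```sql
-- Current minimum word lengths (not changeable at runtime).
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB

-- The default InnoDB stopword list.
SELECT value FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

-- A custom InnoDB stopword list: an InnoDB table with a single
-- VARCHAR column named value (placeholder schema/table names).
CREATE TABLE test.my_stopwords (value VARCHAR(30)) ENGINE=InnoDB;
INSERT INTO test.my_stopwords (value)
VALUES ('had'), ('everybody'), ('unfortunately');

-- Note the 'schema/table' format of the setting.
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';

-- The new list only affects FULLTEXT indexes created or rebuilt afterwards.
-- MyISAM reads its stopwords from the file named by ft_stopword_file
-- (a startup option); after changing it or ft_min_word_len, rebuild the
-- MyISAM index, e.g. with REPAIR TABLE tbl_name QUICK.
```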

Whether these differences are relevant in your case will depend on your data, your search patterns, and whether you consider a different order to be a different result. Searches in data that mainly consists of short terms or fixed formats, e.g. product codes or company names, or searches where you are mainly interested in whether specific words are found at all, or searches that usually only yield a handful of possible results, will usually vary less between the two engines than searches in actual English texts, where a different relevance score has a bigger effect.

No, there's no guarantee that an InnoDB fulltext index works exactly the same way as the MyISAM fulltext index on the same data.

Last time I tested it (which was when InnoDB FT was still Beta), there were definitely cases where InnoDB FT did not return some rows that matched in MyISAM FT. It also returned some rows that were not matched in MyISAM.
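One way to repeat this kind of test is to keep two copies of the same data and list the rows that only one engine matches. The queries below are a rough sketch, assuming two hypothetical tables ft_myisam (ENGINE=MyISAM) and ft_innodb (ENGINE=InnoDB) that hold identical id and body values and each have a FULLTEXT index on body; the search term 'lamb' is just an example.

```sql
-- Rows matched by MyISAM but not by InnoDB for the same expression.
SELECT m.id, m.body
FROM ft_myisam AS m
WHERE MATCH (m.body) AGAINST ('lamb')
  AND m.id NOT IN (
    SELECT i.id
    FROM ft_innodb AS i
    WHERE MATCH (i.body) AGAINST ('lamb')
  );

-- Rows matched by InnoDB but not by MyISAM.
SELECT i.id, i.body
FROM ft_innodb AS i
WHERE MATCH (i.body) AGAINST ('lamb')
  AND i.id NOT IN (
    SELECT m.id
    FROM ft_myisam AS m
    WHERE MATCH (m.body) AGAINST ('lamb')
  );
```

If both queries come back empty for a representative set of search terms, the two engines at least agree on which rows match, although the ordering by relevance can still differ.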
