MySQL (fulltext?) search for products

I am building a very simple product catalog that stores products in a MySQL table, and I want to search the products as fast as possible (and with results as relevant as possible). The product database will be quite large (about 500,000 products), which is why searches using LIKE '%…%' are very slow: they cannot use an index.

I have many fields, but the only ones I want to search are:

  • product_id: BIGINT
  • title: VARCHAR(255)
  • description: TEXT

I tried experimenting with full-text search, but there were some problems:

  • I couldn't search by product_id, because it is a BIGINT and cannot be part of a FULLTEXT index. Sometimes the user knows the product's ID.
  • If I search for "Meter XY-123", I get no results, even though a product's title and description both contain both words ("meter" and "xy-123").
  • I couldn't search for substrings. For example, if a product's title is "Foobar 123", that product should be returned even if the user searches for:
    • foo bar 123
    • bar 123
    • foobar 12
    • foo
    • etc.
  • Results should also be ordered by some kind of relevance. For example, suppose I have two products, "foobar 123" and "foobar 456", and the user searches for "foobar 4". Both products should be returned (match any word). The second product should rank higher than the first, because it also contains the number 4.
  • Products should also be ranked by the field in which the value is found. A match in product_id weighs more than a match in title, and a match in title weighs more than a match in description. For example, if the user searches for "1234", then:
    • the first-ranked product should be the one with product_id 1234
    • next should come products that contain "1234" in the title
    • after those should come products that contain this number in the description
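One way to see why several of these searches fail is to look at how the built-in full-text parser splits text into tokens. With default settings it treats "-" as a delimiter. It also skips tokens shorter than a minimum length: ft_min_word_len (default 4) for MyISAM, or innodb_ft_min_token_size (default 3) for InnoDB. The hypothetical fulltext_tokens function below only approximates that behaviour. It ignores stopwords, the MyISAM 50% threshold and apostrophe handling.

```python
import re

# Assumption: letters, digits and "_" form words; everything else, including
# "-", separates them (a rough approximation of MySQL's default parser).
WORD_RE = re.compile(r"\w+")


def fulltext_tokens(text: str, min_len: int = 4) -> list[str]:
    """Lowercased tokens a default-configured FULLTEXT index would keep.

    min_len stands in for ft_min_word_len (MyISAM default 4); pass 3 to mimic
    innodb_ft_min_token_size. Stopwords are not modeled.
    """
    tokens = (t.lower() for t in WORD_RE.findall(text))
    return [t for t in tokens if len(t) >= min_len]


print(fulltext_tokens("Meter XY-123"))             # ['meter']
print(fulltext_tokens("Meter XY-123", min_len=3))  # ['meter', '123']
print(fulltext_tokens("Foobar 123"))               # ['foobar']
```

Under these assumptions, "XY-123" is never indexed as a single term, and "foo" can never equal the indexed word "foobar". Boolean mode's trailing truncation operator (AGAINST('foo*' IN BOOLEAN MODE)) covers prefixes such as "foo". It does not cover infixes such as "bar".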

What would be the best way to run searches like these on this table? The only approach that gives good results in my case is to split the query string into words, run a separate LIKE query for each word, and then somehow calculate a weight. This is very slow, though: a single search takes more than 15 seconds, which is far too slow.
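The split-and-weight approach described above can at least be collapsed into a single statement. That statement scores every row on every word and field in one pass, instead of issuing one LIKE query per word. The sketch below builds such a parameterized query under several assumptions, none of which come from the question:

- the table is named "products"
- the weights (100 for an exact ID, 10 for title, 1 for description) are hypothetical
- the driver uses the "format" paramstyle (%s placeholders), as PyMySQL and mysqlclient do
- escape_like and build_search_query are illustrative names

```python
# Hypothetical weights (assumptions, not from the question).
ID_WEIGHT = 100
FIELD_WEIGHTS = {"title": 10, "description": 1}


def escape_like(word: str) -> str:
    """Escape LIKE wildcards; MySQL's default LIKE escape character is backslash."""
    return word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_search_query(query: str, limit: int = 50) -> tuple[str, list]:
    """Build one parameterized SELECT that scores rows on every word and field."""
    words = query.split()
    if not words:
        raise ValueError("empty search query")
    terms: list[str] = []
    params: list = []
    for w in words:
        if w.isdecimal():  # only purely numeric words can be product IDs
            terms.append(f"(product_id = %s) * {ID_WEIGHT}")
            params.append(int(w))
    for w in words:
        pattern = f"%{escape_like(w)}%"
        for field, weight in FIELD_WEIGHTS.items():
            # COALESCE stops a NULL column from turning the whole score NULL.
            terms.append(f"COALESCE({field} LIKE %s, 0) * {weight}")
            params.append(pattern)
    score = " + ".join(terms)
    sql = (
        f"SELECT * FROM (SELECT product_id, title, {score} AS score "
        "FROM products) AS ranked "
        "WHERE score > 0 ORDER BY score DESC LIMIT %s"
    )
    params.append(limit)
    return sql, params
```

With such a driver, the query would be run as cursor.execute(sql, params). It still evaluates LIKE '%…%' against every row, because a leading wildcard rules out a B-tree index. It therefore removes the per-word round trips but not the full table scan.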

I don't expect everything to be possible with a single query, but I am looking for a solution that is fast and gives results as relevant as possible. If that means building some kind of custom word index or similar, I am willing to do it. I just need an idea of how to manage it.
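For a custom index that can handle the substring requirement, one common technique is an n-gram inverted index. Every field is cut into overlapping 3-character grams, and each gram maps to the IDs of the products that contain it. To answer a query, the grams of each query word are intersected to find candidate products, which are then verified with a real substring check and scored per field.

The sketch below is an in-memory Python illustration under assumptions. The trigram size, the 100/10/1 weights and the Product/TrigramIndex names are all hypothetical. In MySQL, the postings could be stored in a (gram, product_id) table indexed on gram. MySQL 5.7.6 and later also ship a built-in ngram full-text parser (WITH PARSER ngram) that may be worth trying first.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weights: exact product_id > title > description.
ID_WEIGHT, TITLE_WEIGHT, DESC_WEIGHT = 100, 10, 1
N = 3  # gram length (trigrams)


@dataclass
class Product:
    product_id: int
    title: str
    description: str


def ngrams(text: str, n: int = N) -> set[str]:
    """All overlapping n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class TrigramIndex:
    def __init__(self, products: list[Product]) -> None:
        self.products = {p.product_id: p for p in products}
        # gram -> IDs of products whose id, title or description contains it
        self.postings: dict[str, set[int]] = defaultdict(set)
        for p in products:
            for text in (str(p.product_id), p.title.lower(), p.description.lower()):
                for gram in ngrams(text):
                    self.postings[gram].add(p.product_id)

    def _candidates(self, words: list[str]) -> set[int]:
        """Products that may contain at least one query word (match any word).

        Words shorter than N cannot use the index; they only affect ranking,
        unless every word is short, in which case all products are scanned.
        """
        long_words = [w for w in words if len(w) >= N]
        if not long_words:
            return set(self.products)
        found: set[int] = set()
        for w in long_words:
            found |= set.intersection(
                *(self.postings.get(g, set()) for g in ngrams(w)))
        return found

    def search(self, query: str, limit: int = 20) -> list[tuple[int, int]]:
        """Return (product_id, score) pairs, best first."""
        words = query.lower().split()
        scored = []
        for pid in self._candidates(words):
            p = self.products[pid]
            title, desc = p.title.lower(), p.description.lower()
            score = 0
            for w in words:
                if w == str(pid):  # exact ID hit
                    score += ID_WEIGHT
                if w in title:     # substring check removes gram false positives
                    score += TITLE_WEIGHT
                if w in desc:
                    score += DESC_WEIGHT
            if score > 0:
                scored.append((score, pid))
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(pid, score) for score, pid in scored[:limit]]


if __name__ == "__main__":
    catalog = [
        Product(1, "Foobar 123", "A basic meter"),
        Product(2, "Foobar 456", "Another meter"),
        Product(1234, "Widget", "Nothing special"),
        Product(7, "Gauge 1234", "Spare part"),
        Product(8, "Other thing", "Fits model 1234"),
    ]
    index = TrigramIndex(catalog)
    print(index.search("foobar 4"))  # [(2, 20), (1, 10)]
    print(index.search("1234"))      # [(1234, 100), (7, 10), (8, 1)]
```

Splitting the query on spaces and matching each word as a substring is what lets "foo bar 123" find "Foobar 123". Words shorter than three characters, like the "4" in "foobar 4", only influence ranking in this sketch.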

Thank you!

We migrated our search to Sphinx. Now we need to fine-tune the results.
