
Complex MySQL SELECT statement

I'm trying to build a mini search engine for a site containing products. I've already considered fulltext search, the LIKE clause, etc., but I still want to proceed my own way because the database is going to be ridiculously huge (hundreds of millions of products).

The design goes something like this: I have a table pairing words with word IDs. I have another table that pairs each word ID with the product IDs of the products matching that word. When a user searches for, say, "2gb memory card", the script parses it into "2gb", "memory", and "card".
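The two-table design described above can be sketched as follows. This is only an illustration, not the asker's actual schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the `words` table name and its columns are hypothetical, and the mapping of each word to a specific ID is assumed (the question gives the three IDs but not which word each belongs to). Only `indx_0`, `wid`, and `pid` come from the question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the two tables from the design."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical name: the question does not name the word table.
        CREATE TABLE words (wid INTEGER PRIMARY KEY, word TEXT UNIQUE);
        -- indx_0 pairs a word ID with each product ID that matches it.
        CREATE TABLE indx_0 (wid INTEGER, pid INTEGER, PRIMARY KEY (wid, pid));
        """
    )
    # Assumed word-to-ID mapping, for illustration only.
    conn.executemany(
        "INSERT INTO words VALUES (?, ?)",
        [(294, "2gb"), (20591, "memory"), (330, "card")],
    )
    return conn


def lookup_word_ids(conn, query):
    """Split a search string on whitespace and return the known word IDs."""
    terms = query.lower().split()
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(
        f"SELECT wid FROM words WHERE word IN ({placeholders})", terms
    ).fetchall()
    return sorted(row[0] for row in rows)


word_ids = lookup_word_ids(build_demo_db(), "2GB memory card")
```

Words that are not in the table are simply skipped, so the returned list may be shorter than the number of search terms.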

Then I use:

SELECT pid 
  FROM indx_0 
 WHERE wid = 294 OR wid = 20591 OR wid = 330

I end up with one row for every word/product match.

I have a PHP algorithm that decides which products go to the top based on several factors. But when I load 380k results into a PHP array, execution becomes ridiculously slow, so clearly I can't do that. If I limit the query to, say, 1,000 results per word, execution is fast, but it no longer includes all the possible results.

In the "indx_0" table, each "pid" (product ID) appears at most once per "wid" (word ID), and clearly some products are going to match more than one word. I want to retrieve the "pid"s that have the most matches against the "wid"s.

Say there are 2,000 products matching "2gb", 200,000 matching "card", and 50,000 matching "memory", but only 20 products that match all 3 of those words and 200 products that match a combination of 2 of them.

Is it possible to retrieve those 20 products as well as the 200 products that partially match?

What you probably need to do is group by the product ID and count the matching rows. Then order by that count, descending. For example, if one product matches all 3 word IDs and another matches just 1, the product with a count of 3 comes first in the list.

SELECT pid, COUNT(*) AS WordMatchCount
  FROM indx_0 
 WHERE wid IN ( 294, 20591, 330 )
 GROUP BY pid
 ORDER BY WordMatchCount DESC
 LIMIT 1000
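To see the ranking behave as described, here is a small runnable sketch. It is an illustration under assumptions, not a tested MySQL setup: it uses Python's sqlite3 module as a stand-in for MySQL, the helper name `rank_products` and the sample rows are made up, and the secondary sort on `pid` is an addition so that ties come back in a predictable order. The filter is on `wid`, since those are the word IDs being searched for.

```python
import sqlite3


def rank_products(conn, word_ids, limit=1000):
    """Return (pid, match_count) rows, products matching the most words first."""
    if not word_ids:
        return []
    placeholders = ",".join("?" * len(word_ids))
    sql = f"""
        SELECT pid, COUNT(*) AS WordMatchCount
          FROM indx_0
         WHERE wid IN ({placeholders})
         GROUP BY pid
         ORDER BY WordMatchCount DESC, pid ASC  -- pid tie-break is an assumption
         LIMIT ?
    """
    return conn.execute(sql, [*word_ids, limit]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indx_0 (wid INTEGER, pid INTEGER, PRIMARY KEY (wid, pid))"
)
# Made-up sample data: product 1 matches all three words, product 2 matches
# two, product 3 matches one, and product 4 matches only an unrelated word.
conn.executemany(
    "INSERT INTO indx_0 VALUES (?, ?)",
    [(294, 1), (20591, 1), (330, 1), (20591, 2), (330, 2), (330, 3), (999, 4)],
)

ranked = rank_products(conn, [294, 20591, 330])
for pid, match_count in ranked:
    print(pid, match_count)
```

Because the grouping and the LIMIT run inside the database, PHP only receives the top rows instead of every matching pair. If single-word matches are not wanted at all, adding something like `HAVING WordMatchCount >= 2` after the GROUP BY should restrict the result to products that match at least two of the words.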
