
SQL query to find products matching a set of categories

I have 3 tables: products, categories and pro_cat_link. A product can be linked to one or many categories through the table pro_cat_link.

My query must answer the following problem: find all products that match a set of categories. Ex: find all products that are "yellow AND fruit AND sweet".

When researching this problem on SO I could find only the solution that I'm currently using: Complicated SQL Query--finding items matching multiple different foreign keys

In my case, my query looks like this:

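-- Count the distinct matching categories per product (the link table already holds the ids)
SELECT products.id, COUNT(DISTINCT pro_cat_link.category_id) AS countCat
FROM products
INNER JOIN pro_cat_link ON (pro_cat_link.product_id = products.id)
WHERE pro_cat_link.category_id IN (3,6,8,10)
GROUP BY products.id
-- HAVING must come before ORDER BY
HAVING countCat = 4
ORDER BY products.date DESC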

In other words, select all products that match any of the category ids (3,6,8,10) and keep only those that match exactly 4 of them, i.e. all of them.
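As a quick way to check the query's logic, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows, the `products_in_all` helper and the use of SQLite instead of MySQL are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with made-up sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE pro_cat_link (
        product_id  INTEGER NOT NULL REFERENCES products(id),
        category_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, category_id)
    );
    INSERT INTO products VALUES
        (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
    INSERT INTO pro_cat_link VALUES
        (1, 3), (1, 5), (1, 6), (1, 8), (1, 10),
        (2, 3), (2, 6), (2, 8),
        (3, 3), (3, 6), (3, 8), (3, 10);
""")


def products_in_all(conn, category_ids):
    """Return ids of products linked to every category in category_ids."""
    ids = sorted(set(category_ids))  # duplicates would break the count check
    placeholders = ",".join("?" for _ in ids)
    sql = f"""
        SELECT products.id
        FROM products
        INNER JOIN pro_cat_link ON pro_cat_link.product_id = products.id
        WHERE pro_cat_link.category_id IN ({placeholders})
        GROUP BY products.id
        HAVING COUNT(DISTINCT pro_cat_link.category_id) = ?
        ORDER BY products.date DESC
    """
    rows = conn.execute(sql, [*ids, len(ids)]).fetchall()
    return [row[0] for row in rows]


matching = products_in_all(conn, [3, 6, 8, 10])
print(matching)  # product 2 lacks category 10, so it is filtered out
```

Passing the number of requested categories as a parameter keeps the `HAVING` threshold in sync with the `IN` list, which is easy to get wrong when the list is edited by hand.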

This works well, but I'm running into performance issues because the COUNT(), GROUP BY and ORDER BY leave very little room for proper indexing. Can anyone think of a better way to solve this problem?

You could eliminate the performance problems of grouping and counting if you stored that information somewhere. You could add a column to products called total_categories that tells you how many categories the product participates in. Then you could just say where total_categories = 4. This might be more difficult to maintain if products often change their categories, because you'd have to keep this field correctly updated all the time, and then you have to decide whether to do that in application code, in a trigger or in a stored procedure...

Normally I would not think it a very good idea to store such metadata directly in a table, but if the performance is really that bad, it might be worth considering.
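For the trigger option mentioned above, a minimal sketch of the maintenance side could look like the following. It uses SQLite from Python so that it runs as-is; the trigger names are hypothetical, and MySQL trigger syntax differs in details, so treat this as an illustration of the idea rather than a drop-in script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        total_categories INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE pro_cat_link (
        product_id  INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, category_id)
    );

    -- Hypothetical trigger names; each fires once per affected row.
    CREATE TRIGGER link_added AFTER INSERT ON pro_cat_link
    BEGIN
        UPDATE products SET total_categories = total_categories + 1
        WHERE id = NEW.product_id;
    END;

    CREATE TRIGGER link_removed AFTER DELETE ON pro_cat_link
    BEGIN
        UPDATE products SET total_categories = total_categories - 1
        WHERE id = OLD.product_id;
    END;
""")

conn.execute("INSERT INTO products (id) VALUES (1)")
conn.executemany("INSERT INTO pro_cat_link VALUES (1, ?)",
                 [(3,), (6,), (8,), (10,)])
conn.execute("DELETE FROM pro_cat_link WHERE product_id = 1 AND category_id = 8")

# Four links added, one removed.
total = conn.execute(
    "SELECT total_categories FROM products WHERE id = 1"
).fetchone()[0]
print(total)
```

Keeping the counter inside the database means every writer updates it, whichever application touches pro_cat_link.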

If you don't have too many categories, instead of keeping a count column you can keep a bitstring that represents the categories the product is in (i.e., a 1 at position i means the product is in category i, and a 0 means it is not). Then, when searching for a group of categories, you generate a bitstring for that search and AND each product's category string with it. Products that are in all the searched categories will produce the search string as the result.

For example, let's say you have ten categories, and that positions are counted from the right, so category 1 is the rightmost bit. Item1 is in categories 1, 3, 5, 6, 8, 10, so its category string is 1010110101. Item2 is in categories 1, 2, 4, 6, 8, 10, so its category string is 1010101011. When searching for 3, 6, 8, and 10, you would generate the string s = 1010100100. Item1 & s = 1010100100 = s, so Item1 matches. Item2 & s = 1010100000 <> s, so Item2 does not.
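The worked example above can be checked with a few lines of Python. The helper names `category_mask` and `in_all` are made up for this sketch; the only convention assumed is that category i maps to bit i - 1, i.e. category 1 is the lowest bit.

```python
def category_mask(categories):
    """Build an integer with bit (i - 1) set for every category i."""
    mask = 0
    for i in categories:
        mask |= 1 << (i - 1)
    return mask


def in_all(product_mask, search_mask):
    """True if every bit of search_mask is also set in product_mask."""
    return (product_mask & search_mask) == search_mask


item1 = category_mask([1, 3, 5, 6, 8, 10])
item2 = category_mask([1, 2, 4, 6, 8, 10])
s = category_mask([3, 6, 8, 10])

# Print the ten-bit strings used in the example.
for name, value in (("Item1", item1), ("Item2", item2), ("s", s)):
    print(name, format(value, "010b"))

print(in_all(item1, s))  # Item1 has all four searched categories
print(in_all(item2, s))  # Item2 is missing category 3
```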

Furthermore, you don't have to store it as a string; you can store it as the equivalent integer. So Item1, Item2, and s are 693, 683, and 676 respectively. 693 & 676 = 676, but 683 & 676 = 672. Then, if you're adding a product to category i, just add 2^(i - 1) to its category number, and if you're removing it from category i, just subtract 2^(i - 1).
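A minimal sketch of the integer variant, again run against SQLite from Python so it is self-contained. The column name `category_mask` and the helper functions are assumptions for illustration. The updates use bitwise OR and AND NOT rather than plain addition and subtraction, so that adding a category twice or removing one that is absent does not corrupt the number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " category_mask INTEGER NOT NULL DEFAULT 0)"
)
# Item1 = 693 and Item2 = 683, as in the example above.
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 693), (2, 683)])


def find(conn, search_mask):
    """Ids of products whose mask contains every bit of search_mask."""
    rows = conn.execute(
        "SELECT id FROM products WHERE (category_mask & ?) = ? ORDER BY id",
        (search_mask, search_mask),
    ).fetchall()
    return [row[0] for row in rows]


def add_category(conn, product_id, i):
    """Set bit (i - 1); harmless if the bit is already set."""
    conn.execute(
        "UPDATE products SET category_mask = category_mask | ? WHERE id = ?",
        (1 << (i - 1), product_id),
    )


def remove_category(conn, product_id, i):
    """Clear bit (i - 1); harmless if the bit is already clear."""
    conn.execute(
        "UPDATE products SET category_mask = category_mask & ~? WHERE id = ?",
        (1 << (i - 1), product_id),
    )


before = find(conn, 676)      # search for categories 3, 6, 8, 10
add_category(conn, 2, 3)      # Item2 gains the category it was missing
after_add = find(conn, 676)
remove_category(conn, 1, 10)  # Item1 loses category 10
after_remove = find(conn, 676)
print(before, after_add, after_remove)
```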

Of course, if you have more categories than there are bits in a MySQL integer type (32 for INT, 64 for BIGINT), this won't work at all. Also, as FrustratedWithFormsDes points out in his answer, this brings back all the problems of keeping both pro_cat_link and this table up to date (of course, depending on what pro_cat_link is used for, this might eliminate it completely). Furthermore, if a category changes numbers, you have to update everything.


 