SQL query to find products matching a set of categories

I have 3 tables: products, categories and pro_cat_link. A product can be linked to one or many categories through the table pro_cat_link.

My query must answer the following problem: find all products that match a set of categories. Ex: find all products that are "yellow AND fruit AND sweet".

When researching this problem on SO, the only solution I could find is the one I'm currently using: Complicated SQL Query--finding items matching multiple different foreign keys

In my case, my query looks like this:

SELECT products.id, COUNT(DISTINCT pro_cat_link.category_id) AS countCat
FROM products
INNER JOIN pro_cat_link ON (pro_cat_link.product_id = products.id)
WHERE pro_cat_link.category_id IN (3,6,8,10)
GROUP BY products.id
HAVING countCat = 4
ORDER BY products.date DESC

In other words, select all products that match at least one of the category ids (3,6,8,10) and keep only those that match all 4 of them.
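To make the behaviour of this query concrete, here is a minimal sketch that runs it against a tiny in-memory database. It uses Python's sqlite3 module rather than MySQL, and the sample rows, the helper names build_demo_db and find_products, and the text-typed date column are all assumptions made up for the illustration. The HAVING clause repeats the COUNT expression instead of the alias purely for portability.

```python
import sqlite3

# Same idea as the query above: filter on the wanted categories,
# then keep only products that matched all four of them.
QUERY = """
SELECT products.id, COUNT(DISTINCT pro_cat_link.category_id) AS countCat
FROM products
INNER JOIN pro_cat_link ON pro_cat_link.product_id = products.id
WHERE pro_cat_link.category_id IN (3, 6, 8, 10)
GROUP BY products.id
HAVING COUNT(DISTINCT pro_cat_link.category_id) = 4
ORDER BY products.date DESC
"""


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, date TEXT)")
    conn.execute(
        "CREATE TABLE pro_cat_link (product_id INTEGER, category_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [(1, "2024-01-02"), (2, "2024-01-03"), (3, "2024-01-05")],
    )
    links = {
        1: [1, 3, 6, 8, 10],  # all four wanted categories, plus one extra
        2: [3, 6, 8],         # category 10 is missing
        3: [3, 6, 8, 10],     # exactly the four wanted categories
    }
    conn.executemany(
        "INSERT INTO pro_cat_link VALUES (?, ?)",
        [(pid, cid) for pid, cids in links.items() for cid in cids],
    )
    return conn


def find_products(conn):
    """Return (product id, matched category count) rows, newest first."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    print(find_products(build_demo_db()))
```

Product 2 is filtered out by the HAVING clause because only three of its categories are in the wanted set, while product 1 still qualifies even though it has an extra category.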

This works well, but I'm running into performance issues because the COUNT(), GROUP BY and ORDER BY leave very little room for proper indexing. Can anyone think of a better way to solve this problem?

You could eliminate the performance cost of grouping and counting by storing that information somewhere. For instance, you could add a column to products called total_categories that tells you how many categories the product participates in. Then you could just say WHERE total_categories = 4. This might be harder to maintain if products often change their categories, because the field has to be updated correctly on every change, and you have to decide whether to do that in application code, in a trigger, or in a stored procedure.

Normally I would not think it a very good idea to store such metadata directly in a table, but if the performance is really that bad, it might be worth considering.

If you don't have too many categories, instead of keeping a count column you can store a bitstring that represents the categories a product is in (i.e., a 1 at position i means the product is in category i, and a 0 means it is not). Then, when searching for a group of categories, you generate a bitstring for that search and AND each product's category string with it. The products that are in all of the searched categories will produce the search string itself as the result.

For example, let's say you have ten categories, with the strings written so that category 1 is the rightmost bit. Item1 is in categories 1, 3, 5, 6, 8, 10, so its category string is 1010110101. Item2 is in categories 1, 2, 4, 6, 8, 10, so its category string is 1010101011. When searching for 3, 6, 8, and 10, you would generate the string s = 1010100100. Item1 & s = 1010100100 = s, so Item1 matches. Item2 & s = 1010100000 <> s, so Item2 does not.
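The arithmetic in this example can be checked with a short Python sketch. The function names category_mask and matches_all are hypothetical, chosen only for this illustration; category i is mapped to bit i - 1, so category 1 is the rightmost bit.

```python
def category_mask(categories):
    """Return an integer with bit (i - 1) set for every category i."""
    mask = 0
    for i in categories:
        mask |= 1 << (i - 1)
    return mask


def matches_all(item_mask, search_mask):
    """True when every bit of search_mask is also set in item_mask."""
    return (item_mask & search_mask) == search_mask


item1 = category_mask([1, 3, 5, 6, 8, 10])
item2 = category_mask([1, 2, 4, 6, 8, 10])
s = category_mask([3, 6, 8, 10])

if __name__ == "__main__":
    # Print each mask as a 10-digit bitstring, then the match results.
    for name, value in (("Item1", item1), ("Item2", item2), ("s", s)):
        print(name, format(value, "010b"))
    print(matches_all(item1, s), matches_all(item2, s))
```

The comparison against the search mask is what makes this an "all of these categories" test; checking only that the AND is non-zero would instead mean "any of these categories".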

Furthermore, you don't have to store it as a string; you could just store the equivalent integer value. So Item1, Item2, and s are 693, 683, and 676 respectively. 693 & 676 = 676, but 683 & 676 = 672. Then, if you're adding a product to category i, just add 2^(i - 1) to its category number, and if you're removing it from category i, just subtract 2^(i - 1).
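As a rough sketch of how the integer form could be stored and queried, the following uses Python's sqlite3 module with a hypothetical category_mask column on products; the column name and the helper functions are assumptions, not part of the original schema. Adding and removing are done with bitwise OR and AND NOT rather than plain addition and subtraction, which gives the same result but stays correct if the product is already in (or already out of) the category.

```python
import sqlite3


def bit(i):
    """Integer value of category i, i.e. 2^(i - 1)."""
    return 1 << (i - 1)


def add_category(conn, product_id, i):
    # OR sets the bit; a no-op if the product is already in category i.
    conn.execute(
        "UPDATE products SET category_mask = category_mask | ? WHERE id = ?",
        (bit(i), product_id),
    )


def remove_category(conn, product_id, i):
    # AND with the inverted bit clears it; a no-op if it was not set.
    conn.execute(
        "UPDATE products SET category_mask = category_mask & ~? WHERE id = ?",
        (bit(i), product_id),
    )


def find_matching(conn, search_mask):
    """Ids of products whose mask contains every bit of search_mask."""
    rows = conn.execute(
        "SELECT id FROM products WHERE (category_mask & ?) = ? ORDER BY id",
        (search_mask, search_mask),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, category_mask INTEGER NOT NULL DEFAULT 0)"
)
# Item1 = 693 and Item2 = 683, as in the example above.
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, 693), (2, 683)])

if __name__ == "__main__":
    print(find_matching(conn, 676))
```

Note that a filter of this shape is computed per row, so it does not by itself remove the need to scan the table.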

Of course, if you have more categories than there are bits in a MySQL int (32, or 64 for a BIGINT), this won't work at all. Also, as FrustratedWithFormsDes points out in his answer, this brings in all the problems of keeping both pro_cat_link and this value up to date (of course, depending on what pro_cat_link is used for, this might make pro_cat_link unnecessary). Furthermore, if a category changes numbers, you have to update everything.
