
What is the best way to implement this SQL query?

I have a PRODUCTS table, and each product can have multiple attributes so I have an ATTRIBUTES table, and another table called ATTRIBPRODUCTS which sits in the middle. The attributes are grouped into classes (type, brand, material, colour, etc), so people might want a product of a particular type, from a certain brand.

PRODUCTS
product_id
product_name

ATTRIBUTES
attribute_id
attribute_name
attribute_class

ATTRIBPRODUCTS
attribute_id
product_id

When someone is looking for a product they can select one or many of the attributes. The problem I'm having is returning a single product that has multiple attributes. This should be really simple I know but SQL really isn't my thing and past a certain point I get a bit lost in the logic. The problem is I'm trying to check each attribute class separately so I want to end up with something like:

SELECT DISTINCT products.product_id
FROM         attribproducts 
INNER JOIN products ON attribproducts.product_id = products.product_id
WHERE     (attribproducts.attribute_id IN (9,10,11)
AND        attribproducts.attribute_id IN (60,61))

I've used IN to separate the blocks of attributes of different classes, so that I end up with the products which are of certain types and also of certain brands. From the results I've had, it seems to be the AND between the IN clauses that's causing the problem.

Can anyone help a little? I don't have the luxury of completely refactoring the database unfortunately, there is a lot more to it than this bit, so any suggestions how to work with what I have will be gratefully received.

Take a look at the answers to the question "SQL: Many-To-Many table AND query". It's exactly the same problem. Cletus gave two possible solutions there, neither of which is trivial (but then again, there simply is no trivial solution).

SELECT DISTINCT p.product_id
FROM products p
INNER JOIN attribproducts ptype on p.product_id = ptype.product_id
INNER JOIN attribproducts pbrand on p.product_id = pbrand.product_id 
WHERE ptype.attribute_id IN (9,10,11) 
    AND pbrand.attribute_id IN (60,61)

Try this:

select * from products p, attribproducts a1, attribproducts a2
  where p.product_id = a1.product_id
    and p.product_id = a2.product_id
    and a1.attribute_id in (9,10,11)
    and a2.attribute_id in (60,61);

The query in the question will return no rows, because the WHERE clause is checked against one attribproducts row at a time, and you're only keeping rows whose attribute_id is (either 9, 10 or 11) AND (either 60 or 61).

Because those sets don't intersect, you'll get no rows.

If you use OR instead, you'll get products that have any attribute in the set 9, 10, 11, 60, 61, which isn't what you want either, and you'll get one row per matching attribute, so multiple rows for some products.
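To make the AND/OR behaviour above concrete, here is a small sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration; only the table name, column names and attribute ids come from the question.

```python
import sqlite3

# Hypothetical sample data; ids 9-11 stand for types and 60-61 for brands,
# as in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribproducts (attribute_id INTEGER, product_id INTEGER);
    INSERT INTO attribproducts VALUES
        (9, 1), (60, 1),   -- product 1: matching type and matching brand
        (10, 2),           -- product 2: matching type only
        (61, 3);           -- product 3: matching brand only
""")

# AND is tested against one row at a time, and no single row holds two ids.
and_rows = conn.execute("""
    SELECT DISTINCT product_id FROM attribproducts
    WHERE attribute_id IN (9, 10, 11) AND attribute_id IN (60, 61)
""").fetchall()

# OR matches any row from either list, so partial matches come back too.
or_rows = conn.execute("""
    SELECT product_id FROM attribproducts
    WHERE attribute_id IN (9, 10, 11) OR attribute_id IN (60, 61)
    ORDER BY product_id
""").fetchall()

print(and_rows)  # []
print(or_rows)   # [(1,), (1,), (2,), (3,)]
```

Product 1 is the only one that should match, yet AND finds nothing and OR also returns products 2 and 3.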

You could use that OR select as a subquery, GROUP BY product, and order the groups by the number of matching attributes. That will give you the closest matches first.
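One possible reading of the grouping idea, sketched with Python's sqlite3 module: filter with the OR-style list, group by product, and count how many of the two attribute sets each product hits. The sample rows and the sets_matched alias are assumptions for illustration, not part of the original schema.

```python
import sqlite3

# Hypothetical sample data: product 1 matches both sets, 2 and 3 match one each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribproducts (attribute_id INTEGER, product_id INTEGER);
    INSERT INTO attribproducts VALUES (9, 1), (60, 1), (10, 2), (61, 3);
""")

# Each MAX(CASE ...) is 1 if the product has at least one attribute from
# that set, so the sum is the number of sets the product satisfies.
ranked = conn.execute("""
    SELECT product_id,
           MAX(CASE WHEN attribute_id IN (9, 10, 11) THEN 1 ELSE 0 END)
         + MAX(CASE WHEN attribute_id IN (60, 61)    THEN 1 ELSE 0 END)
           AS sets_matched
    FROM attribproducts
    WHERE attribute_id IN (9, 10, 11, 60, 61)
    GROUP BY product_id
    ORDER BY sets_matched DESC, product_id
""").fetchall()

# Products that satisfy every set (2 sets here) are the strict matches.
full_matches = [pid for pid, sets_matched in ranked if sets_matched == 2]

print(ranked)        # [(1, 2), (2, 1), (3, 1)]
print(full_matches)  # [1]
```

If only strict matches are wanted, the same sum can be compared to the number of sets in a HAVING clause instead of filtering afterwards.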

Alternatively (as another answer shows), you could join with a new copy of the table for each attribute set, giving you only those products that match all attribute sets.

It sounds like you have a data schema that is GREAT for storage but terrible for selecting/reporting. When you have a data structure of OBJECT, ATTRIBUTE, OBJECT-ATTRIBUTE and OBJECT-ATTRIBUTE-VALUE you can store many objects with many different attributes per object. This is sometimes referred to as "Vertical Storage".

However, when you want to retrieve a list of objects with all of their attribute values, you have to make a variable number of joins. It is much easier to retrieve data when it is stored horizontally (defined columns of data).

I have run into this scenario several times. Since you cannot change the existing data structure, my suggestion would be to write a "layer" of tables on top. Dynamically create a table for each object/product you have, then dynamically create static columns in those new tables for each attribute. Essentially, you need to "flatten" your vertically stored attribute/values into static columns, converting from a vertical architecture into a horizontal one.

Use the "flattened" tables for reporting, and use the vertical tables for storage.
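As a rough illustration of the flattening idea, the sketch below builds a view rather than dynamically created tables (a simpler stand-in for what is described above), using Python's sqlite3 module. The view name products_flat, its product_type and brand columns, and all sample rows are hypothetical. It also assumes at most one attribute per class per product; with several, MAX would keep only one of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE attributes (attribute_id INTEGER PRIMARY KEY,
                             attribute_name TEXT, attribute_class TEXT);
    CREATE TABLE attribproducts (attribute_id INTEGER, product_id INTEGER);

    -- Hypothetical sample data.
    INSERT INTO products VALUES (1, 'Trail boot'), (2, 'Rain jacket');
    INSERT INTO attributes VALUES
        (9, 'boot', 'type'), (10, 'jacket', 'type'),
        (60, 'Acme', 'brand'), (61, 'Globex', 'brand');
    INSERT INTO attribproducts VALUES (9, 1), (60, 1), (10, 2), (61, 2);

    -- One row per product, one column per attribute class.
    CREATE VIEW products_flat AS
    SELECT p.product_id, p.product_name,
           MAX(CASE WHEN a.attribute_class = 'type'
                    THEN a.attribute_name END) AS product_type,
           MAX(CASE WHEN a.attribute_class = 'brand'
                    THEN a.attribute_name END) AS brand
    FROM products p
    LEFT JOIN attribproducts ap ON ap.product_id = p.product_id
    LEFT JOIN attributes a ON a.attribute_id = ap.attribute_id
    GROUP BY p.product_id, p.product_name;
""")

# Searching the flattened layer needs no self-joins.
rows = conn.execute("""
    SELECT product_name FROM products_flat
    WHERE product_type = 'boot' AND brand = 'Acme'
""").fetchall()

print(rows)  # [('Trail boot',)]
```

A view stays in sync with the vertical tables automatically; materialising it into real tables, as suggested above, trades that for faster reads.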

If you need sample code or more details, just ask me.

I hope this is clear. I have not had much coffee yet :)

Thanks, - Mark

You can use multiple inner joins -- I think this would work:

select distinct p.product_id
from products p
inner join attribproducts a1 on a1.product_id=p.product_id
inner join attribproducts a2 on a2.product_id=p.product_id
where a1.attribute_id in (9,10,11) 
  and a2.attribute_id in (60,61)

