简体   繁体   中英

mysql one-to-many query with negation and/or multiple criteria

I thought a query like this would be pretty easy because of the nature of relational databases but it seems to be giving me a fit. I also searched around but found nothing that really helped. Here's the situation:

Let's say I have a simple relationship for products and product tags. This is a one-to-many relationship, so we could have the following:

productid  |  tag
========================
1          |  Car
1          |  Black
1          |  Ford
2          |  Car
2          |  Red
2          |  Ford
3          |  Car
3          |  Black
3          |  Lexus
4          |  Motorcycle
4          |  Black
5          |  Skateboard
5          |  Black
6          |  Skateboard
6          |  Green

What's the most efficient way to query for all (Ford OR Black OR Skateboard) AND NOT (Motorcycles OR Green) ? Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle) .

There are about 150k records in the products table and 600k records in the tags tables, so the query is going to need to be as efficient as possible. Here's one query that I've been messing around with (example #1), but it seems to be taking about 4 seconds or so. Any help would be much appreciated.

SELECT p.productid
FROM   products p
       JOIN producttags tag1 USING (productid)
WHERE  p.active = 1
       AND tag1.tag IN ( 'Ford', 'Black', 'Skatebaord' )
       AND p.productid NOT IN (SELECT productid
                               FROM   producttags
                               WHERE  tag IN ( 'Motorcycle', 'Green' ));

Update

The quickest query I've found so far is something like this. It's taking 100-200ms but it seems pretty inflexible and ugly. Basically I'm grabbing all products that match Ford , Black , or Skateboard . Them I'm concatenating all of the tags for those matched products into a colon-separated string and removing all products that match on :Green: AND :Motorcycle: . Any thoughts?

SELECT p.productid,
       Concat(':', Group_concat(alltags.tag SEPARATOR ':'), ':') AS taglist
FROM   products p
       JOIN producttags tag1 USING (productid)
       JOIN producttags alltags USING (productid)
WHERE  p.active = 1
       AND tag1.tag IN ( 'Ford', 'Black', 'Skateboard' )
GROUP  BY tag1.productid
HAVING ( taglist NOT LIKE '%:Motorcycle:%'
         AND taglist NOT LIKE '%:Green:%' ); 

I'd write the exclusion join with no subqueries:

SELECT p.productid
FROM   products p
INNER JOIN producttags AS t ON p.productid = t.productid
LEFT OUTER JOIN producttags AS x ON p.productid = x.productid 
       AND x.tag IN ('Motorcycle', 'Green')
WHERE  p.active = 1
       AND t.tag IN ( 'Ford', 'Black', 'Skateboard' )
       AND x.productid IS NULL;

Make sure you have an index on products over the two columns (active, productid) in that order.

You should also have an index on producttags over the two columns (productid, tag) in that order.

Another query I'm going to need to do is something like all (Car) or (Skateboard) or (Green AND Motorcycle) or (Red AND Motorcycle).

Sometimes these complex conditions are hard for the MySQL optimizer. One common workaround is to use UNION to combine simpler queries:

SELECT p.productid
FROM   products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
WHERE  p.active = 1
   AND t1.tag IN ('Car', 'Skateboard')

UNION ALL

SELECT p.productid
FROM   products p
INNER JOIN producttags AS t1 ON p.productid = t1.productid
INNER JOIN producttags AS t2 ON p.productid = t2.productid 
WHERE  p.active = 1
   AND t1.tag IN ('Motorcycle')
   AND t2.tag IN ('Green', 'Red');

PS: Your tagging table is not an Entity-Attribute-Value table.

I would get all the unique ID matches and the unique IDs to filter out, then LEFT JOIN those lists (as per tigeryan) and filter out any IDs that match. The query should also be easier to read and modify by keeping all the queries separate. It should be fairly quick also, although it may not look like it.

SELECT * FROM products p
WHERE 
p.active=1 AND
productid IN (
SELECT matches.productid FROM (
  SELECT DISTINCT productid FROM producttags 
  WHERE tag IN ('Ford','Green','Skatebaord')
) AS matches
LEFT JOIN (
  SELECT DISTINCT productid FROM producttags 
  WHERE tag IN ('Motorcycles','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
)

Sometimes a JOIN is faster than an IN, depending on how mysql optimizes the query:

SELECT p.* FROM (
SELECT matches.productid FROM (
  SELECT DISTINCT productid FROM producttags 
  WHERE tag IN ('Ford','Green','Skatebaord')
) AS matches
LEFT JOIN (
  SELECT DISTINCT productid FROM producttags 
  WHERE tag IN ('Motorcycles','Green')
) AS filter ON filter.productid=matches.productid
WHERE filter.productid IS NULL
) AS idfilter
    JOIN products p ON p.productid=idfilter.productid AND p.active=1

The second query should force the join order since the internal selects have to be done first.

I would usually attack this by trying to eliminate records in the from...

select p.productid 
from product p 
left join producttags tag1 
    on p.productid = tag1.productid and tag1.tag NOT IN ('Motorcycles','Green')
where tag1.tag IN ('Ford','Black','Skateboard') and p.active = 1

What about this one:

SELECT DISTINCT p.id FROM products AS p
JOIN producttags AS included ON (
    included.productid = p.id
    AND included.tag IN ('Ford', 'Black', 'Skatebaord') 
)
WHERE active = 1
AND p.id NOT IN (
    SELECT DISTINCT productid FROM producttags
    WHERE tag IN ('Motorcycle', 'Green')
)

Alternative to the CONCAT/LIKE solution:

SELECT p.productid
FROM products p
JOIN producttags USING (productid)
WHERE p.active = 1
AND tag IN ('Ford', 'Black', 'Skateboard')
GROUP BY p.productid
HAVING SUM(IF(tag IN ('Motorcycle','Green'), 1, 0)) = 0;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM