Filter a one-to-many query by requiring that all of the many meet a criterion

Question

Imagine the following tables:

create table boxes( id int, name text, ...);

create table thingsinboxes( id int, box_id int, thing enum('apple','banana','orange') );

And the tables look like:

Boxes:
id | name
1  | orangesOnly
2  | orangesOnly2
3  | orangesBananas
4  | misc

thingsinboxes:
id | box_id | thing
1  |  1     | orange
2  |  1     | orange
3  |  2     | orange
4  |  3     | orange
5  |  3     | banana
6  |  4     | orange
7  |  4     | apple
8  |  4     | banana

How do I select the boxes that contain at least one orange and nothing that isn't an orange?

How does this scale, assuming I have several hundred thousand boxes and possibly a million things in boxes?

I'd like to keep this all in SQL if possible, rather than post-processing the result set with a script.

I'm using both postgres and mysql, so subqueries are probably bad, given that mysql doesn't optimize subqueries (pre version 6, anyway).

Answer 1

SELECT b.*
FROM boxes b JOIN thingsinboxes t ON (b.id = t.box_id)
GROUP BY b.id
HAVING COUNT(DISTINCT t.thing) = 1 AND SUM(t.thing = 'orange') > 0;
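
The query above relies on MySQL treating the comparison t.thing = 'orange' as 0 or 1; PostgreSQL has no SUM over booleans and will reject it. Since the question targets both databases, here is a sketch of the same idea written with CASE expressions, which both accept. It assumes boxes has only the id and name columns shown in the question (the "..." in the table definition is not filled in here), so add any further columns to both the SELECT list and the GROUP BY.

```sql
-- Portable variant of the GROUP BY approach (a sketch, not the answer's original query).
-- Assumption: boxes exposes only id and name; extend both lists for extra columns.
SELECT b.id, b.name
FROM boxes b
  JOIN thingsinboxes t ON b.id = t.box_id
GROUP BY b.id, b.name
-- at least one orange in the box ...
HAVING SUM(CASE WHEN t.thing = 'orange' THEN 1 ELSE 0 END) > 0
-- ... and no row in the box that is something else
   AND SUM(CASE WHEN t.thing <> 'orange' THEN 1 ELSE 0 END) = 0;
```

On the sample data this keeps boxes 1 and 2: box 3 is excluded by its banana, and box 4 by its apple and banana.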

Here's another solution that does not use GROUP BY:

SELECT DISTINCT b.*
FROM boxes b
  JOIN thingsinboxes t1 
    ON (b.id = t1.box_id AND t1.thing = 'orange')
  LEFT OUTER JOIN thingsinboxes t2 
    ON (b.id = t2.box_id AND t2.thing != 'orange')
WHERE t2.box_id IS NULL;

As always, before you draw conclusions about the scalability or performance of a query, you have to try it with a realistic data set and measure the performance.
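
One low-effort way to start measuring is to ask the database for its query plan. The sketch below prefixes the second query with EXPLAIN, which both MySQL and PostgreSQL support, although the output format differs between them. The plan only means something once the tables hold a realistic number of rows.

```sql
-- Show how the database intends to execute the anti-join variant.
-- Run this against tables loaded with realistic volumes, not the 8-row sample.
EXPLAIN
SELECT DISTINCT b.*
FROM boxes b
  JOIN thingsinboxes t1
    ON (b.id = t1.box_id AND t1.thing = 'orange')
  LEFT OUTER JOIN thingsinboxes t2
    ON (b.id = t2.box_id AND t2.thing <> 'orange')
WHERE t2.box_id IS NULL;
```

In PostgreSQL, EXPLAIN ANALYZE goes one step further: it executes the query and reports actual row counts and timings next to the estimates.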

Answer 2

I think Bill Karwin's query is just fine; however, if a relatively small proportion of boxes contain oranges, you should be able to speed things up by using an index on the thing field:

SELECT b.*
FROM boxes b JOIN thingsinboxes t1 ON (b.id = t1.box_id)
WHERE t1.thing = 'orange'
AND NOT EXISTS (
    SELECT 1
    FROM thingsinboxes t2
    WHERE t2.box_id = b.id
    AND t2.thing <> 'orange'
)
GROUP BY b.id;

The WHERE NOT EXISTS subquery will only be run once per orange thing, so it's not too expensive provided there aren't many oranges.
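
The answer mentions an index on the thing field but does not show one. A minimal sketch follows; the index names are made up for illustration, and the second, composite index is an extra suggestion rather than part of the answer. It is meant to let the NOT EXISTS probe (box_id equal to a given box, thing different from 'orange') be answered from the index alone. Whether either index helps depends on the data distribution, so compare plans before and after.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.

-- Lets the outer filter t1.thing = 'orange' use an index, as the answer suggests.
CREATE INDEX idx_thingsinboxes_thing
  ON thingsinboxes (thing);

-- Assumption, not from the answer: supports the per-box lookups in the
-- NOT EXISTS subquery and in the joins on box_id.
CREATE INDEX idx_thingsinboxes_box_thing
  ON thingsinboxes (box_id, thing);
```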

Answer 1
5, accepted, 2009-01-26 22:19:54

Answer 2
2, 2009-01-26 22:45:05
