
SQL: select all baskets containing a particular set of items

Eddy has baskets with items. Each item can belong to an arbitrary number of baskets, or to none at all.

The SQL schema representing this is as follows:

tbl_basket
- basketId

tbl_item
- itemId

tbl_basket_item
- pkId
- basketId
- itemId

Question: how do I select all baskets that contain a particular set of items?

UPDATE. I need the baskets that contain all of the items. Otherwise it would have been an easy task.

UPDATE B. I have implemented the following solution, including SQL generation in PHP:

SELECT basketId
FROM   tbl_basket
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 1  ) AS t0 USING(basketId)
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 15 ) AS t1 USING(basketId)
JOIN   (SELECT basketId FROM tbl_basket_item WHERE itemId = 488) AS t2 USING(basketId)

where the number of JOINs equals the number of items.
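UPDATE B builds this query in PHP. As a rough illustration only, the sketch below does the same string building in Python and runs it through the standard-library sqlite3 module against the question's tbl_* schema. The helper names (build_join_query, baskets_with_all, demo_connection) and the sample rows are hypothetical, and item ids are bound as parameters rather than pasted into the SQL.

```python
import sqlite3


def build_join_query(item_ids):
    """Hypothetical helper: one derived-table JOIN per requested item (as in UPDATE B)."""
    sql = "SELECT basketId FROM tbl_basket"
    for n in range(len(item_ids)):
        # t<n> holds the baskets containing the n-th item; JOIN ... USING keeps
        # only baskets that appear in every derived table.
        sql += (" JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?)"
                f" AS t{n} USING(basketId)")
    return sql


def baskets_with_all(conn, item_ids):
    rows = conn.execute(build_join_query(item_ids), list(item_ids)).fetchall()
    return sorted(r[0] for r in rows)


def demo_connection():
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_item (itemId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY,
                                      basketId INTEGER, itemId INTEGER);
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_item VALUES (1), (7), (15), (488);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488),
            (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7),
            (4, 7);
    """)
    return conn


conn = demo_connection()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
```

Each extra item adds one more derived table, which is why the cost grows with every frequent item: its derived table contains nearly every basket.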

That works well unless some of the items are included in almost every basket; then performance drops dramatically.

UPDATE B+. To resolve the performance issues, a heuristic is applied. First, select the frequency of each item. If it exceeds some threshold, leave that item out of the JOINs and either:

  • apply post-filtering in PHP
  • or simply skip the filter for that itemId, giving the user approximate results in a reasonable amount of time
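The UPDATE B+ heuristic could look roughly like this. It is a sketch under stated assumptions: Python and sqlite3 stand in for PHP and MySQL, the threshold is a hypothetical fraction of all baskets, and the helper names and sample data are made up. Rare items become JOINs; frequent items are either checked afterwards in application code, or skipped entirely when exact=False, which gives the approximate result described above.

```python
import sqlite3

# One derived-table JOIN per rare item, same shape as the UPDATE B query.
JOIN_PART = (" JOIN (SELECT basketId FROM tbl_basket_item WHERE itemId = ?)"
             " AS t{n} USING(basketId)")


def baskets_with_all_heuristic(conn, item_ids, threshold=0.5, exact=True):
    """Hypothetical helper: JOIN on rare items only; handle frequent ones separately."""
    item_ids = list(dict.fromkeys(item_ids))  # drop duplicate ids, keep order
    total = conn.execute("SELECT COUNT(*) FROM tbl_basket").fetchone()[0]

    # Step 1: frequency of each requested item = number of baskets containing it.
    freq = dict.fromkeys(item_ids, 0)
    ph = ",".join("?" * len(item_ids))
    for item, cnt in conn.execute(
            "SELECT itemId, COUNT(DISTINCT basketId) FROM tbl_basket_item"
            f" WHERE itemId IN ({ph}) GROUP BY itemId", item_ids):
        freq[item] = cnt
    limit = threshold * total  # hypothetical threshold: fraction of all baskets
    rare = [i for i in item_ids if freq[i] <= limit]
    frequent = [i for i in item_ids if freq[i] > limit]

    # Step 2: JOIN only on the rare items.
    sql = "SELECT basketId FROM tbl_basket"
    sql += "".join(JOIN_PART.format(n=n) for n in range(len(rare)))
    candidates = sorted({r[0] for r in conn.execute(sql, rare)})
    if not exact or not frequent or not candidates:
        return candidates  # approximate if frequent items were skipped

    # Step 3: post-filter in application code, loading only the candidates' contents.
    contents = {b: set() for b in candidates}
    ph = ",".join("?" * len(candidates))
    for basket, item in conn.execute(
            f"SELECT basketId, itemId FROM tbl_basket_item WHERE basketId IN ({ph})",
            candidates):
        contents[basket].add(item)
    return [b for b in candidates if set(frequent) <= contents[b]]


def demo_connection():
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY,
                                      basketId INTEGER, itemId INTEGER);
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7), (4, 7);
    """)
    return conn


conn = demo_connection()
print(baskets_with_all_heuristic(conn, [1, 15, 7]))               # [3]
print(baskets_with_all_heuristic(conn, [1, 15, 7], exact=False))  # [3, 4]
```

Post-filtering here loads the contents of the candidate baskets only, so it stays cheap as long as at least one requested item is rare enough to narrow the candidates down.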

UPDATE B++. It seems that this problem has no nice solution in MySQL. This raises one question and suggests one solution:

  • (question) Does PostgreSQL have any advanced indexing techniques that would allow solving this problem without a full scan?
  • (solution) It seems it could be solved nicely in Redis, using sets and the SINTER command to compute the intersection.
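The Redis idea boils down to keeping one set of basket ids per item and intersecting the sets of the requested items. The sketch below models that with Python's built-in sets instead of a Redis client; it only imitates what SINTER computes, and the names build_index and sinter are hypothetical. Starting from the smallest set keeps the work bounded by the rarest requested item, so very frequent items add little cost.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each item to the set of baskets containing it (like one Redis set per item)."""
    index = defaultdict(set)
    for basket_id, item_id in pairs:
        index[item_id].add(basket_id)
    return index


def sinter(index, item_ids):
    """Intersect the per-item basket sets, smallest first, imitating SINTER's result."""
    # A missing item behaves like an empty set, so the intersection becomes empty.
    sets = sorted((index.get(i, set()) for i in item_ids), key=len)
    if not sets:
        return set()
    result = set(sets[0])
    for s in sets[1:]:
        result &= s
        if not result:
            break  # nothing left, no need to look at the larger sets
    return result


# Hypothetical (basketId, itemId) pairs.
pairs = [(1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
         (3, 1), (3, 15), (3, 488), (3, 7), (4, 7)]
index = build_index(pairs)
print(sorted(sinter(index, [1, 15, 488])))  # [1, 3]
```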

I think the best way is to create a temporary table holding the set of needed items (e.g. via a procedure that takes the item ids as parameters, or something along those lines) and then LEFT JOIN it with all of the above tables joined together.

If for a given basketid you have NO nulls on the right side of the left join, the basket contains all the needed items.
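A minimal sketch of the temporary-table idea, using Python's sqlite3 and the question's tbl_* schema. The temporary table name needed_item, the helper names and the sample data are hypothetical, and a CROSS JOIN is assumed as the way to pair each basket with each needed item before the LEFT JOIN.

```python
import sqlite3

ALL_ITEMS_SQL = """
    SELECT b.basketId
    FROM tbl_basket AS b
    CROSS JOIN needed_item AS n               -- every (basket, needed item) pair
    LEFT JOIN tbl_basket_item AS bi
           ON bi.basketId = b.basketId AND bi.itemId = n.itemId
    GROUP BY b.basketId
    HAVING SUM(bi.itemId IS NULL) = 0         -- no NULLs: nothing is missing
    ORDER BY b.basketId
"""


def baskets_with_all(conn, item_ids):
    """Hypothetical helper: fill a temporary table, then LEFT JOIN against it."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS needed_item (itemId INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM needed_item")
    conn.executemany("INSERT OR IGNORE INTO needed_item VALUES (?)",
                     [(i,) for i in item_ids])
    return [r[0] for r in conn.execute(ALL_ITEMS_SQL)]


def demo_connection():
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tbl_basket (basketId INTEGER PRIMARY KEY);
        CREATE TABLE tbl_basket_item (pkId INTEGER PRIMARY KEY,
                                      basketId INTEGER, itemId INTEGER);
        INSERT INTO tbl_basket VALUES (1), (2), (3), (4);
        INSERT INTO tbl_basket_item (basketId, itemId) VALUES
            (1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7), (4, 7);
    """)
    return conn


conn = demo_connection()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
```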

-- the table definitions
CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE basket_item
        ( basketid INTEGER NOT NULL REFERENCES basket (basketid)
        , itemid INTEGER NOT NULL REFERENCES item (itemid)
        , PRIMARY KEY (basketid, itemid)
        );

-- the query
SELECT * FROM basket b
WHERE NOT EXISTS (
        SELECT * FROM item i
        WHERE i.itemid IN (1,15,488)
        AND NOT EXISTS (
                SELECT * FROM basket_item bi
                WHERE bi.basketid = b.basketid
                AND bi.itemid = i.itemid
                )
        );
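To try the query above, the harness below runs it in an in-memory SQLite database created with the same table definitions. The sample rows and helper names are hypothetical, and the item list is bound as parameters instead of being written into IN (1,15,488).

```python
import sqlite3


def demo_connection():
    """The answer's table definitions plus hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE basket ( basketid INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE item ( itemid INTEGER NOT NULL PRIMARY KEY);
        CREATE TABLE basket_item
                ( basketid INTEGER NOT NULL REFERENCES basket (basketid)
                , itemid INTEGER NOT NULL REFERENCES item (itemid)
                , PRIMARY KEY (basketid, itemid)
                );
        INSERT INTO basket VALUES (1), (2), (3), (4);
        INSERT INTO item VALUES (1), (7), (15), (488);
        INSERT INTO basket_item VALUES
            (1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
            (3, 1), (3, 15), (3, 488), (3, 7), (4, 7);
    """)
    return conn


def baskets_with_all(conn, item_ids):
    """Relational division: baskets for which no requested item is missing."""
    ph = ",".join("?" * len(item_ids))
    sql = f"""
        SELECT b.basketid FROM basket b
        WHERE NOT EXISTS (
                SELECT * FROM item i
                WHERE i.itemid IN ({ph})
                AND NOT EXISTS (
                        SELECT * FROM basket_item bi
                        WHERE bi.basketid = b.basketid
                        AND bi.itemid = i.itemid
                        )
                )
        ORDER BY b.basketid"""
    return [r[0] for r in conn.execute(sql, list(item_ids))]


conn = demo_connection()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
print(baskets_with_all(conn, [1, 999]))      # [1, 2, 3]: 999 is not in item, so it is never checked
```

Because the middle subquery iterates over rows of item, an id that does not exist in item cannot exclude any basket, as the second call shows.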

If you are going to provide the list of items yourself, replace id1, id2, etc. in the query below:

select distinct t.basketId
from tbl_basket_item as t
where t.itemID in (id1, id2)

will give all baskets containing any of the listed items. There is no need to join any other tables, as your requirements don't need them.
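For comparison, here is the query run through Python's sqlite3 on hypothetical sample data, with id1, id2 bound as parameters; the helper names are made up. Because IN tests each row on its own, a basket is returned as soon as it holds any one of the listed ids.

```python
import sqlite3

PAIRS = [(1, 1), (1, 15), (1, 488), (2, 1), (2, 15),
         (3, 1), (3, 15), (3, 488), (3, 7), (4, 7)]  # hypothetical (basketId, itemId) rows


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_basket_item"
                 " (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER)")
    conn.executemany("INSERT INTO tbl_basket_item (basketId, itemId) VALUES (?, ?)", PAIRS)
    return conn


def baskets_with_any(conn, item_ids):
    """The query above, with id1, id2, ... bound as parameters."""
    ph = ",".join("?" * len(item_ids))
    sql = ("SELECT DISTINCT t.basketId FROM tbl_basket_item AS t"
           f" WHERE t.itemId IN ({ph}) ORDER BY t.basketId")
    return [r[0] for r in conn.execute(sql, list(item_ids))]


conn = demo_connection()
print(baskets_with_any(conn, [15, 488]))  # [1, 2, 3]: basket 2 has 15 but not 488
```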

The simplest solution is to use a HAVING clause.

SELECT basketId
FROM   tbl_basket_item
WHERE  itemId IN (1,15,488)
GROUP BY basketId
HAVING COUNT(DISTINCT itemId) = 3 -- DISTINCT in case a basket holds the same item more than once
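A parameterized version of the HAVING query, run with Python's sqlite3 on hypothetical data that deliberately stores one item twice in the same basket. The helper names are made up; the requested ids are de-duplicated before counting, because a repeated id in the list would otherwise make the expected count unreachable.

```python
import sqlite3

# Hypothetical rows; basket 2 stores item 1 twice.
PAIRS = [(1, 1), (1, 15), (1, 488), (2, 1), (2, 1), (2, 15),
         (3, 1), (3, 15), (3, 488), (3, 7), (4, 7)]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl_basket_item"
                 " (pkId INTEGER PRIMARY KEY, basketId INTEGER, itemId INTEGER)")
    conn.executemany("INSERT INTO tbl_basket_item (basketId, itemId) VALUES (?, ?)", PAIRS)
    return conn


def baskets_with_all(conn, item_ids):
    """HAVING-based version: keep baskets that match every requested id."""
    ids = sorted(set(item_ids))  # a repeated id would make the count unreachable
    ph = ",".join("?" * len(ids))
    sql = ("SELECT basketId FROM tbl_basket_item"
           f" WHERE itemId IN ({ph})"
           " GROUP BY basketId"
           " HAVING COUNT(DISTINCT itemId) = ?"
           " ORDER BY basketId")
    return [r[0] for r in conn.execute(sql, ids + [len(ids)])]


conn = demo_connection()
print(baskets_with_all(conn, [1, 15, 488]))  # [1, 3]
```

With COUNT(itemId) instead of COUNT(DISTINCT itemId), basket 2 would wrongly qualify, since its duplicate row for item 1 brings its count to 3.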
