
PostgreSQL simple query optimization

PostgreSQL 8.4; three tables: store (~100k rows; pk id; fks supplier_id and item_id), supplier (~10 rows; pk supplier_id), item (~1000 rows; pk item_id).

I created the following query to get the data I need:

SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x 
 WHERE store.price > 500 AND store.quantity > 0 AND
       store.quantity < 100 AND
       x.item_name = 'SomeName';

The query plan:

Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)

Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the tables or the server settings, but I am looking to do this by changing nothing other than the query, and that's where my research fell short.

Clearly the issue to be solved is the Seq Scan, which suggests I need to arrange things so that the scanning/filtering is applied only to a subset of the store table. But if I recall correctly, you have to scan the table in any such case, so maybe use something other than a Seq Scan? An index scan isn't going to help, since I wouldn't be filtering by an indexed column... I'm puzzled here, because this seems to be a choice the PostgreSQL optimizer makes and not something I can change at will...

(If you're wondering, this was part of an assignment, and I'm asking here because I spent quite a few hours researching the problem without finding anything relevant. I've given up on it, but I'm still curious...)

You can probably fix this with indexes. It is a little hard to tell which keys are involved because of the NATURAL JOINs. (I recommend using explicit JOIN ... ON syntax instead of NATURAL JOIN, so you can at least see which keys are being used; also, if one of the tables is modified, it won't silently change the join.)
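For reference, here is a sketch of the same query written with explicit joins (the column names come from the question; the aliases are arbitrary):

SELECT s.quantity, s.price, sup.supplier_name
FROM store AS s
     INNER JOIN item AS i ON i.item_id = s.item_id
     INNER JOIN supplier AS sup ON sup.supplier_id = s.supplier_id
WHERE s.price > 500
  AND s.quantity > 0
  AND s.quantity < 100
  AND i.item_name = 'SomeName';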

I think an index on item(item_name, item_id) would help the query plan.
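For example (a sketch; the index name is arbitrary):

CREATE INDEX item_name_id_idx ON item (item_name, item_id);

With such an index, the item_name = 'SomeName' filter can be answered by an index scan instead of the sequential scan on item shown in the plan, with item_id included as the join key.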

This will be hard to optimize because the query already looks fine, but try this to avoid the subquery:

SELECT 
    store.quantity, 
    store.price, 
    supplier.supplier_name 
FROM store 
    INNER JOIN item
        ON store.item_id = item.item_id
        AND item.item_name = 'SomeName'
    INNER JOIN supplier
        ON supplier.supplier_id = store.supplier_id
WHERE 
    store.price > 500 
    AND store.quantity > 0
    AND store.quantity < 100;

(BETWEEN would read more nicely, but it is inclusive on both ends, so quantity BETWEEN 0 AND 100 is not equivalent to the original quantity > 0 AND quantity < 100 filter; the explicit comparisons are kept above. Note also that item_name belongs to the item table, not supplier, so the filter goes on the item join.)

Also, add indexes on the following (sketched below):

  • store.item_id
  • item.item_id (already covered by the primary key, so no new index is needed)
  • item.item_name
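A minimal sketch of those indexes (the index names here are arbitrary):

CREATE INDEX store_item_id_idx ON store (item_id);
CREATE INDEX item_item_name_idx ON item (item_name);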
