PostgreSQL simple query optimization

PostgreSQL 8.4; three tables: store (~100k rows, pk id, fk supplier_id and item_id), supplier (~10 rows, pk supplier_id), item (~1000 rows, pk item_id).

I created the following query to get the data I need:

SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x 
 WHERE store.price > 500 AND store.quantity > 0 AND
       store.quantity < 100 AND
       x.item_name = 'SomeName';

The query plan:

Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)

Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the table or the server settings, but I want to do this by modifying nothing other than the query, and that's where my research fell short.

Clearly the issue to be solved is the Seq Scan on store, which makes me think I need to arrange things so that the scanning/filtering is applied only to a subset of the store table. But iirc you need to scan the table in any such case, so maybe use something other than a Seq Scan? An index scan isn't going to help since I wouldn't be filtering by the index... I'm puzzled here because this seems to be a choice the PostgreSQL optimizer makes and not something I can change at will...

(If you're wondering, this was part of an assignment. I'm asking here because I spent quite a few hours researching the problem without finding anything relevant and gave up on it, but I'm still curious...)

You can probably fix this with indexes. It is a little hard to tell what the keys are because of the NATURAL JOINs. (I recommend USING instead of NATURAL JOIN: you can at least see which keys are being used, and if one of the tables is later modified, the join won't silently change.)
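As an illustration of the USING suggestion, here is a sketch of the original query with the join keys spelled out. It assumes that item_id and supplier_id are the only column names the tables share, which is what the join conditions in the posted plan suggest; if the tables have other columns in common, the NATURAL JOIN version would behave differently.

```sql
-- Same filters as the original query, with explicit join keys.
-- Assumption: item_id and supplier_id are the only shared column names.
SELECT store.quantity, store.price, supplier.supplier_name
FROM store
     JOIN item     USING (item_id)
     JOIN supplier USING (supplier_id)
WHERE store.price > 500
  AND store.quantity > 0
  AND store.quantity < 100
  AND item.item_name = 'SomeName';
```

This mainly makes the keys visible. For inner joins the planner is free to reorder them, so the plan may well come out the same.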

I think an index on item(item_name, item_id) would help the query plan.
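A minimal sketch of that index follows. The index name is made up for illustration, and creating it is a schema change, which goes beyond the query-only constraint in the question.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX item_name_item_id_idx ON item (item_name, item_id);

-- Refresh planner statistics, then re-check the plan.
ANALYZE item;

EXPLAIN
SELECT store.quantity, store.price, supplier.supplier_name
FROM store
     JOIN item     USING (item_id)
     JOIN supplier USING (supplier_id)
WHERE store.price > 500
  AND store.quantity > 0
  AND store.quantity < 100
  AND item.item_name = 'SomeName';
```

In the posted plan the Seq Scan on item is estimated at 20.75 out of a total of 6513.55, so this index on its own can only remove a small share of the cost. The Seq Scan on store is what dominates.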

This will be hard to optimize because the query already looks fine. Try this to avoid the subquery:

SELECT 
    store.quantity, 
    store.price, 
    supplier.supplier_name 
FROM store 
    INNER JOIN item
        ON store.item_id = item.item_id
        AND item.item_name = 'SomeName'  -- item_name is a column of item, not supplier
    INNER JOIN supplier
        ON supplier.supplier_id = store.supplier_id
WHERE 
    store.price > 500 
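    AND store.quantity BETWEEN 1 AND 99;  -- inclusive; same as > 0 AND < 100 for an integer quantity

BETWEEN reads better here. It is inclusive on both ends, so the bounds are 1 and 99 rather than 0 and 100, assuming quantity is an integer column.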

Also, add indexes on:

  • store.item_id
  • item.item_id (already covered by the primary key)
  • item.item_name
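The index suggestions could be sketched as follows. The index names are hypothetical, and item.item_id is skipped because the question states it is the primary key, which PostgreSQL already backs with a unique index.

```sql
-- Hypothetical index names; columns as listed above.
CREATE INDEX store_item_id_idx ON store (item_id);
CREATE INDEX item_item_name_idx ON item (item_name);

-- Refresh statistics so the planner can take the new indexes into account.
ANALYZE store;
ANALYZE item;
```

With an index on store.item_id the planner has the option of looking up only the store rows for the matching item instead of scanning all ~100k rows. Whether it picks that plan depends on the statistics, so re-running EXPLAIN on the query is the way to confirm.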
