PostgreSQL simple query optimization
PostgreSQL 8.4; three tables: store (~100k rows, pk id, fk supplier_id & item_id), supplier (~10 rows, pk supplier_id), item (~1000 rows, pk item_id).
I created the following query to get the data I need:
SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
(SELECT * FROM item NATURAL JOIN supplier) AS x
WHERE store.price > 500 AND store.quantity > 0 AND
store.quantity < 100 AND
x.item_name = 'SomeName';
The query plan:
Nested Loop  (cost=20.76..6513.55 rows=8 width=229)
  ->  Hash Join  (cost=20.76..6511.30 rows=8 width=15)
        Hash Cond: (store.item_id = item.item_id)
        ->  Seq Scan on store  (cost=0.00..6459.00 rows=8388 width=23)
              Filter: ((price > 500::numeric) AND (quantity > 0) AND (quantity < 100))
        ->  Hash  (cost=20.75..20.75 rows=1 width=8)
              ->  Seq Scan on item  (cost=0.00..20.75 rows=1 width=8)
                    Filter: ((item_name)::text = 'SomeName'::text)
  ->  Index Scan using supplier_pkey on supplier  (cost=0.00..0.27 rows=1 width=222)
        Index Cond: (supplier.supplier_id = store.supplier_id)
Now the aim is to reduce the cost by more than 30% by optimizing the query itself. The only instances of this problem I found were solved by modifying the table or the server settings, but I am looking to do this by modifying nothing other than the query, and that's where my research fell short.
Clearly the issue to be solved is the Seq Scan, which makes me think I need to arrange things so that the scanning/filtering is applied only to a subset of the store table. But IIRC you need to scan the table in any such case, so maybe use something other than a Seq Scan? An index scan isn't going to help since I wouldn't be filtering by the index... I'm puzzled here because this seems more like a choice that the PostgreSQL optimizer makes and not something I can change at will...
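To put numbers on the 30% target, the arithmetic can be run directly in psql. This is only a sketch based on the cost figures in the plan above; the column aliases are made up for illustration.

```sql
-- Figures are copied from the plan above; the aliases are illustrative only.
SELECT round(6513.55 * 0.70, 2)          AS target_total_cost,  -- 4559.49
       round(6459.00 / 6513.55 * 100, 1) AS store_scan_pct;     -- 99.2
```

The Seq Scan on store alone is estimated at 6459.00, which is above the 4559.49 target, so a reduction of more than 30% appears to require a plan that does not read the whole store table.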
(If you're wondering, this was part of an assignment and I'm asking here because I spent quite a few hours researching the problem without finding anything relevant, and I just gave up on it, but I'm still curious...)
You can probably fix this with indexes. It is a little hard to tell what the keys are because of the natural joins. (I recommend USING instead of NATURAL JOIN so you can at least see what keys are being used and, if one of the tables is modified, it won't mess up the join.)
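As a rough sketch of that recommendation, the original query could be written with explicit USING clauses. The join columns item_id and supplier_id are taken from the Hash Cond and Index Cond lines of the posted plan; it is an assumption that the tables share no other column names.

```sql
-- Same filters as the original query, with the join keys spelled out.
-- Assumes item_id and supplier_id are the only columns the tables share.
SELECT store.quantity, store.price, supplier.supplier_name
FROM store
JOIN item     USING (item_id)
JOIN supplier USING (supplier_id)
WHERE store.price > 500
  AND store.quantity > 0
  AND store.quantity < 100
  AND item.item_name = 'SomeName';
```

This mainly makes the keys visible. Whether the planner picks a different plan for it is not guaranteed.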
I think an index on item(item_name, item_id) would help the query plan.
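A minimal sketch of creating that index and re-checking the plan follows. The index name is hypothetical, and whether the planner actually uses the index depends on the table statistics.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX item_item_name_item_id_idx ON item (item_name, item_id);

-- Refresh planner statistics for the table.
ANALYZE item;

-- Re-run EXPLAIN on the original query to compare estimated costs.
EXPLAIN
SELECT store.quantity, store.price, x.supplier_name
FROM store NATURAL JOIN
     (SELECT * FROM item NATURAL JOIN supplier) AS x
WHERE store.price > 500 AND store.quantity > 0 AND
      store.quantity < 100 AND
      x.item_name = 'SomeName';
```

In the posted plan the scan on item is estimated at 20.75 while the scan on store is estimated at 6459.00, so this index by itself may change the total only slightly. An index on store(item_id) would be a separate assumption that this answer does not propose.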
This will be hard to optimize because the plan already looks good. Try this to avoid the subquery:
SELECT
    store.quantity,
    store.price,
    supplier.supplier_name
FROM store
INNER JOIN item
    ON store.item_id = item.item_id
    AND item.item_name = 'SomeName' -- item_name is a column of item, not supplier
INNER JOIN supplier
    ON supplier.supplier_id = store.supplier_id
WHERE
    store.price > 500
    -- BETWEEN is inclusive; 1..99 matches quantity > 0 AND quantity < 100 if quantity is an integer
    AND store.quantity BETWEEN 1 AND 99;
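Use BETWEEN; it reads better. Note that BETWEEN is inclusive at both ends, so its bounds have to be chosen to match the original strict comparisons (quantity > 0 AND quantity < 100).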
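The difference between the inclusive and strict forms can be checked with a few literal values. This is a standalone sketch; the alias t(q) and the output column names are made up for illustration.

```sql
-- Compare inclusive BETWEEN with the strict comparisons on boundary values.
SELECT q,
       q BETWEEN 0 AND 100 AS between_0_100,
       (q > 0 AND q < 100) AS strict_range
FROM (VALUES (0), (1), (99), (100)) AS t(q)
ORDER BY q;
-- q = 0 and q = 100: between_0_100 is true, strict_range is false
-- q = 1 and q = 99:  both columns are true
```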
Also, add indexes on: