
Is it possible to make this query faster?

I'm new to SQL and need help. I have 4 tables:

helmet                                  arm
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  head1  |  5  |   2.2  |       |   1  |   arm1  |  4  |   2.7  |
|   2  |  head2  |  6  |   2.9  |       |   2  |   arm2  |  5  |   3.1  |
|   3  |  head3  |  7  |   3.5  |       |   3  |   arm3  |  2  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

body                                    leg
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  body1  |  10 |   5.5  |       |   1  |   leg1  |  8  |   3.5  |
|   2  |  body2  |  5  |   2.4  |       |   2  |   leg2  |  5  |   2.0  |
|   3  |  body3  |  17 |   6.9  |       |   3  |   leg3  |  8  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

I'm looking for the combinations with the highest totaldef whose totalweight <= a given input,
for example: totalweight <= 10

Query (in my real tables the def column is named poise):

select 
    helmet.name as hname, body.name as bname, 
    arm.name as aname, leg.name as lname,
    helmet.poise + body.poise + arm.poise + leg.poise as totalpoise, 
    helmet.weight + body.weight + arm.weight + leg.weight as totalweight 
from 
    helmet 
inner join 
    body on 1=1
inner join 
    arm on 1=1
inner join 
    leg on 1=1 
where 
    helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by 
    totalpoise desc 
limit 5

Result:

+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+-------+-------+----------+-------------+
| head2 | body2 |  arm1 |  leg3 |    23    |     9.8     |
| head1 | body2 |  arm2 |  leg3 |    23    |     9.5     |
| head3 | body2 |  arm3 |  leg3 |    22    |     9.5     |
| head1 | body2 |  arm1 |  leg3 |    22    |     9.1     |
| head2 | body2 |  arm3 |  leg3 |    21    |     8.9     |
+-------+-------+-------+-------+----------+-------------+
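For anyone who wants to check the expected result outside the database, here is a minimal brute-force sketch in Python over the sample rows above. The names best_sets, HELMET, BODY, ARM and LEG are made up for this illustration, Decimal is used so that a total of exactly 10.0 is compared exactly, and the tie-break (lighter set first) is an assumption, since the query itself does not define one.

```python
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]


def best_sets(max_weight, limit=5):
    """Build every combination, keep those within the weight limit, sort by def."""
    rows = []
    for combo in product(HELMET, BODY, ARM, LEG):
        total_def = sum(item[1] for item in combo)
        total_weight = sum(item[2] for item in combo)
        if total_weight <= max_weight:
            names = tuple(item[0] for item in combo)
            rows.append((names, total_def, total_weight))
    # Highest def first; the lighter set wins ties (assumed tie-break).
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:limit]


top = best_sets(Decimal("10"))
for names, total_def, total_weight in top:
    print(names, total_def, total_weight)
```

This does the same work as the cross join: every one of the 3*3*3*3 combinations is built before anything is filtered, which is exactly what becomes expensive at 100 rows per table.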

The problem is that each table has about 100 rows, so there are about 100 million (100^4) possible combinations. The query takes a long time. I'm not sure whether the cause is my hardware, the type of database, or the query.

PS: I use an HDD and have 8 GB of RAM. I have tested on MySQL and PostgreSQL.

Update: I haven't created any indexes yet.

Is this the explain plan? (explain plan)

How long does it take? It depends on the input. On MySQL it takes from a few minutes to a couple of hours.
On PostgreSQL it takes about 30 seconds to 2 minutes.

Update 2: My tables never change. So can I store all the results in a table? Would that help?

Update 3: I have thought about partitioning. It may be much faster, but the problem is that an [armor set] in a lower partition may have a higher totaldef than an [armor set] in an upper partition. Example:

[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]

So a query that only looks at the partition with totalweight > 10 is going to miss that [armor set] because it is in another partition.

This is a CSV file for anyone who wants to test. (CSV file)

Update 4: I think the fastest way is to create a materialized view, but I guess the key to performance is sorting it. I don't know which sort order helps the materialized view or the index, but I sorted both and it helped.

I didn't expect to get this much help. Thank you.

Very interesting question. I don't know of any special method for your situation. If I were you, I would test the following: body items seem heavier than helmets, arms and legs, so I would query that table first and then, at each join, make sure the running sum of the weights doesn't exceed your input, as follows:

SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise, 
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight 
FROM body 
    INNER JOIN helmet 
    ON 1=1 
        AND body.weight + helmet.weight <= 10
    INNER JOIN arm 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5
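The idea behind putting a weight condition on every join can be sketched outside SQL as nested loops that give up on a partial combination as soon as it is already too heavy. This is only an illustration of the pruning, not what the planner literally does; the function name pruned_search and the checked counter are invented here, and the sample rows come from the question.

```python
from decimal import Decimal

# Sample rows from the question: (name, poise, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]


def pruned_search(max_weight, limit=5):
    """Nested loops in the join order body -> helmet -> arm -> leg, pruning early."""
    rows = []
    checked = 0  # number of complete four-piece combinations that had to be built
    for b in BODY:
        if b[2] > max_weight:
            continue
        for h in HELMET:
            w2 = b[2] + h[2]
            if w2 > max_weight:
                continue  # every arm/leg below this pair is skipped at once
            for a in ARM:
                w3 = w2 + a[2]
                if w3 > max_weight:
                    continue
                for l in LEG:
                    checked += 1
                    w4 = w3 + l[2]
                    if w4 <= max_weight:
                        names = (h[0], b[0], a[0], l[0])
                        rows.append((names, b[1] + h[1] + a[1] + l[1], w4))
    # Highest poise first; lighter set wins ties (assumed tie-break).
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:limit], checked


best, checked = pruned_search(Decimal("10"))
print(best)
print("full combinations built:", checked, "of", 3 ** 4)
```

Starting with the heaviest category makes the early checks fail more often, which is why the body table comes first.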

Also, as @juergen-d mentioned in a comment, indexes could have an impact on performance. You could benchmark the difference with and without an index on each weight column.

For PostgreSQL:

CREATE INDEX index_body_on_weight ON body(weight);

After some discussion with zerkms and Laurenz Albe, they agree that these three indexes are useless and should not be used (if I have time I'll do a benchmark):

CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);

Benchmark on PostgreSQL 9.3.5:

 slowbs's query: 107.628 seconds
 my proposed query: 12.066 seconds
 my proposed query: 16.257 seconds (with only index_body_on_weight)
 my proposed query: 13.217 seconds (with 4 indexes)

Benchmark conclusion: in this case the indexes do not help. @zerkms and @Laurenz Albe were right.

Last but not least, please share your results.

A materialized view with the appropriate index performs reasonably well: 1.8 seconds on my aging SSD desktop with the stock PostgreSQL config:

create materialized view v as
select
    h.name as hname, b.name as bname, a.name as aname, l.name as lname,
    total_poise, total_weight
from
    helmet h
    cross join
    body b
    cross join
    arm a
    cross join
    leg l
    cross join lateral (
        select
            h.weight + b.weight + l.weight + a.weight as total_weight,
            h.poise + b.poise + l.poise + a.poise as total_poise
    ) total
order by total_poise desc, total_weight
;

create index v_index on v (total_poise desc, total_weight);

Execution and EXPLAIN ANALYZE:

select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
         hname         |          bname           |         aname          |          lname           | total_poise | total_weight 
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.4
 Fume Sorcerer Mask+10 | Lion Warrior Cape+10     | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10    |          20 |          9.6
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10  |          20 |          9.6


explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
   ->  Index Scan using v_index on v  (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
         Index Cond: (total_weight <= '10'::double precision)
 Planning time: 0.126 ms
 Execution time: 1847.722 ms
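A rough way to picture that plan: the index already stores the rows in the ORDER BY order (total_poise desc, total_weight), so the scan can stop as soon as 5 rows pass the weight test, but it still has to step over every higher-poise row that is too heavy. The Python sketch below imitates this with a pre-sorted list built from the question's sample rows; VIEW, top_from_view and the scanned counter are names invented for the illustration, and it makes no claim about PostgreSQL internals.

```python
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, poise, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]

# "Materialize" all combinations once, stored in index order:
# total_poise descending, then total_weight ascending.
VIEW = sorted(
    (
        (tuple(i[0] for i in combo), sum(i[1] for i in combo), sum(i[2] for i in combo))
        for combo in product(HELMET, BODY, ARM, LEG)
    ),
    key=lambda r: (-r[1], r[2]),
)


def top_from_view(max_weight, limit=5):
    """Walk the rows in index order and stop after `limit` rows pass the weight test."""
    hits = []
    scanned = 0
    for row in VIEW:
        scanned += 1
        if row[2] <= max_weight:
            hits.append(row)
            if len(hits) == limit:
                break  # no sort needed: rows already arrive in ORDER BY order
    return hits, scanned


hits, scanned = top_from_view(Decimal("10"))
print(hits)
print("rows scanned:", scanned, "of", len(VIEW))
```

The smaller the weight limit, the more heavy high-poise rows have to be skipped before 5 matches turn up, so the run time of this approach depends on the input.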

Because your tables never change, you can cache the intermediate data. In PostgreSQL this could be a materialized view:

create materialized view equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);

This is a one-time heavy operation, but after that, queries like:

select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;

will be much faster. And of course you can join your tables to the query above to get the details of each piece of equipment.

You can call REFRESH MATERIALIZED VIEW if the tables change.

I am not familiar with MySQL, but you can search for "mysql materialized view" or just use a regular table.
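As a small runnable sketch of the "regular table" variant, here is the same idea using Python's built-in sqlite3 module and the sample rows from the question. SQLite is only a stand-in so the example is self-contained; the def column name follows the sample tables, and the helper best_sets is made up for the illustration. The inner query picks the 5 best id combinations from the precomputed table, and the joins then fetch the names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE helmet(id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL);
CREATE TABLE body  (id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL);
CREATE TABLE arm   (id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL);
CREATE TABLE leg   (id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL);
INSERT INTO helmet VALUES (1,'head1',5,2.2),(2,'head2',6,2.9),(3,'head3',7,3.5);
INSERT INTO body   VALUES (1,'body1',10,5.5),(2,'body2',5,2.4),(3,'body3',17,6.9);
INSERT INTO arm    VALUES (1,'arm1',4,2.7),(2,'arm2',5,3.1),(3,'arm3',2,1.8);
INSERT INTO leg    VALUES (1,'leg1',8,3.5),(2,'leg2',5,2.0),(3,'leg3',8,1.8);

-- One-time precomputation: ids plus totals only (a plain table, not a real materialized view).
CREATE TABLE equipments AS
  SELECT h.id AS helmet_id, a.id AS arm_id, b.id AS body_id, l.id AS leg_id,
         h.def + a.def + b.def + l.def AS total_def,
         h.weight + a.weight + b.weight + l.weight AS total_weight
  FROM helmet AS h, arm AS a, body AS b, leg AS l;
CREATE INDEX i_def ON equipments(total_def);
CREATE INDEX i_weight ON equipments(total_weight);
""")


def best_sets(max_weight, limit=5):
    """Pick the best id combinations first, then join back to get the names."""
    sql = """
        SELECT h.name, b.name, a.name, l.name, e.total_def, round(e.total_weight, 1)
        FROM (SELECT * FROM equipments
              WHERE total_weight <= ?
              ORDER BY total_def DESC, total_weight
              LIMIT ?) AS e
        JOIN helmet AS h ON h.id = e.helmet_id
        JOIN body   AS b ON b.id = e.body_id
        JOIN arm    AS a ON a.id = e.arm_id
        JOIN leg    AS l ON l.id = e.leg_id
        ORDER BY e.total_def DESC, e.total_weight
    """
    return conn.execute(sql, (max_weight, limit)).fetchall()


rows = best_sets(10)
for row in rows:
    print(row)
```

Because the joins only touch the 5 selected rows, looking up the names adds almost nothing to the cost of the lookup.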


Yet another attempt: partitioning.

(Drop the materialized view equipments if it was created in the previous attempt.)

create table equipments(
  helmet_id int, arm_id int, body_id int, leg_id int,
  total_weight float, total_def float);

That is the base table. Next we create the partitions. For example, if the maximum total weight is 40, there are four partitions for total weights of 0-10, 10-20, 20-30 and 30-40:

create table equipments_10 (check (total_weight>0 and total_weight<=10))
  inherits (equipments);
create table equipments_20 (check (total_weight>10 and total_weight<=20))
  inherits (equipments);
create table equipments_30 (check (total_weight>20 and total_weight<=30))
  inherits (equipments);
create table equipments_40 (check (total_weight>30))
  inherits (equipments);

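Fill the tables:

-- The explicit column list keeps total_def and total_weight from being swapped
-- (the table declares total_weight before total_def).
-- Note: with inheritance-based partitioning, rows inserted into the parent stay
-- in the parent unless a trigger or rule routes them to the child tables.
insert into equipments (helmet_id, arm_id, body_id, leg_id, total_def, total_weight)
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;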

Then create a lot of indexes to give PostgreSQL a chance to select the most efficient execution plan:

create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight); 
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight); 
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight); 
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight); 
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);

Finally, compute statistics about the data:

analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;

The query is the same as in the previous attempt.
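To make the partition idea concrete, and to show that a "totalweight <= X" query does not lose good sets sitting in a lower partition, here is a hedged Python sketch over the question's sample rows. It uses two 10-wide weight ranges; BOUNDS, partitions and best_sets are invented names, and the routing is done in application code, which is an assumption of this sketch rather than something the SQL above does on its own. Every partition at or below the limit is consulted, partitions entirely above it are skipped, and only the partition that straddles the limit needs a weight filter.

```python
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]

# Upper bounds of the weight partitions: (0, 10] and (10, 20].
BOUNDS = [Decimal("10"), Decimal("20")]


def sort_key(row):
    return (-row[1], row[2])  # highest def first, lighter set wins ties (assumed)


# Route every combination to the first partition whose upper bound covers it.
partitions = {bound: [] for bound in BOUNDS}
for combo in product(HELMET, BODY, ARM, LEG):
    names = tuple(i[0] for i in combo)
    row = (names, sum(i[1] for i in combo), sum(i[2] for i in combo))
    bound = next(b for b in BOUNDS if row[2] <= b)
    partitions[bound].append(row)
for part in partitions.values():
    part.sort(key=sort_key)  # plays the role of the per-partition total_def index


def best_sets(max_weight, limit=5):
    candidates = []
    lower = Decimal("0")
    for bound in BOUNDS:
        if lower >= max_weight:
            break  # this partition lies entirely above the limit: skip it
        rows = partitions[bound]
        if bound > max_weight:
            # Boundary partition: only part of it qualifies, so filter by weight.
            rows = [r for r in rows if r[2] <= max_weight]
        candidates.extend(rows[:limit])  # the best `limit` rows of this partition
        lower = bound
    candidates.sort(key=sort_key)
    return candidates[:limit]


for row in best_sets(Decimal("10")):
    print(row)
```

Taking the top rows of each qualifying partition and merging them gives the same answer as scanning one big table, because any overall top-5 row is also in the top 5 of its own partition.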

PS: Here is my test if somebody wants to try it.
PPS: In my tests each query, regardless of the parameter, takes less than 0.5 ms (on my prehistoric hardware).

Just for fun & completeness: a recursive solution on a unified table. This may not be the fastest, but it might win if the tables get larger and an index can be used. (Trivial examples like the 3*3*3*3 case will often yield hash-join plans, or even nested table scans.)


-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
(   1, 'head1', 5, 2.2) ,(   2, 'head2', 6, 2.9) ,(   3, 'head3', 7, 3.5) ;

CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
 (   1, 'body1', 10, 5.5) ,(   2, 'body2', 5 , 2.4) ,(   3, 'body3', 17, 6.9) ;

CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
 (   1, 'arm1', 4, 2.7) ,(   2, 'arm2', 5, 3.1) ,(   3, 'arm3', 2, 1.8) ;

CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
 (   1, 'leg1', 8, 3.5) ,(   2, 'leg2', 5, 2.0) ,(   3, 'leg3', 8, 1.8) ;


-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
        ;

-- add some structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;

-- SELECT * FROM allgear ORDER by gid, id;


-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
        )
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
        ;

Result:

 gid |           arr           | totpoise | totweight 
-----+-------------------------+----------+-----------
   4 | {head2,body2,arm1,leg2} |       20 |     10.00
   4 | {head1,body2,arm3,leg1} |       20 |      9.90
   4 | {head2,body2,arm1,leg3} |       23 |      9.80
   4 | {head3,body2,arm3,leg2} |       19 |      9.70
   4 | {head1,body2,arm2,leg2} |       20 |      9.70
(5 rows)

Note: I get a few more combinations, probably because I used DECIMAL(4,2) instead of a floating point type.


Extra: we can add some extra pruning (even in the lower levels) if we know the minimum weight that the remaining levels (gear types) will add. I added an extra table for this.


CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);

-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid+1
        -- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
        JOIN LATERAL ( SELECT SUM(mimi) AS debt
                FROM minima
                WHERE gid > ag.gid
                ) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)

        )
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
        ;
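The extra pruning can also be sketched in Python: for each level, precompute the lightest possible weight of all the levels still to come, and drop a partial set as soon as it cannot be completed within the limit. The names LEVELS, min_rest and search are made up for this illustration, the sample rows come from the question, and unlike the SQL above the final sort here is by poise (highest first), which is what the question asks for.

```python
from decimal import Decimal

# Sample rows from the question: (name, poise, weight), one list per gear level.
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]
LEVELS = [HELMET, BODY, ARM, LEG]

# min_rest[i] = lightest possible total weight of levels i, i+1, ... (the "minima" table).
min_rest = [Decimal("0")] * (len(LEVELS) + 1)
for i in range(len(LEVELS) - 1, -1, -1):
    min_rest[i] = min_rest[i + 1] + min(w for _, _, w in LEVELS[i])


def search(max_weight):
    """Depth-first expansion, one gear level at a time, with look-ahead pruning."""
    results = []
    expanded = 0  # partial or complete sets that survived the pruning test

    def extend(level, names, poise, weight):
        nonlocal expanded
        if level == len(LEVELS):
            results.append((names, poise, weight))
            return
        for name, p, w in LEVELS[level]:
            new_weight = weight + w
            # Even with the lightest remaining pieces this set would be too heavy.
            if new_weight + min_rest[level + 1] > max_weight:
                continue
            expanded += 1
            extend(level + 1, names + (name,), poise + p, new_weight)

    extend(0, (), 0, Decimal("0"))
    results.sort(key=lambda r: (-r[1], r[2]))
    return results, expanded


results, expanded = search(Decimal("10"))
for row in results[:5]:
    print(row)
print("sets within the limit:", len(results), "nodes expanded:", expanded)
```

With the look-ahead, body1 and body3 are rejected right at the second level, because even the lightest arm and leg would push them over 10.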
