
Is it possible to make this query faster?

I'm new to SQL and need help. I have 4 tables:

helmet                                  arm
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  head1  |  5  |   2.2  |       |   1  |   arm1  |  4  |   2.7  |
|   2  |  head2  |  6  |   2.9  |       |   2  |   arm2  |  5  |   3.1  |
|   3  |  head3  |  7  |   3.5  |       |   3  |   arm3  |  2  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

body                                    leg
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  body1  |  10 |   5.5  |       |   1  |   leg1  |  8  |   3.5  |
|   2  |  body2  |  5  |   2.4  |       |   2  |   leg2  |  5  |   2.0  |
|   3  |  body3  |  17 |   6.9  |       |   3  |   leg3  |  8  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

I'm looking for the sets with the highest totaldef whose totalweight <= input,
for example: totalweight <= 10

Query:

select 
    helmet.name as hname, body.name as bname, 
    arm.name as aname, leg.name as lname,
    helmet.poise + body.poise + arm.poise + leg.poise as totalpoise, 
    helmet.weight + body.weight + arm.weight + leg.weight as totalweight 
from 
    helmet 
inner join 
    body on 1=1
inner join 
    arm on 1=1
inner join 
    leg on 1=1 
where 
    helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by 
    totalpoise desc 
limit 5

Result:

+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+-------+-------+----------+-------------+
| head2 | body2 |  arm1 |  leg3 |    23    |     9.8     |
| head1 | body2 |  arm2 |  leg3 |    23    |     9.5     |
| head3 | body2 |  arm3 |  leg3 |    22    |     9.5     |
| head1 | body2 |  arm1 |  leg3 |    22    |     9.1     |
| head2 | body2 |  arm3 |  leg3 |    21    |     8.9     |
+-------+-------+-------+-------+----------+-------------+
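To see why this gets slow, here is a minimal Python sketch of what the cross join does, using the sample rows from the four tables above. It is only an illustration: weights are `Decimal` values to avoid floating-point rounding, and the secondary sort on weight is an added tie-breaker that the original query does not have. Every helmet/body/arm/leg set is built and checked before anything is sorted:

```python
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]


def best_sets(limit, top=5):
    """Brute force, like the cross join: build every set, filter, then sort."""
    rows = []
    for combo in product(HELMET, BODY, ARM, LEG):  # every helmet x body x arm x leg set
        total_def = sum(item[1] for item in combo)
        total_weight = sum(item[2] for item in combo)
        if total_weight <= limit:
            rows.append(([item[0] for item in combo], total_def, total_weight))
    # Highest total def first; lighter set first on ties (added tie-breaker).
    rows.sort(key=lambda r: (-r[1], r[2]))
    return rows[:top]


for names, total_def, total_weight in best_sets(Decimal("10")):
    print(names, total_def, total_weight)
```

With 3 rows per table this is 3^4 = 81 sets; with about 100 rows per table it is 100^4 = 100,000,000 sets, and all of them are built and checked before the sort can return the top 5.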

The problem is that each table has about 100 rows, so there are 100M+ possible results (100^4 = 100,000,000 combinations). The query takes a long time. I'm not sure whether the cause is my hardware, the type of database, or the query.

PS: I use an HDD and have 8GB of RAM. I tested on both MySQL and PostgreSQL.

Update: I haven't created any indexes yet.

Is this the explain plan? explain plan

How long does it take? It depends on the input. On MySQL it takes anywhere from a few minutes to a couple of hours.
On PostgreSQL it takes about 30 seconds to 2 minutes.

Update 2: My tables never change. So can I store all the results in a table? Would that help?

Update 3: I'm thinking about partitioning. It may be much faster, but the problem is when an [armor set] in a lower partition has a higher totaldef than an [armor set] in an upper partition. Example:

[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]

So the partition for totalweight > 10 is going to miss that [armor set] because it's in a different partition.

Here is a CSV file for anyone who wants to test: CSV file

Update 4: I think the fastest way is to create a materialized view, but I guess the key to performance is sorting. I don't know whether the sorting helps the materialized view or the index, but I sorted both and it helped.

I didn't expect to get a lot of help like this. Thank you.

Very interesting question. I don't know of any special method for your situation. If I were you, I would test the following: body pieces seem heavier than helmets, arms and legs, so I would query that table first, and then on each join make sure the sum of the weights doesn't exceed your input, as follows:

SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise, 
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight 
FROM body 
    INNER JOIN helmet 
    ON 1=1 
        AND body.weight + helmet.weight <= 10
    INNER JOIN arm 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5
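The pruning idea behind this query can be sketched in Python over the question's sample rows. This is an illustration of the logic only, not a model of how the planner actually executes the joins: the loops go body first, then helmet, arm and leg, and a partial set is dropped as soon as its weight already exceeds the limit, like the extra condition in each ON clause. It assumes weights are never negative, so adding pieces can never make a set lighter; `checked` is an added counter of how many partial or complete sets were looked at.

```python
from decimal import Decimal

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))]
BODY = [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))]
ARM = [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))]
LEG = [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))]


def best_sets_pruned(limit, top=5):
    """Nested loops body -> helmet -> arm -> leg, cutting overweight partial sets early."""
    rows = []
    checked = 0
    for b in BODY:
        checked += 1
        if b[2] > limit:
            continue
        for h in HELMET:
            checked += 1
            w2 = b[2] + h[2]
            if w2 > limit:
                continue  # no arm or leg can bring the weight back down
            for a in ARM:
                checked += 1
                w3 = w2 + a[2]
                if w3 > limit:
                    continue
                for l in LEG:
                    checked += 1
                    w4 = w3 + l[2]
                    if w4 <= limit:
                        rows.append(([h[0], b[0], a[0], l[0]], h[1] + b[1] + a[1] + l[1], w4))
    rows.sort(key=lambda r: (-r[1], r[2]))  # highest def first, lighter set on ties
    return rows[:top], checked
```

Without the early `continue`s the loops would visit 3 + 9 + 27 + 81 = 120 partial and complete sets; with them, only the branches that can still fit are expanded.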

Also, as @juergen-d mentioned in a comment, indexes could have an impact on performance. You could benchmark the difference with and without an index on each weight column.

For PostgreSQL:

CREATE INDEX index_body_on_weight ON body(weight);

After some discussion with zerkms and Laurenz Albe, they agreed that the following three indexes are useless and should not be used (if I have time I'll do a benchmark):

CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);

Benchmark on PostgreSQL 9.3.5:

 slowbs's query: 107.628 seconds
 my proposed query: 12.066 seconds
 my proposed query: 16.257 seconds (with only index_body_on_weight)
 my proposed query: 13.217 seconds (with all 4 indexes)

Benchmark conclusion: in this case the indexes don't help. @zerkms and @Laurenz Albe were right.

Last but not least, please share your results.

A materialized view with the appropriate index performs reasonably well: 1.8 seconds on my aging SSD desktop with the stock PostgreSQL config:

create materialized view v as
select
    h.name as hname, b.name as bname, a.name as aname, l.name as lname,
    total_poise, total_weight
from
    helmet h
    cross join
    body b
    cross join
    arm a
    cross join
    leg l
    cross join lateral (
        select
            h.weight + b.weight + l.weight + a.weight as total_weight,
            h.poise + b.poise + l.poise + a.poise as total_poise
    ) total
order by total_poise desc, total_weight
;

create index v_index on v (total_poise desc, total_weight);

Execution and analyze:

select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
         hname         |          bname           |         aname          |          lname           | total_poise | total_weight 
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.4
 Fume Sorcerer Mask+10 | Lion Warrior Cape+10     | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10    |          20 |          9.6
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10  |          20 |          9.6


explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
   ->  Index Scan using v_index on v  (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
         Index Cond: (total_weight <= '10'::double precision)
 Planning time: 0.126 ms
 Execution time: 1847.722 ms
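Why the sorted index helps: once every set is precomputed and ordered by total poise descending, then total weight, the query only has to walk from the top and stop as soon as five rows pass the weight filter, which is what the `Limit` over the `Index Scan` does in the plan above. A rough Python sketch of that idea, using the question's small sample tables as stand-in data (`build_sorted` and `first_fitting` are illustrative names, not part of any library):

```python
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, poise, weight) for helmet, body, arm, leg.
TABLES = [
    [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))],
    [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))],
    [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))],
    [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))],
]


def build_sorted(tables):
    """One-time step (the materialized view plus its index): every set, pre-sorted."""
    rows = [
        ([c[0] for c in combo], sum(c[1] for c in combo), sum(c[2] for c in combo))
        for combo in product(*tables)
    ]
    rows.sort(key=lambda r: (-r[1], r[2]))  # total_poise desc, total_weight asc
    return rows


def first_fitting(sorted_rows, limit, top=5):
    """Walk from the top, keep rows under the weight limit, stop after `top` matches."""
    out = []
    scanned = 0
    for row in sorted_rows:
        scanned += 1
        if row[2] <= limit:
            out.append(row)
            if len(out) == top:
                break  # like Limit: no need to look at the rest
    return out, scanned
```

The early stop is only as good as the data: if many heavy, high-poise sets sit at the top of the order, the scan has to step over all of them first, which may be part of why the plan above still needed about 1.8 seconds to return 5 rows.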

Because your tables never change, you can cache the intermediate data. In PostgreSQL this could be a materialized view:

create materialized view equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);

It is a one-time heavy operation, but after that, queries like:

select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;

will be much faster. And of course you can join your tables to the query above to get details about the equipment.

And you can call REFRESH MATERIALIZED VIEW if the tables change.

I am not familiar with MySQL, but you can google "mysql materialized view" or just use a regular table.
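The "regular table" route can be sketched with Python's built-in `sqlite3` module, since SQLite has no materialized views either: precompute the totals once into a plain `equipments` table, index it, then join back to the piece tables for the names. This assumes the column names from the question's tables (`def`, `weight`) and uses the question's sample rows; the secondary `ORDER BY total_weight` is an added tie-breaker.

```python
import sqlite3

# Sample rows from the question: (id, name, def, weight).
PIECES = {
    "helmet": [(1, "head1", 5, 2.2), (2, "head2", 6, 2.9), (3, "head3", 7, 3.5)],
    "body": [(1, "body1", 10, 5.5), (2, "body2", 5, 2.4), (3, "body3", 17, 6.9)],
    "arm": [(1, "arm1", 4, 2.7), (2, "arm2", 5, 3.1), (3, "arm3", 2, 1.8)],
    "leg": [(1, "leg1", 8, 3.5), (2, "leg2", 5, 2.0), (3, "leg3", 8, 1.8)],
}


def build(conn):
    """Create the piece tables, then precompute every set once (the heavy step)."""
    for table, rows in PIECES.items():
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)
    conn.execute("""
        CREATE TABLE equipments AS
        SELECT h.id AS helmet_id, a.id AS arm_id, b.id AS body_id, l.id AS leg_id,
               h.def + a.def + b.def + l.def AS total_def,
               h.weight + a.weight + b.weight + l.weight AS total_weight
        FROM helmet AS h, arm AS a, body AS b, leg AS l""")
    conn.execute("CREATE INDEX i_def ON equipments(total_def)")
    conn.execute("CREATE INDEX i_weight ON equipments(total_weight)")


def top_sets(conn, max_weight, top=5):
    """The cheap, repeatable query: filter, sort, limit, then join for the names."""
    return conn.execute("""
        SELECT h.name, b.name, a.name, l.name, e.total_def, round(e.total_weight, 2)
        FROM equipments AS e
        JOIN helmet AS h ON h.id = e.helmet_id
        JOIN body AS b ON b.id = e.body_id
        JOIN arm AS a ON a.id = e.arm_id
        JOIN leg AS l ON l.id = e.leg_id
        WHERE e.total_weight <= ?
        ORDER BY e.total_def DESC, e.total_weight
        LIMIT ?""", (max_weight, top)).fetchall()


conn = sqlite3.connect(":memory:")
build(conn)
for row in top_sets(conn, 10.0):
    print(row)
```

Only `build` pays the cost of the full cross product; `top_sets` can be run as often as needed against the stored table.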


Yet another attempt: partitioning.

(Drop the materialized view equipments if it was created in the previous attempt.)

create table equipments(
  helmet_id int, arm_id int, body_id int, leg_id int,
  total_weight float, total_def float);

That is the base table. Next we create the partitions. For example, if the maximum total weight is 40, there are four partitions, for total weights 0-10, 10-20, 20-30 and 30-40:

create table equipments_10 (check (total_weight>0 and total_weight<=10))
  inherits (equipments); 
create table equipments_20 (check (total_weight>10 and total_weight<=20))
  inherits (equipments); 
create table equipments_30 (check (total_weight>20 and total_weight<=30))
  inherits (equipments); 
create table equipments_40 (check (total_weight>30))
  inherits (equipments);

Fill our tables:

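-- the column list matters: the table declares total_weight before total_def
-- note: with inheritance partitioning, rows inserted into the parent stay in the parent;
-- route them to equipments_10 .. equipments_40 with a BEFORE INSERT trigger
-- (or insert into each child table directly) so the CHECK ranges take effect
insert into equipments (helmet_id, arm_id, body_id, leg_id, total_def, total_weight)
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;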

And create a lot of indexes to give PostgreSQL a chance to select the most efficient execution plan:

create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight); 
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight); 
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight); 
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight); 
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);

Finally compute statistics about the data:

analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;

The query is the same as in the previous attempt.

PS: Here is my test if anybody wants to try it.
PPS: In my tests, each query took less than 0.5 ms regardless of the parameter (on my prehistoric hardware).
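The point of the CHECK ranges is constraint exclusion: with PostgreSQL's default `constraint_exclusion = partition`, which applies to inheritance children, a query with `total_weight <= 10` only needs to scan the partitions whose range can contain such rows. That also addresses the worry in Update 3: a set weighing 9.9 lives in the (0, 10] partition and is found there, while a set weighing 11.0 can never satisfy the filter anyway. A toy Python model of the routing and the exclusion, with the bucket bounds taken from the example above (`route` and `query` are illustrative names):

```python
from bisect import bisect_left
from decimal import Decimal
from itertools import product

# Sample rows from the question: (name, def, weight) for helmet, body, arm, leg.
TABLES = [
    [("head1", 5, Decimal("2.2")), ("head2", 6, Decimal("2.9")), ("head3", 7, Decimal("3.5"))],
    [("body1", 10, Decimal("5.5")), ("body2", 5, Decimal("2.4")), ("body3", 17, Decimal("6.9"))],
    [("arm1", 4, Decimal("2.7")), ("arm2", 5, Decimal("3.1")), ("arm3", 2, Decimal("1.8"))],
    [("leg1", 8, Decimal("3.5")), ("leg2", 5, Decimal("2.0")), ("leg3", 8, Decimal("1.8"))],
]

# Partitions (0,10], (10,20], (20,30], (30,inf): upper bounds of the first three, lower bounds of all four.
BOUNDS = [Decimal(10), Decimal(20), Decimal(30)]
LOWERS = [Decimal(0)] + BOUNDS


def route(rows):
    """Put each (names, total_def, total_weight) row into the partition whose range holds its weight."""
    parts = [[] for _ in LOWERS]
    for row in rows:
        parts[bisect_left(BOUNDS, row[2])].append(row)  # weight exactly 10 goes to (0, 10]
    return parts


def query(parts, limit, top=5):
    """Skip partitions whose lower bound is >= limit: every row in them is heavier than limit."""
    hits, scanned = [], 0
    for lower, part in zip(LOWERS, parts):
        if lower >= limit:
            continue
        scanned += 1
        hits.extend(r for r in part if r[2] <= limit)
    hits.sort(key=lambda r: (-r[1], r[2]))  # highest def first, lighter set on ties
    return hits[:top], scanned


rows = [([c[0] for c in combo], sum(c[1] for c in combo), sum(c[2] for c in combo)) for combo in product(*TABLES)]
parts = route(rows)
```

For a limit of 10 only the first partition is scanned; no set that satisfies the filter can live anywhere else.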

Just for fun and completeness: a recursive solution on a unified table. This may not be the fastest, but it might win if the tables get larger and an index can be used (trivial examples like the 3*3*3*3 one will often yield hash-join plans, or even nested table scans).


-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
(   1, 'head1', 5, 2.2) ,(   2, 'head2', 6, 2.9) ,(   3, 'head3', 7, 3.5) ;

CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
 (   1, 'body1', 10, 5.5) ,(   2, 'body2', 5 , 2.4) ,(   3, 'body3', 17, 6.9) ;

CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
 (   1, 'arm1', 4, 2.7) ,(   2, 'arm2', 5, 3.1) ,(   3, 'arm3', 2, 1.8) ;

CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
 (   1, 'leg1', 8, 3.5) ,(   2, 'leg2', 5, 2.0) ,(   3, 'leg3', 8, 1.8) ;


-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
        ;

-- add some structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;

-- SELECT * FROM allgear ORDER by gid, id;


-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
        )
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
        ;

Result:

 gid |           arr           | totpoise | totweight 
-----+-------------------------+----------+-----------
   4 | {head2,body2,arm1,leg2} |       20 |     10.00
   4 | {head1,body2,arm3,leg1} |       20 |      9.90
   4 | {head2,body2,arm1,leg3} |       23 |      9.80
   4 | {head3,body2,arm3,leg2} |       19 |      9.70
   4 | {head1,body2,arm2,leg2} |       20 |      9.70
(5 rows)

Note: I get a few more combinations, probably because I used DECIMAL(4,2) instead of a floating point type.
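The recursive CTE builds partial sets one gear type at a time and discards any partial set that is already over the limit. Below is a rough Python analogue over the same `allgear`-style rows (gid 1 to 4 = helmet, body, arm, leg), using `Decimal` like the `DECIMAL(4,2)` columns. The query above orders by `totweight DESC`; to answer the question as asked, the sketch's `top_sets` sorts by total poise descending instead, with weight as a tie-breaker.

```python
from decimal import Decimal

# allgear rows: (gid, name, poise, weight)
ALLGEAR = [
    (1, "head1", 5, Decimal("2.2")), (1, "head2", 6, Decimal("2.9")), (1, "head3", 7, Decimal("3.5")),
    (2, "body1", 10, Decimal("5.5")), (2, "body2", 5, Decimal("2.4")), (2, "body3", 17, Decimal("6.9")),
    (3, "arm1", 4, Decimal("2.7")), (3, "arm2", 5, Decimal("3.1")), (3, "arm3", 2, Decimal("1.8")),
    (4, "leg1", 8, Decimal("3.5")), (4, "leg2", 5, Decimal("2.0")), (4, "leg3", 8, Decimal("1.8")),
]


def expand(gear, limit):
    """Like WITH RECURSIVE rrr: start from gid 1, then join gid + 1 while the partial weight fits."""
    level = [([name], poise, weight) for gid, name, poise, weight in gear if gid == 1]
    last_gid = max(g[0] for g in gear)
    for next_gid in range(2, last_gid + 1):
        items = [g for g in gear if g[0] == next_gid]
        level = [
            (arr + [name], totpoise + poise, totweight + weight)
            for arr, totpoise, totweight in level
            for _, name, poise, weight in items
            if totweight + weight <= limit
        ]
    # Only the last level is kept, which plays the role of WHERE gid = 4.
    return level


def top_sets(gear, limit, top=5):
    rows = expand(gear, limit)
    rows.sort(key=lambda r: (-r[1], r[2]))  # highest poise first, then lightest
    return rows[:top]
```

Sorting the complete sets by weight instead reproduces the first row of the result shown above ({head2,body2,arm1,leg2} at 10.00).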


Extra: we can add some extra pruning (even at the lower levels) if we know the minimum weight that the remaining levels (gear types) will add. I added an extra table for this.


CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);

-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid+1
        -- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
        JOIN LATERAL ( SELECT SUM(mimi) AS debt
                FROM minima
                WHERE gid > ag.gid
                ) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)

        )
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
        ;
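The `minima` lookup turns the pruning test into "partial weight + the lightest possible remaining pieces <= limit". A Python sketch of the same bound over the sample rows (`remaining_minimum` and `expand` are illustrative names), counting how many partial sets are kept at each level so the effect is visible. Because the bound never overestimates the weight still to come, both variants end with the same complete sets:

```python
from decimal import Decimal

# allgear rows: (gid, name, poise, weight)
ALLGEAR = [
    (1, "head1", 5, Decimal("2.2")), (1, "head2", 6, Decimal("2.9")), (1, "head3", 7, Decimal("3.5")),
    (2, "body1", 10, Decimal("5.5")), (2, "body2", 5, Decimal("2.4")), (2, "body3", 17, Decimal("6.9")),
    (3, "arm1", 4, Decimal("2.7")), (3, "arm2", 5, Decimal("3.1")), (3, "arm3", 2, Decimal("1.8")),
    (4, "leg1", 8, Decimal("3.5")), (4, "leg2", 5, Decimal("2.0")), (4, "leg3", 8, Decimal("1.8")),
]


def remaining_minimum(gear):
    """For each gid, the sum of the lightest item of every later gid (SUM(mimi) ... WHERE gid > ag.gid);
    0 for the last gid, which is what the extra gid 5 row with mimi = 0.0 provides in SQL."""
    gids = sorted({g[0] for g in gear})
    lightest = {gid: min(g[3] for g in gear if g[0] == gid) for gid in gids}
    debt, running = {}, Decimal(0)
    for gid in reversed(gids):
        debt[gid] = running
        running += lightest[gid]
    return debt


def expand(gear, limit, use_bound=True):
    """Level-by-level expansion; returns the complete sets and how many partial sets were kept."""
    debt = remaining_minimum(gear)
    level = [([name], poise, weight) for gid, name, poise, weight in gear if gid == 1]
    kept = len(level)
    for next_gid in range(2, max(debt) + 1):
        items = [g for g in gear if g[0] == next_gid]
        bound = debt[next_gid] if use_bound else Decimal(0)
        level = [
            (arr + [name], totpoise + poise, totweight + weight)
            for arr, totpoise, totweight in level
            for _, name, poise, weight in items
            if totweight + weight + bound <= limit
        ]
        kept += len(level)
    return level, kept
```

On this tiny sample the bound already removes sets like body1 + any helmet at the second level, because no arm and leg are light enough to keep them under 10.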

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For questions, contact: yoyou2525@163.com.

 