I am trying to fix a performance problem that I am having with a query in PostgreSQL.
General concept being modeled: Software licences being purchased and assigned. I think I have removed enough of the other things being modeled that it is now very similar to a standard hotel room booking system, except that it is normal for a current assignment (hotel booking) to have no known end date.
Query's purpose: This is a view that assembles the information necessary to display information about a licence, and the purchase it came from. When the application queries the view, it provides a tLicence.id
, so that one row is returned.
Non-hotel-like concepts remaining in the query:
The nested SELECT
with the alias purchase_quantities_assignnments
is a view in the database (inlined here for convenience). Ideally, I would like any fix to my performance issue to not require that a modified version of this view be inlined into the query; ideally, the view can continue to exist mostly as-is, and be used in other ways in other queries.
Problem
If I query this view (the query) with WHERE tLicences.id = 19
, the result takes a long time to generate. In particular, it seems to be generating the entire set for periodsOfAvailability_start
(which is slow), and then joining; this conclusion is based on the EXPLAIN ANALYZE GroupAggregate
returning 10 rows (which is the number of purchases). I feel like query planner should be able to figure out that the tAssignments.purchase_id
could be used to significantly reduce how much of periodsOfAvailability_start
needs to be generated.
However, if I query this view (the query) with WHERE tLicences.id = 19 AND tLicences.purchase_id = ?
[ ?
being that licence's purchase ID], then the query runs as expected, generating only the set for periodsOfAvailability_start
that has that purchase ID; this conclusion is based on the EXPLAIN ANALYZE GroupAggregate
returning 1 row (which is the number of purchases to which the licence belongs).
Query
SELECT *
FROM test.tPurchases AS tPurchases
INNER JOIN test.tLicences
ON tLicences.purchase_id = tPurchases.id
LEFT JOIN (
SELECT
purchase_id,
SUM(
CASE
assignment_newer_id IS NOT null
WHEN true THEN 1
WHEN false THEN 0
END
) AS prchs_quantity_assigned,
SUM(
CASE
assignment_newer_id IS null AND
current_timestamp
BETWEEN licence_availability_start AND
licence_availability_end
WHEN true THEN 1
WHEN false THEN 0
END
) AS prchs_quantity_notAssignedAndCanBeAssigned,
SUM(
CASE
assignment_newer_id IS null AND
current_timestamp < licence_availability_start
WHEN true THEN 1
WHEN false THEN 0
END
) AS prchs_quantity_notAssignedAndCannotBeAssigned
FROM (
SELECT
tPurchases.id AS purchase_id,
tPurchases.date_ AS purchase_date,
tLicences.id AS licence_id,
GREATEST(
tPurchases.date_,
older.end_,
older.start + '1 day'::interval
) AS licence_availability_start,
CASE
WHEN newer.id IS NULL THEN 'infinity'
ELSE newer.start - '1 day'::interval
END AS licence_availability_end,
COALESCE(newer.start, 'infinity') AS licence_availability_uninstallBy,
older.id AS assignment_older_id,
older.start AS assignment_older_start,
older.end_ AS assignment_older_end,
newer.id AS assignment_newer_id,
newer.start AS assignment_newer_start,
newer.end_ AS assignment_newer_end
FROM test.tLicences
INNER JOIN test.tPurchases
ON tPurchases.id = tLicences.purchase_id
LEFT JOIN test.tAssignments AS older
ON (
NOT older.deleted AND
older.licence_id = tLicences.id
)
LEFT JOIN test.tAssignments AS newer
ON (
NOT newer.deleted AND
newer.id <> older.id AND
newer.licence_id = older.licence_id
)
WHERE
NOT tLicences.deleted
UNION
SELECT
tPurchases.id AS purchase_id,
tPurchases.date_ AS purchase_date,
tLicences.id AS licence_id,
tPurchases.date_ AS licence_availability_start,
oldest.start - '1 day'::interval AS licence_availability_end,
oldest.start AS licence_availability_uninstallBy,
null AS assignment_older_id,
null AS assignment_older_start,
null AS assignment_older_end,
oldest.id AS assignment_newer_id,
oldest.start AS assignment_newer_start,
oldest.end_ AS assignment_newer_end
FROM test.tLicences
INNER JOIN test.tPurchases
ON tPurchases.id = tLicences.purchase_id
INNER JOIN test.tAssignments AS oldest
ON oldest.licence_id = tLicences.id
WHERE
NOT tLicences.deleted AND
NOT oldest.deleted
) AS periodsOfAvailability_start
WHERE
(assignment_newer_id IS null OR assignment_newer_end IS null)
GROUP BY purchase_id
) AS purchase_quantities_assignnments
ON
purchase_quantities_assignnments.purchase_id = tPurchases.id
WHERE
tLicences.id = 19 /* [Unexpected behaviour] The full set for "purchase_quantities_assignnments" is generated */
--tLicences.id = 19 AND tLicences.purchase_id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated */
--tLicences.id = 19 AND tPurchases.id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated */
--tLicences.purchase_id = ? /* [Expected behaviour] Only the single relevant row for "purchase_quantities_assignnments" appears to be generated. Note: This is a different query *result* than the others */
Question: Is there some way that I can resolve this issue, without having to provide tLicences.purchase_id
?
Database version: PostgreSQL 9.0
SQL to generate a schema, tables, and populate those tables:
This is kind of long-running, since I wanted a quantity similar to our actual data. The number of licences (30000) and the number of assignments (100000) can be reduced if the run-time is a problem.
CREATE SCHEMA test;
CREATE TABLE test.tPurchases (
id serial not null,
date_ date not null,
/* … */
deleted boolean not null DEFAULT false,
PRIMARY KEY (id)
);
CREATE TABLE test.tLicences (
id serial not null,
purchase_id integer not null,
/* … */
deleted boolean not null DEFAULT false,
PRIMARY KEY (id),
FOREIGN KEY (purchase_id)
REFERENCES test.tPurchases (id)
ON UPDATE RESTRICT
ON DELETE RESTRICT
);
CREATE INDEX ON test.tLicences(purchase_id);
CREATE TABLE test.tAssignments (
id serial not null,
licence_id integer not null,
start date not null,
end_ date,
/* … */
deleted boolean not null DEFAULT false,
PRIMARY KEY (id),
FOREIGN KEY (licence_id)
REFERENCES test.tLicences (id)
ON UPDATE RESTRICT
ON DELETE RESTRICT,
CHECK (start <= end_)
);
CREATE INDEX ON test.tAssignments(licence_id);
INSERT INTO test.tPurchases(id, date_)
SELECT
id,
'2000-01-01'::timestamp + random() * '1 year'::interval AS date_
FROM generate_series(1, 10) AS id
;
INSERT INTO test.tLicences(id, purchase_id, deleted)
SELECT
id,
trunc(random() * 10 + 1) AS purchase_id,
(random() > 0.99) AS deleted
FROM generate_series(1, 30000) AS id
;
INSERT INTO test.tAssignments(id, licence_id, start, end_, deleted)
SELECT
assignments.id,
assignments.licence_id,
tPurchases.date_ + ((rank * 20 + random() * 10) || ' days')::interval AS start,
CASE
assignments.rank = max(assignments.rank) OVER (PARTITION BY assignments.licence_id) AND
random() > 0.5
WHEN true THEN null
ELSE tPurchases.date_ + ((rank * 20 + 10 + random() * 10) || ' days')::interval
END AS end_,
(random() > 0.95) AS deleted
FROM (
SELECT
assignments.id,
assignments.licence_id,
rank() OVER (PARTITION BY assignments.licence_id ORDER BY assignments.id) AS rank
FROM (
SELECT
id,
trunc(random() * 30000 + 1) AS licence_id
FROM generate_series(1, 100000) AS id
) AS assignments
) AS assignments
INNER JOIN test.tLicences
ON tLicences.id = assignments.licence_id
INNER JOIN test.tPurchases
ON tPurchases.id = tLicences.purchase_id
;
You might need to run statistics but often you can force the optimization you want with a CTE. Here I also take out all your sub-queries to CTEs just to make it clear:
WITH myPurchases AS
(
SELECT *
FROM test.tPurchases AS tPurchases
WHERE tLicences.id = 19
), periodsOfAvailability_start AS
(
SELECT
tPurchases.id AS purchase_id,
tPurchases.date_ AS purchase_date,
tLicences.id AS licence_id,
GREATEST(tPurchases.date_, older.end_, older.start + '1 day'::interval) AS licence_availability_start,
CASE WHEN newer.id IS NULL THEN 'infinity' ELSE newer.start - '1 day'::interval END AS licence_availability_end,
COALESCE(newer.start, 'infinity') AS licence_availability_uninstallBy,
older.id AS assignment_older_id,
older.start AS assignment_older_start,
older.end_ AS assignment_older_end,
newer.id AS assignment_newer_id,
newer.start AS assignment_newer_start,
newer.end_ AS assignment_newer_end
FROM test.tLicences
INNER JOIN myPurchases AS tPurchases ON tPurchases.id = tLicences.purchase_id
LEFT JOIN test.tAssignments AS older ON (NOT older.deleted AND older.licence_id = tLicences.id)
LEFT JOIN test.tAssignments AS newer ON (NOT newer.deleted AND newer.id <> older.id AND newer.licence_id = older.licence_id)
WHERE NOT tLicences.deleted
UNION
SELECT
tPurchases.id AS purchase_id,
tPurchases.date_ AS purchase_date,
tLicences.id AS licence_id,
tPurchases.date_ AS licence_availability_start,
oldest.start - '1 day'::interval AS licence_availability_end,
oldest.start AS licence_availability_uninstallBy,
null AS assignment_older_id,
null AS assignment_older_start,
null AS assignment_older_end,
oldest.id AS assignment_newer_id,
oldest.start AS assignment_newer_start,
oldest.end_ AS assignment_newer_end
FROM test.tLicences
INNER JOIN myPurchases AS tPurchases ON tPurchases.id = tLicences.purchase_id
INNER JOIN test.tAssignments AS oldest ON oldest.licence_id = tLicences.id
WHERE NOT tLicences.deleted AND NOT oldest.deleted
), purchase_quantities_assignnments AS
(
SELECT
purchase_id,
SUM(CASE WHEN assignment_newer_id IS NOT null THEN 1 ELSE 0 END) AS prchs_quantity_assigned,
SUM(CASE WHEN assignment_newer_id IS null AND current_timestamp BETWEEN licence_availability_start AND licence_availability_end THEN 1 ELSE false END) AS prchs_quantity_notAssignedAndCanBeAssigned,
SUM(CASE WHEN assignment_newer_id IS null AND current_timestamp < licence_availability_start THEN 1 ELSE 0 END) AS prchs_quantity_notAssignedAndCannotBeAssigned
FROM periodsOfAvailability_start
WHERE assignment_newer_id IS null OR assignment_newer_end IS null
GROUP BY purchase_id
)
SELECT *
FROM myPurchases AS tPurchases
INNER JOIN test.tLicences ON tLicences.purchase_id = tPurchases.id
LEFT JOIN purchase_quantities_assignnments ON purchase_quantities_assignnments.purchase_id = tPurchases.id
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.