The query below takes almost 15 minutes to return a result, and I am wondering why. Is it because of the amount of data, or the number of vertices in the geometries? When I tried the query against a different table (loaded from a small shapefile), it ran fast.
Here's the query. (Thanks to Patrick for this):
WITH hi AS (
   SELECT ps.id, ps.brgy_locat, ps.municipali
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_seniangcbr fh ON fh.hazard = 'High'
                                   AND ST_Intersects(fh.geom, ps.geom)
   ), med AS (
   SELECT ps.id, ps.brgy_locat, ps.municipali
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_seniangcbr fh ON fh.hazard = 'Medium'
                                   AND ST_Intersects(fh.geom, ps.geom)
   EXCEPT SELECT * FROM hi
   ), low AS (
   SELECT ps.id, ps.brgy_locat, ps.municipali
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_seniangcbr fh ON fh.hazard = 'Low'
                                   AND ST_Intersects(fh.geom, ps.geom)
   EXCEPT SELECT * FROM hi
   EXCEPT SELECT * FROM med
   )
SELECT brgy_locat AS barangay, municipali AS municipality, high, medium, low
FROM  (SELECT brgy_locat, municipali, count(*) AS high
       FROM   hi
       GROUP  BY 1, 2) cnt_hi
FULL   JOIN (SELECT brgy_locat, municipali, count(*) AS medium
             FROM   med
             GROUP  BY 1, 2) cnt_med USING (brgy_locat, municipali)
FULL   JOIN (SELECT brgy_locat, municipali, count(*) AS low
             FROM   low
             GROUP  BY 1, 2) cnt_low USING (brgy_locat, municipali);
PostgreSQL 9.3, PostGIS 2.1.5
Table Polystructures: contains 9847 rows:
CREATE TABLE evidensapp_polystructures (
   id serial NOT NULL PRIMARY KEY,
   bldg_name character varying(100) NOT NULL,
   bldg_type character varying(50) NOT NULL,
   brgy_locat character varying(50) NOT NULL,
   municipali character varying(50) NOT NULL,
   province character varying(50) NOT NULL,
   geom geometry(MultiPolygon,32651)
);
CREATE INDEX evidensapp_polystructures_geom_id
   ON evidensapp_polystructures USING gist (geom);
ALTER TABLE evidensapp_polystructures CLUSTER ON evidensapp_polystructures_geom_id;
Table SeniangCBR: only 6 rows, shapefile size (if it matters): 52,060 KB
CREATE TABLE evidensapp_seniangcbr (
   id serial NOT NULL PRIMARY KEY,
   hazard character varying(16) NOT NULL,
   geom geometry(MultiPolygon,32651)
);
CREATE INDEX evidensapp_seniangcbr_geom_id ON evidensapp_seniangcbr USING gist (geom);
ALTER TABLE evidensapp_seniangcbr CLUSTER ON evidensapp_seniangcbr_geom_id;
All the data were loaded into the database automatically with the LayerMapping utility, as I am using Django (GeoDjango).
I don't have a server right now, so I run the query on my PC.
The EXPLAIN ANALYZE output is hard to read because all the fields and functions are scrambled into radio alphabet. That said, two things stand out: the ST_Intersects() function, which is not surprising, and the EXCEPT clause, which appears to be rather inefficient too. So please try this rather less verbose version:
SELECT brgy_locat AS barangay, municipali AS municipality,
       sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high,
       sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium,
       sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low
FROM  (
   SELECT ps.id, ps.brgy_locat, ps.municipali,
          max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
   GROUP  BY 1, 2, 3
   ) AS ps_fh
GROUP  BY 1, 2;
There is now only a single call to ST_Intersects(), which is possibly (hopefully) quite a bit faster than three calls on subsets of the hazard map (due to internal efficiencies in the PostGIS code).
The hazard class string is converted into a range of integers that allows easy ordering and comparison. In the inner query, the maximum hazard value per structure is selected, corresponding to your requirement. In the main query, those maximum values are counted into their respective columns. If at all possible, change your table structure to use those three integer codes and link to a helper table for the class label: your table would get smaller and therefore faster, and the CASE expression in the inner query could be dropped. Alternatively, add a column with the integer code and update its values according to the "hazard" column.
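The "add a column" alternative could look roughly like the sketch below. The column name hazard_id is hypothetical (it is not part of the posted schema), and the 1/2/3 codes simply mirror the CASE mapping used in the query above.

```sql
-- Sketch only: hazard_id is a hypothetical column name, not in the original schema.
ALTER TABLE evidensapp_seniangcbr ADD COLUMN hazard_id integer;

-- Fill the new column from the existing text labels (same mapping as the CASE above).
UPDATE evidensapp_seniangcbr
SET    hazard_id = CASE hazard
                      WHEN 'Low'    THEN 1
                      WHEN 'Medium' THEN 2
                      WHEN 'High'   THEN 3
                   END;

-- The inner query can then aggregate the integer directly, without CASE:
SELECT ps.id, ps.brgy_locat, ps.municipali, max(fh.hazard_id) AS max_hz_id
FROM   evidensapp_polystructures ps
JOIN   evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
GROUP  BY 1, 2, 3;
```

Since the tables are managed through Django, the model would presumably need a matching field as well.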
Note that these CASE expressions are not very efficient (the reason why I used the EXCEPT clause in the previous answer). PG 9.4 introduces a new FILTER clause on aggregate functions, which would make the query faster and easier to read:
count(id) FILTER (WHERE max_hz_id = 3) AS high
You might want to consider an upgrade.
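For reference, here is a sketch of the whole query rewritten with FILTER. It assumes PostgreSQL 9.4 or later and will not run on the 9.3 installation mentioned in the question; the inner query is unchanged.

```sql
-- Requires PostgreSQL 9.4+ (aggregate FILTER clause).
SELECT brgy_locat AS barangay, municipali AS municipality,
       count(id) FILTER (WHERE max_hz_id = 3) AS high,
       count(id) FILTER (WHERE max_hz_id = 2) AS medium,
       count(id) FILTER (WHERE max_hz_id = 1) AS low
FROM  (
   -- One row per structure, carrying the highest hazard class it intersects.
   SELECT ps.id, ps.brgy_locat, ps.municipali,
          max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id
   FROM   evidensapp_polystructures ps
   JOIN   evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
   GROUP  BY 1, 2, 3
   ) AS ps_fh
GROUP  BY 1, 2;
```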
Greetings from Manila
Add a bounding_box geometry(Polygon,4326) column to your table. The value of the column would be a bounding box (max x,y and min x,y of the multipolygon) that completely encapsulates the multipolygon.
Then your query would look like this:
AND ST_Intersects(fh.bounding_box, ps.bounding_box)
AND ST_Intersects(fh.geom, ps.geom)
The advantage of this is that the first ST_Intersects call is pretty fast. If it returns false, the second, more involved ST_Intersects call is never invoked, saving you some time in that case.
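Populating such a column could be sketched as follows, using ST_Envelope, which returns the bounding box of a geometry and keeps its SRID. One assumption to note: the sketch declares the column with SRID 32651 to match the geom columns in the posted schema rather than 4326, because an envelope computed from those geometries carries SRID 32651.

```sql
-- Sketch only. Assumption: SRID 32651 (same as geom) instead of 4326,
-- since ST_Envelope preserves the SRID of its input.
ALTER TABLE evidensapp_polystructures ADD COLUMN bounding_box geometry(Polygon, 32651);
ALTER TABLE evidensapp_seniangcbr     ADD COLUMN bounding_box geometry(Polygon, 32651);

-- Store the envelope (min/max x,y rectangle) of each multipolygon.
UPDATE evidensapp_polystructures SET bounding_box = ST_Envelope(geom);
UPDATE evidensapp_seniangcbr     SET bounding_box = ST_Envelope(geom);
```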
Similar to what I suggested and explained under your related question, I would use UNION ALL instead of FULL JOIN in the outer SELECT.
WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi USING (brgy_locat, municipali)
   WHERE  fh.hazard = 'Medium'
   AND    hi.brgy_locat IS NULL
   GROUP  BY 1, 2, 3
   )
TABLE hi
UNION ALL
TABLE med
UNION ALL
SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
FROM   evidensapp_seniangcbr fh
JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
LEFT   JOIN hi USING (brgy_locat, municipali)
LEFT   JOIN med USING (brgy_locat, municipali)
WHERE  fh.hazard = 'Low'
AND    hi.brgy_locat IS NULL
AND    med.brgy_locat IS NULL
GROUP  BY 1, 2, 3;
This only considers the highest hazard level for each set of rows with identical (brgy_locat, municipali). Only rows that actually intersect with any row of the relevant hazard level in evidensapp_seniangcbr are in the result. Also, the count only includes the rows that actually intersect. There may be more rows with the same (brgy_locat, municipali) in evidensapp_polystructures that do not intersect with the same hazard level and are therefore ignored.
Pick one of the standard methods to exclude rows for which you already found a match in a higher hazard level in the lower levels.
LEFT JOIN / IS NULL should use the index on id and perform very well here. Certainly faster than using EXCEPT based on the whole row, which cannot use an index.
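As a toy illustration of two of those standard methods, the sketch below uses made-up id values in VALUES lists (hypothetical data, not from the posted tables). Both statements return the candidate rows that have no match in hi.

```sql
-- Hypothetical data: ids 1 and 2 were already matched at the higher level.

-- Method 1: LEFT JOIN / IS NULL
WITH hi(id)         AS (VALUES (1), (2))
   , candidates(id) AS (VALUES (1), (2), (3), (4))
SELECT c.id
FROM   candidates c
LEFT   JOIN hi USING (id)
WHERE  hi.id IS NULL          -- keep only rows without a partner in hi
ORDER  BY c.id;

-- Method 2: NOT EXISTS
WITH hi(id)         AS (VALUES (1), (2))
   , candidates(id) AS (VALUES (1), (2), (3), (4))
SELECT c.id
FROM   candidates c
WHERE  NOT EXISTS (SELECT 1 FROM hi WHERE hi.id = c.id)
ORDER  BY c.id;
```

Each statement leaves the rows with id 3 and 4.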
You do not need to add a bounding_box geometry column to your table like another answer suggested. PostGIS uses (index-backed) bounding box comparison automatically in modern versions. The PostGIS documentation:
This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries.
In fact, we already see index scans in the explain output you posted.
Your existing GiST index evidensapp_polystructures_geom_id should make the query fast.
Aside: the name of the index should probably be evidensapp_polystructures_geom_idx.
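To look at the bounding box step in isolation, one option is the PostGIS && operator, which compares only bounding boxes and can be answered by a GiST index. The sketch below counts how many structure/hazard pairs pass that cheap test per hazard class. If nearly all structures pass it, the index cannot narrow things down much and most of the time goes into the exact intersection test. That is an interpretation to verify against your own data, not a measured result.

```sql
-- Sketch: count candidate pairs that survive the bounding-box-only test.
SELECT fh.hazard, count(*) AS bbox_candidates
FROM   evidensapp_seniangcbr fh
JOIN   evidensapp_polystructures ps ON fh.geom && ps.geom   -- bounding boxes overlap
GROUP  BY fh.hazard;

-- The plan shows whether the GiST index is used for the && condition.
EXPLAIN
SELECT ps.id
FROM   evidensapp_seniangcbr fh
JOIN   evidensapp_polystructures ps ON fh.geom && ps.geom
WHERE  fh.hazard = 'High';
```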
In addition, create an index on (brgy_locat, municipali) if you don't have one yet:
CREATE INDEX foo_idx ON evidensapp_polystructures (brgy_locat, municipali);
LATERAL join
Since you have only 6 rows in evidensapp_seniangcbr, LATERAL joins may be faster:
WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
   , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      WHERE  ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
   , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      LEFT   JOIN hi USING (brgy_locat, municipali)
      WHERE  hi.brgy_locat IS NULL
      AND    ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'Medium'
   GROUP  BY 1, 2, 3
   )
TABLE hi
UNION ALL
TABLE med
UNION ALL
SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
FROM   evidensapp_seniangcbr fh
, LATERAL (
   SELECT ps.id, ps.brgy_locat, ps.municipali
   FROM   evidensapp_polystructures ps
   LEFT   JOIN hi USING (brgy_locat, municipali)
   LEFT   JOIN med USING (brgy_locat, municipali)
   WHERE  hi.brgy_locat IS NULL
   AND    med.brgy_locat IS NULL
   AND    ST_Intersects(fh.geom, ps.geom)
   ) ps
WHERE  fh.hazard = 'Low'
GROUP  BY 1, 2, 3;