I have a huge table (gps_points) with a geometry column storing 2D points. What I'm trying to accomplish is to run a query that outputs something like
id | freq
-------------
1 | 365
2 | 1092
3 | 97
...
where "id" is a unique identifier of a small rectangle inside my total bounding box and "freq" is the number of points that fall inside that particular rectangle.
So I have defined a PostGIS table as:
create table sub_rects (
id int,
geom geometry)
I then run a script externally, where I generate 1000x1000 such rectangles and create polygons of them, so I get a million lines like this:
insert into sub_rects values(1,ST_GeomFromText('POLYGON((1.1 1.2, 1.1 1.4, 1.5 1.4, 1.5 1.2, 1.1 1.2))'));
except of course every polygon gets itself a new set of co-ordinates to match its actual place in the 1000x1000 grid over the bounding box co-ordinates of my gps data, and the ID gets updated for each tuple.
Then I generate a spatial index and a primary key index on this table.
Finally I can join this table with my original data table (gps_points) using
select id, count(*) from sub_rects r join gps_points g on r.geom && g.geom group by id;
which gives me the output I want. The problem is that it takes forever to load all the little polygons, and every time I want to generate a map with a different number of rectangles, or run over a data set with different underlying co-ordinates, I have to drop sub_rects and generate and load it anew.
Is there a better way of doing this? I don't need graphic output. I just need to generate the data. Not having to generate the support table (sub_rects) externally would be very nice, and I suspect there are way less computationally expensive methods of accomplishing the same thing. I would much prefer not to have to install any additional software.
ETA: As per request in comments, here is the query plan (on my home machine, so smaller data sets and other table names, but the same plan):
gisdb=# explain analyse select g.id id, count(*) from gridrect g join broadcast b on g.geom && b.wkb_geometry group by g.id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=0.57..177993.58 rows=10101 width=12) (actual time=14.740..3528.600 rows=1962 loops=1)
  Group Key: g.id
  -> Nested Loop (cost=0.57..144786.36 rows=6621242 width=4) (actual time=13.948..3050.741 rows=1366376 loops=1)
        -> Index Scan using gridrect_id_idx on gridrect g (cost=0.29..485.30 rows=10201 width=124) (actual time=0.079..6.582 rows=10201 loops=1)
        -> Index Scan using broadcast_wkb_geometry_geom_idx on broadcast b (cost=0.29..12.78 rows=137 width=32) (actual time=0.011..0.217 rows=134 loops=10201)
              Index Cond: (g.geom && wkb_geometry)
Planning time: 0.591 ms
Execution time: 3529.320 ms
(8 rows)
ETA 2:
As per suggestions in the answers I modified the code suggested there to this:
(SELECT row_number() OVER (ORDER BY geom) id, geom
FROM (SELECT st_geomfromtext(
concat('Polygon((', x || ' ' || y, ',', x + xstep || ' ' || y, ',', x + xstep || ' ' || y + ystep,
',', x || ' ' || y + ystep, ',', x || ' ' || y, '))')) geom
FROM (SELECT x, y
FROM (SELECT generate_series(xmin, xmin + xdelta, xstep) x) x,
(SELECT generate_series(ymin, ymin + ydelta, ystep) y) y) foo) bar);
where xmin, ymin, xdelta, ydelta, xstep and ystep are all calculated by an external script, but might just as well be calculated as a part of a Postgres function if you wrapped the above in a function call. Generating a temporary table from this and running the queries against that is two orders of magnitude faster than what I was doing initially.
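A minimal sketch of what such a function wrapper might look like, assuming PostGIS 2.0 or later for the four-argument ST_MakeEnvelope. The function name make_grid, its signature and the p_ parameter names are hypothetical and not from the code above. The p_ prefix is there because xmin is also a PostgreSQL system column name. The rectangles come out with SRID 0, like the ST_GeomFromText polygons above, so this assumes the point geometries carry no SRID either.

```sql
-- Hypothetical helper: returns an nx-by-ny grid of rectangles that
-- tiles the bounding box (p_xmin, p_ymin) - (p_xmax, p_ymax).
CREATE OR REPLACE FUNCTION make_grid(
    p_xmin double precision, p_ymin double precision,
    p_xmax double precision, p_ymax double precision,
    nx integer, ny integer)
RETURNS TABLE (id bigint, geom geometry)
LANGUAGE sql IMMUTABLE AS $$
    SELECT row_number() OVER (ORDER BY j, i) AS id,
           -- build each cell directly from its corner co-ordinates
           -- instead of concatenating and parsing WKT text
           ST_MakeEnvelope(
               p_xmin + i       * (p_xmax - p_xmin) / nx,
               p_ymin + j       * (p_ymax - p_ymin) / ny,
               p_xmin + (i + 1) * (p_xmax - p_xmin) / nx,
               p_ymin + (j + 1) * (p_ymax - p_ymin) / ny) AS geom
    FROM generate_series(0, nx - 1) AS i,
         generate_series(0, ny - 1) AS j;
$$;

-- Example use: a 1000x1000 grid over an assumed 0..200 bounding box.
CREATE TEMP TABLE sub_rects AS
SELECT * FROM make_grid(0, 0, 200, 200, 1000, 1000);
CREATE INDEX ON sub_rects USING gist (geom);
```

Changing the grid resolution or the bounding box then only means dropping the temporary table and calling the function again with different arguments.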
Two things. First, create the table at the SQL level (from pgAdmin, for example).
create table polygons as
select st_geomfromtext(concat('Polygon((',x||' '||y,',',x||' '||y+0.2,',',
x+0.4||' '||y+0.2,',',x+0.4||' '||y,',',x||' '||y,'))')) geom
-- each rectangle is 0.4 wide (x) and 0.2 high (y), so the series steps must match
FROM (select generate_series(0,199.9,0.4) x) x,
(select generate_series(0,199.9,0.2) y) y;
Second, create a spatial index on it:
create index on polygons using gist(geom);
Then use your query or this one, and check which is faster in your case:
select id, count(*)
from sub_rects r
join gps_points g on st_dwithin(r.geom, g.geom, 0)
group by id;
Here's an example of generating a grid from a bounding box:
https://gis.stackexchange.com/questions/16374/how-to-create-a-regular-polygon-grid-in-postgis
To generate the density data, try creating a temp table with all the rectangle/point pairs first and then get the count. In my experience the below was somewhat faster than combining everything into a single query:
create temp table rect_points as
select r.id as rect_id, p.id as point_id
from sub_rects r, gps_points p
where p.geom && r.geom;
create index idx on rect_points (rect_id);
select rect_id, count(*) from rect_points group by rect_id;