Generating heat/density map with PostGIS from point data

I have a huge table (gps_points) with a geometry column storing 2D points. What I'm trying to accomplish is to run a query that outputs something like

 id | freq
-------------
  1 | 365
  2 | 1092
  3 | 97
...

where "id" is a unique identifier of a small rectangle inside my total bounding box and "freq" is the number of points that fall inside that particular rectangle.

So I have defined a PostGIS table as:

create table sub_rects (
    id int,
    geom geometry
);

I then run a script externally, where I generate 1000x1000 such rectangles and create polygons of them, so I get a million lines like this:

insert into sub_rects values(1,ST_GeomFromText('POLYGON((1.1 1.2, 1.1 1.4, 1.5 1.4, 1.5 1.2, 1.1 1.2))'));

except of course every polygon gets itself a new set of co-ordinates to match its actual place in the 1000x1000 grid over the bounding box co-ordinates of my gps data, and the ID gets updated for each tuple.

Then I generate a spatial index and a primary key index on this table.
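For reference, the two indexes described above can be created like this (the index name is illustrative; the original post doesn't give the exact DDL):

```sql
create index sub_rects_geom_idx on sub_rects using gist (geom);
alter table sub_rects add primary key (id);
```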

Finally I can join this table against my original data table (gps_points) with

select id, count(*) from sub_rects r join gps_points g on r.geom && g.geom group by id;

which gives me my sought output. The problem is that it takes forever to load all the little polygons, and that every time I want to generate a map with a different number of rectangles or run over a data set with different underlying co-ordinates, I have to drop sub_rects and generate and load it anew.

Is there a better way of doing this? I don't need graphic output. I just need to generate the data. Not having to generate the support table (sub_rects) externally would be very nice, and I suspect there are way less computationally expensive methods of accomplishing the same thing. I would much prefer not to have to install any additional software.

ETA: As per the request in the comments, here is the query plan (from my home machine, so smaller data sets and different table names, but the same plan):

gisdb=# explain analyse select g.id id, count(*) from gridrect g join broadcast b on g.geom && b.wkb_geometry group by g.id;

    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.57..177993.58 rows=10101 width=12) (actual time=14.740..3528.600 rows=1962 loops=1)
   Group Key: g.id
   ->  Nested Loop  (cost=0.57..144786.36 rows=6621242 width=4) (actual time=13.948..3050.741 rows=1366376 loops=1)
         ->  Index Scan using gridrect_id_idx on gridrect g  (cost=0.29..485.30 rows=10201 width=124) (actual time=0.079..6.582 rows=10201 loops=1)
         ->  Index Scan using broadcast_wkb_geometry_geom_idx on broadcast b  (cost=0.29..12.78 rows=137 width=32) (actual time=0.011..0.217 rows=134 loops=10201)
               Index Cond: (g.geom && wkb_geometry)
 Planning time: 0.591 ms
 Execution time: 3529.320 ms
(8 rows)

ETA 2:

As per suggestions in the answers I modified the code suggested there to this:

(SELECT row_number() OVER (ORDER BY geom) id, geom
 FROM (SELECT st_geomfromtext(
                  concat('Polygon((', x || ' ' || y, ',', x + xstep || ' ' || y, ',', x + xstep || ' ' || y + ystep,
                         ',', x || ' ' || y + ystep, ',', x || ' ' || y, '))')) geom
       FROM (SELECT x, y
             FROM (SELECT generate_series(xmin, xmin + xdelta, xstep) x) x,
                  (SELECT generate_series(ymin, ymin + ydelta, ystep) y) y) foo) bar);

where xmin, ymin, xdelta, ydelta, xstep and ystep are all calculated by an external script, but might just as well be calculated as a part of a Postgres function if you wrapped the above in a function call. Generating a temporary table from this and running the queries against that is two orders of magnitude faster than what I was doing initially.
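The function wrapping mentioned above could look something like the sketch below. The `make_grid` name, the nx/ny parametrisation, and the use of ST_MakeEnvelope instead of string concatenation are my assumptions, not from the original post:

```sql
-- Sketch: build an nx-by-ny grid of rectangles covering the bounding box
-- (xmin, ymin) .. (xmin + xdelta, ymin + ydelta).
create or replace function make_grid(
    xmin numeric, ymin numeric,
    xdelta numeric, ydelta numeric,
    nx int, ny int)
returns table (id bigint, geom geometry)
language sql as $$
    select row_number() over (order by x, y),
           st_makeenvelope(x::float8, y::float8,
                           (x + xdelta / nx)::float8,
                           (y + ydelta / ny)::float8)
    -- stop one step short of the upper bound so exactly nx * ny cells result
    from generate_series(xmin, xmin + xdelta - xdelta / nx, xdelta / nx) x,
         generate_series(ymin, ymin + ydelta - ydelta / ny, ydelta / ny) y;
$$;
```

A temporary grid table could then be created with, e.g., `create temp table sub_rects as select * from make_grid(0, 0, 200, 200, 1000, 1000);` followed by the GiST index as before.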

Two things. First, create the table at the SQL level (from pgAdmin, for example).

create table polygons as
select st_geomfromtext(concat('Polygon((', x || ' ' || y, ',', x || ' ' || y + 0.2,
                              ',', x + 0.4 || ' ' || y + 0.2, ',', x + 0.4 || ' ' || y,
                              ',', x || ' ' || y, '))')) geom
  FROM (select generate_series(0, 199.9, 0.4) x) x,
       (select generate_series(0, 199.9, 0.2) y) y

Create index

create index on polygons using gist(geom);

Then use your query or this one; check which one is faster in your case:

select id, count(*)
  from sub_rects r
  join gps_points g on st_dwithin(r.geom, g.geom, 0)
group by id;

Here's an example of generating a grid from a bounding box:

https://gis.stackexchange.com/questions/16374/how-to-create-a-regular-polygon-grid-in-postgis
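On newer PostGIS versions (3.1+), ST_SquareGrid can produce such a grid directly from a bounding geometry, without building polygon text by hand. A minimal sketch (the cell size of 0.2 and the envelope extent are illustrative):

```sql
-- Requires PostGIS >= 3.1. ST_SquareGrid returns one row per cell,
-- with columns geom, i, j (the cell geometry and its grid indexes).
select row_number() over (order by i, j) as id, geom
from st_squaregrid(0.2, st_makeenvelope(0, 0, 200, 200, 4326));
```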

To generate the density data, try creating a temp table with all the data first and then getting the count. In my experience the below was somewhat faster than combining it all into a single query:

create temp table rect_points as 
select r.id as rect_id, p.id as point_id 
from sub_rects r, gps_points p
where p.geom && r.geom;

create index idx on rect_points (rect_id);

select rect_id, count(*) from rect_points group by rect_id; 
