I want to compute the impact of events in a town using PostGIS. I have a table with the point locations of the events (event_count_2019_geo) and a table containing all buildings of the town (utrecht_2020), also as point locations. For each event I count the inhabited houses within a range of slightly more than 200 meters. See the code below.
-- In a range of ~200 meters
UPDATE event_count_2019_geo
SET gw200 = temp.aantal_woningen
FROM (
    SELECT locatie, count(event_count_2019_geo.locatie) AS aantal_woningen
    FROM event_count_2019_geo
    INNER JOIN utrecht_2020 AS bag
        ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002))
    WHERE bag.verblijfsobjectgebruiksdoel LIKE '%woonfunctie%'
    GROUP BY locatie
) AS temp
WHERE event_count_2019_geo.locatie = temp.locatie;
The trouble is that I end up with far too many houses being impacted by the events. I made a drawing of the 200 m range around each event (see picture below). Houses in the overlapping areas are counted twice, thrice or even four times. The houses are counted correctly for each event, but I cannot simply sum the results. Is there a way to correct for these overlaps so that I arrive at a correct total number of houses over all selected events?
Edit: Example
Just a very simple example: a query for event 1 yields the houses A, B, D; event 2 yields C, D, E. The count for each event is 3 and their sum is 6 (which is indeed correct behavior), but what I would like to see is 5, as D is counted twice.
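To make the example concrete, here is a minimal sketch with made-up data. The hits CTE and its column names are hypothetical and stand in for the result of the spatial join; only the difference between count(*) and count(DISTINCT ...) matters here.

```sql
-- Toy data (hypothetical): one row per (event, house) pair that is within range.
WITH hits(event, house) AS (
    VALUES (1, 'A'), (1, 'B'), (1, 'D'),
           (2, 'C'), (2, 'D'), (2, 'E')
)
SELECT count(*)              AS summed_per_event,  -- house D contributes once per event
       count(DISTINCT house) AS unique_houses      -- house D contributes once overall
FROM hits;
```

With these six rows, summed_per_event is 6 and unique_houses is 5.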
Thanks to the suggestion of @JimJones I found the solution. I defined two views: one in the old way that finds all houses (find_houses_all) and the other to only return unique houses (find_houses_unique).
-- Find all houses within a radius of ~200m of an event
DROP VIEW IF EXISTS find_houses_all;
CREATE VIEW find_houses_all AS
SELECT bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
       event_count_2019_geo.locatie
FROM event_count_2019_geo
INNER JOIN utrecht_2020 AS bag
    ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));
-- Find all *unique* houses within a radius of ~200m of an event
-- Each house is uniquely identified by openbareruimte, huisnummer, huisletter
-- and huisnummertoevoeging, so these are the columns to apply DISTINCT ON.
-- Note: without an ORDER BY, the locatie kept for each house is arbitrary.
DROP VIEW IF EXISTS find_houses_unique;
CREATE VIEW find_houses_unique AS
SELECT DISTINCT ON (bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging)
       bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
       event_count_2019_geo.locatie
FROM event_count_2019_geo
INNER JOIN utrecht_2020 AS bag
    ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));
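One caveat about the 0.002 radius: if geo_lokatie is a geometry column in EPSG:4326 (longitude/latitude), which is an assumption and is not stated in this post, then ST_DWithin measures in degrees and the range is not a true 200 m circle. A hedged sketch of the same unique-house selection in metres, using the geography variant of ST_DWithin:

```sql
-- Sketch only: assumes geo_lokatie is geometry with SRID 4326 (lon/lat),
-- so that the cast to geography is valid. The alias ev is introduced here.
SELECT DISTINCT bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging
FROM event_count_2019_geo AS ev
INNER JOIN utrecht_2020 AS bag
    ON ST_DWithin(bag.geo_lokatie::geography, ev.geo_lokatie::geography, 200);  -- 200 metres
```

Casting inside the join condition may prevent an existing spatial index on the geometry column from being used, so on large tables it could be worth storing a geography column or reprojecting to a metric coordinate system instead.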
I ran the query below against both views and indeed got the output I expected.
SELECT locatie, count(locatie)
FROM find_houses_all  -- or: find_houses_unique
GROUP BY locatie
ORDER BY locatie;
The count for find_houses_all is in all cases greater than or equal to the count for find_houses_unique. Sample output, pasted into a spreadsheet and subtracted, looks as follows:
Locatie              All   Unique   All - Unique
achter st.-ptr.      617      222            395
berlijnplein          87       87              0
boothstraat          653      175            478
breedstraat         1057      564            493
buurkerkhof          914      163            751
catharijnesngl.      134       38             96
domplein             842      149            693
...
Total              35399    13196          22203
Negative numbers in the last column would have indicated an error.
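The grand total of unique houses can also be obtained directly in SQL instead of summing in a spreadsheet. A minimal sketch: the first query relies on the find_houses_unique view defined above, the second is an alternative without a view and assumes that utrecht_2020 holds exactly one row per house (the alias ev is introduced here for illustration).

```sql
-- Total number of unique houses within range of at least one event
SELECT count(*) AS unique_houses
FROM find_houses_unique;

-- Alternative sketch: count each building row once if any event is within range
SELECT count(*) AS unique_houses
FROM utrecht_2020 AS bag
WHERE EXISTS (
    SELECT 1
    FROM event_count_2019_geo AS ev
    WHERE ST_DWithin(bag.geo_lokatie, ev.geo_lokatie, 0.002)
);
```

Neither query filters on verblijfsobjectgebruiksdoel, in line with the views; adding the LIKE '%woonfunctie%' condition from the original UPDATE would restrict the total to residential buildings.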
Great one, you data scientists. I am learning! As a conventional statistician I would have used set theory to obtain the unique count of impacted cases (houses), i.e. n(A ∪ B) = n(A) + n(B) - n(A ∩ B).
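For two events the inclusion-exclusion formula and a plain set union give the same answer. A small sketch using the toy houses from the example in the question (the CTE names a and b are hypothetical):

```sql
-- Houses in range of event 1 (a) and event 2 (b), toy data only
WITH a(house) AS (VALUES ('A'), ('B'), ('D')),
     b(house) AS (VALUES ('C'), ('D'), ('E'))
SELECT (SELECT count(*) FROM a)
     + (SELECT count(*) FROM b)
     - (SELECT count(*)
        FROM (SELECT house FROM a INTERSECT SELECT house FROM b) AS ab)
       AS inclusion_exclusion,   -- n(A) + n(B) - n(A ∩ B)
       (SELECT count(*)
        FROM (SELECT house FROM a UNION SELECT house FROM b) AS u)
       AS union_count;           -- n(A ∪ B), UNION removes duplicates
```

Both columns come out as 5 here. With many overlapping events the inclusion-exclusion sum needs a term for every combination of events, which is why counting distinct houses is the more practical route in SQL.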