How to avoid double counting of overlapping areas in PostGIS?

Question

I want to compute the impact of events in a town using PostGIS. I have a table with the point locations of the events (event_count_2019_geo) and a table containing all buildings of the town (utrecht_2020), also as point locations. For each event I count the inhabited houses within a range of slightly more than 200 meters. See the code below.

-- In a range of ~200 meters
UPDATE event_count_2019_geo
SET gw200 = temp.aantal_woningen
FROM (SELECT locatie, count(event_count_2019_geo.locatie) AS aantal_woningen
      FROM event_count_2019_geo
           INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002))  
      WHERE  bag.verblijfsobjectgebruiksdoel LIKE '%woonfunctie%'
      GROUP BY locatie
     ) AS temp
WHERE event_count_2019_geo.locatie = temp.locatie;
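
A side note on the radius: the third argument of ST_DWithin is expressed in the units of the geometry's coordinate system. If geo_lokatie is stored as longitude/latitude (an assumption, since the SRID is not given in the question), 0.002 is a distance in degrees, which at Utrecht's latitude covers roughly 220 m north-south but noticeably less east-west. A minimal sketch of a per-event count with a radius in meters, assuming the columns can be cast to geography:

```sql
-- Sketch only: assumes geo_lokatie is a lon/lat geometry (e.g. SRID 4326),
-- so that the cast to geography is valid. With geography arguments,
-- ST_DWithin takes its distance in meters.
SELECT ev.locatie,
       count(*) AS aantal_woningen
FROM event_count_2019_geo AS ev
     INNER JOIN utrecht_2020 AS bag
         ON ST_DWithin(bag.geo_lokatie::geography, ev.geo_lokatie::geography, 200)
WHERE bag.verblijfsobjectgebruiksdoel LIKE '%woonfunctie%'
GROUP BY ev.locatie;
```

Casting in the join condition may prevent an existing geometry index from being used, so on large tables a dedicated geography column or index could be preferable.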

The trouble is that I end up with far too many houses being impacted by the events. I made a drawing of the 200 m ranges around each event (see picture below). The overlapping areas are counted twice, three times or even four times. The houses are counted correctly for each event, but I cannot simply sum the results. Is there a way to correct for these overlaps so that I arrive at a correct total number of houses over all selected events?

Edit: Example

Just a very simple example: a query of event 1 yields the houses A, B, D; a query of event 2 yields C, D, E. The count for each event is 3 and their sum is 6 (which is indeed correct behavior), but what I would like to see is 5, because D is counted twice.
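
The toy example above can be reproduced without any spatial data. The sketch below uses a hypothetical inline table hits(event, house) in place of the result of the spatial join; it only illustrates the difference between summing per-event rows and counting distinct houses.

```sql
-- hits is a made-up stand-in for the (event, house) pairs
-- that the ST_DWithin join would return.
WITH hits(event, house) AS (
    VALUES (1, 'A'), (1, 'B'), (1, 'D'),
           (2, 'C'), (2, 'D'), (2, 'E')
)
SELECT count(*)              AS summed_per_event,  -- 6: D appears once per event
       count(DISTINCT house) AS unique_houses      -- 5: D is counted once
FROM hits;
```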

Answer 1

Thanks to the suggestion of @JimJones I found the solution. I defined two views: one that finds all houses in the old way (find_houses_all) and one that returns only unique houses (find_houses_unique).

-- Find all houses within a radius of ~200m of an event
DROP VIEW IF EXISTS find_houses_all;

CREATE VIEW find_houses_all AS 
    SELECT bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
           event_count_2019_geo.locatie
    FROM event_count_2019_geo
         INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));  

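-- Find all *unique* houses within a radius of ~200m of an event
-- Each house is uniquely identified by openbareruimte, huisnummer, huisletter
-- and huisnummertoevoeging, so these are the columns to apply DISTINCT ON.
-- Note: without an ORDER BY, the locatie kept for a house that lies within
-- range of several events is arbitrary.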
DROP VIEW IF EXISTS find_houses_unique;

CREATE VIEW find_houses_unique AS 
    SELECT DISTINCT ON(bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging) 
           bag.openbareruimte, bag.huisnummer, bag.huisletter, bag.huisnummertoevoeging,
           event_count_2019_geo.locatie
    FROM event_count_2019_geo
         INNER JOIN utrecht_2020 AS bag ON (ST_DWithin(bag.geo_lokatie, event_count_2019_geo.geo_lokatie, 0.002));

I ran both scripts and indeed got the output I expected.

SELECT locatie, COUNT (locatie)
FROM find_houses_all -- find_houses_unique
GROUP BY locatie
ORDER BY locatie;

The output for find_houses_all is in all cases greater than or equal to the output for find_houses_unique. Sample output, pasted into a spreadsheet and subtracted, looks as follows:

Locatie             All  Unique  All - Unique
achter st.-ptr.     617     222           395
berlijnplein         87      87             0
boothstraat         653     175           478
breedstraat        1057     564           493
buurkerkhof         914     163           751
catharijnesngl.     134      38            96
domplein            842     149           693
 ...
Total             35399   13196         22203

Negative numbers would have indicated an error.
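
If only the overall total of unique houses is needed, it could also be computed directly, without the views. The following is a minimal sketch under two assumptions not confirmed in the post: that utrecht_2020 holds exactly one row per house, and that the woonfunctie filter from the question should still apply (the views above do not include it).

```sql
-- Each row of utrecht_2020 is tested once, so a house that lies within
-- range of several events still contributes only 1 to the total.
SELECT count(*) AS unique_woningen
FROM utrecht_2020 AS bag
WHERE bag.verblijfsobjectgebruiksdoel LIKE '%woonfunctie%'
  AND EXISTS (
      SELECT 1
      FROM event_count_2019_geo AS ev
      WHERE ST_DWithin(bag.geo_lokatie, ev.geo_lokatie, 0.002)
  );
```

The EXISTS form avoids producing one row per (event, house) pair in the first place, so no DISTINCT step is needed afterwards.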

Answer 2

Great work, you data scientists. I am learning! As a conventional statistician, I would have approached this problem with the set-theory (inclusion-exclusion) formula to obtain unique counts of the impacted cases (houses), i.e. n(A ∪ B) = n(A) + n(B) - n(A ∩ B).
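
As an illustration of that formula, here is a small sketch using the two-event example from the question. The inline tables a and b are hypothetical stand-ins for the houses found around event 1 and event 2.

```sql
WITH a(house) AS (VALUES ('A'), ('B'), ('D')),   -- houses near event 1
     b(house) AS (VALUES ('C'), ('D'), ('E'))    -- houses near event 2
SELECT (SELECT count(*) FROM a)
     + (SELECT count(*) FROM b)
     - (SELECT count(*)
        FROM (SELECT house FROM a INTERSECT SELECT house FROM b) AS ab)
       AS n_union_by_formula,                    -- 3 + 3 - 1 = 5
       (SELECT count(*)
        FROM (SELECT house FROM a UNION SELECT house FROM b) AS u)
       AS n_union_direct;                        -- 5
```

With more than two events the formula needs a term for every combination of overlapping events, which is why deduplicating with UNION or DISTINCT tends to be simpler in SQL.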
