Postgis/SQL Select tuples such that the first tuple item is unique and the items geometries intersect

This question is specifically about Postgres 9.4.

Let's say I have two tables:

CREATE TABLE A(id INT);
CREATE TABLE B(id INT);

I'd like to select the tuples (A, B) that satisfy a certain condition, such that no two selected tuples share the same A.id:

SELECT DISTINCT ON (A.id) A.id, B.id FROM A, B WHERE condition(A,B);

However, DISTINCT ON performs its sort in memory after all the tuples have been selected, and I would like to avoid selecting tuples with a duplicate A.id in the first place.

How can this be done in an efficient way?

EDIT:

Both A and B have unique ids.

EDIT2:

Here is the complete setup:

CREATE EXTENSION postgis;
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);

I would like to do the following:

SELECT A.id, B.id FROM A, B
WHERE B.id = (SELECT B.id FROM B WHERE
     ST_Intersects(A.shape, B.shape)
     AND ST_Length(ST_Intersection(A.shape, B.shape)) / ST_Length(A.shape) >= 0.5
     AND B.kind != 1
     LIMIT 1);

This works (I believe), but it is not necessarily the most efficient way. Table A has orders of magnitude more rows than table B, so I am not even sure the GiST index is on the right table.

I am also aware that the order of the arguments to ST_Intersects can have a significant effect on run time. What should the correct order be?

If you want just one row for each "A", you can use a correlated subquery (or a lateral join):

select a.id,
       (select b.id
        from b
        where condition(a, b)
        limit 1
       ) as b_id
from a;

This should stop testing rows from b as soon as the first match is found, which I imagine is the best approach performance-wise.

If no match is found, you will get a NULL value. You can wrap this in a subquery and filter out the NULLs.

Try something like:
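The lateral-join variant mentioned above could look roughly like the following sketch, which plugs in the tables and the condition from the question. LATERAL is available from Postgres 9.3 on. The aliases a_id, b_id and m are made up for illustration, the extra index on B.shape is an assumption (the question only indexes A.shape), and nothing here has been benchmarked. Like the original condition, it assumes every A.shape has a non-zero length.

```sql
-- Assumption: an index on B.shape, since this query probes B once per row of A.
CREATE INDEX ON B USING GIST (shape);

-- Sketch: at most one qualifying B row for each A row.
SELECT a.id AS a_id, m.b_id
FROM A AS a
JOIN LATERAL (
    SELECT b.id AS b_id
    FROM B AS b
    WHERE b.kind <> 1
      AND ST_Intersects(a.shape, b.shape)
      AND ST_Length(ST_Intersection(a.shape, b.shape)) / ST_Length(a.shape) >= 0.5
    LIMIT 1           -- stop at the first qualifying B row
) AS m ON true;       -- inner join: A rows without a match are dropped
```

Because this is an inner join, A rows with no match never appear, so no separate NULL filter is needed. Running the query under EXPLAIN ANALYZE should show whether the planner actually uses either GiST index.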

WITH distinct_a AS (
    SELECT DISTINCT A.id
    FROM A
)
SELECT A.id, B.id
FROM distinct_a AS A, B
WHERE condition(A, B);

The CTE (WITH ...) selects all distinct values first. The selected values are then used in the main query.
