PostGIS / SQL: select tuples so that the first tuple item is unique and the items' geometries intersect

Question

This question is specifically about Postgres 9.4.

Let's say I have two tables:

CREATE TABLE A(id INT);
CREATE TABLE B(id INT);

I'd like to select all tuples (A, B) that satisfy a certain condition, such that every selected tuple has a different value in the A column:

SELECT DISTINCT ON (A.id) A.id, B.id FROM A, B WHERE condition(A, B);

However, DISTINCT ON sorts in memory after all the tuples have been selected, and I would like to avoid selecting tuples with a duplicate A.id in the first place.

How can this be done in an efficient way?

EDIT:

Both A and B have unique ids.

EDIT2:

Here is the complete setup:

CREATE EXTENSION postgis;
DROP TABLE IF EXISTS A;
DROP TABLE IF EXISTS B;
CREATE TABLE A(shape Geometry, id INT);
CREATE TABLE B(shape Geometry, id INT, kind INT);
CREATE INDEX ON A USING GIST (shape);

I would like to do the following:

SELECT A.id, B.id FROM A, B
WHERE B.id = (SELECT B.id FROM B WHERE
     ST_Intersects(A.shape, B.shape)
     AND ST_Length(ST_Intersection(A.shape, B.shape)) / ST_Length(A.shape) >= 0.5
     AND B.kind != 1 LIMIT 1);

This works (I believe), but it is not necessarily the most efficient way. Table A has orders of magnitude more rows than table B, so I am not even sure the GiST index is right.

I am also aware that the order of arguments in ST_Intersects can have a significant effect on run time. What should the correct order be?
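
One way to investigate both doubts is to look at the actual query plan. The sketch below is only an illustration: the extra GiST index on B.shape is an assumption that is not part of the setup above. It is added because the subquery probes B once per row of A, so an index on B.shape is the one the planner could plausibly use there.

```sql
-- Assumption: an additional index on B.shape (not in the original setup).
CREATE INDEX ON B USING GIST (shape);

-- Refresh planner statistics so the row estimates reflect the real table sizes.
ANALYZE A;
ANALYZE B;

-- EXPLAIN ANALYZE executes the query, so try it on a sample if A is very large.
EXPLAIN (ANALYZE, BUFFERS)
SELECT A.id, B.id FROM A, B
WHERE B.id = (SELECT B.id FROM B WHERE
     ST_Intersects(A.shape, B.shape)
     AND ST_Length(ST_Intersection(A.shape, B.shape)) / ST_Length(A.shape) >= 0.5
     AND B.kind != 1 LIMIT 1);
```

The plan output shows which index, if any, is scanned. Swapping the two arguments of ST_Intersects and comparing the plans and timings would show whether the order matters for this data.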

Answer 1

If you want just one row for each "A", you can use a correlated subquery (or a lateral join):

select a.id,
       (select b.id
        from b
        where condition(a, b)
        limit 1
       ) as b_id
from a;

This should stop testing rows from b as soon as the first match is found, which I imagine is the best approach performance-wise.
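
For reference, the lateral join variant mentioned above might look like the following. This is a sketch rather than a tested solution. It plugs in the tables and the condition from the question's EDIT2, and the aliases a_id, b_id and m are made up for illustration. It also assumes every A.shape has a non-zero length, since the ratio divides by ST_Length(a.shape).

```sql
-- One matching B row per A row via LATERAL (available since Postgres 9.3).
-- CROSS JOIN LATERAL drops rows of A that have no matching B row.
SELECT a.id AS a_id, m.id AS b_id
FROM A AS a
CROSS JOIN LATERAL (
    SELECT b.id
    FROM B AS b
    WHERE b.kind <> 1
      AND ST_Intersects(a.shape, b.shape)
      -- Assumes ST_Length(a.shape) > 0; a zero length raises a division error.
      AND ST_Length(ST_Intersection(a.shape, b.shape)) / ST_Length(a.shape) >= 0.5
    LIMIT 1          -- stop at the first B row that qualifies
) AS m;
```

Using LEFT JOIN LATERAL (...) AS m ON true instead would keep unmatched rows of A with a NULL b_id, which matches the behavior of the correlated subquery.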

If no match is found, you will get a NULL value. You can wrap this in a subquery and filter out the NULLs.
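
A possible shape for that wrapper is sketched here, again using the tables and condition from the question's EDIT2. The names matched, a_id and b_id are illustrative only.

```sql
-- The inner query yields one row per A, with b_id NULL when nothing matches.
-- The outer query keeps only the rows where a match was found.
SELECT a_id, b_id
FROM (
    SELECT a.id AS a_id,
           (SELECT b.id
            FROM B AS b
            WHERE b.kind <> 1
              AND ST_Intersects(a.shape, b.shape)
              -- Assumes ST_Length(a.shape) > 0 for every row of A.
              AND ST_Length(ST_Intersection(a.shape, b.shape)) / ST_Length(a.shape) >= 0.5
            LIMIT 1) AS b_id
    FROM A AS a
) AS matched
WHERE b_id IS NOT NULL;
```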

Answer 2

Try something like:

WITH distinct_a AS (
    SELECT DISTINCT A.id
    FROM A
)
SELECT A.id, B.id
FROM distinct_a AS A, B
WHERE condition(A, B);

The CTE (WITH ...) selects all distinct values first. The selected values are then used in the next query.

Answer 1: 1, 2015-03-03 11:48:52
Answer 2: 0, 2015-03-03 11:44:53
