Unstable query behavior in windowing function in PostgreSQL 9.6

I downloaded OSM data for my country from geofabrik.de, imported it into PostgreSQL 9.6 on Ubuntu 16.04, and have used it several times. I also built a web application on top of it, which works correctly. I then decided to add a feature that returns the nearest special points (e.g. restaurants) to some given points. It works for a single nearest point, but not when I want to return an array of them. So I broke the problem down and found some strange behavior. When I execute the following query:

    SELECT t.osm_id
    FROM (
        SELECT DISTINCT ON (a.points)
               a.points,
               v.osm_id AS osm_id,
               MIN(ST_DISTANCE(v.the_geom, a.points))
                   OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
        FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
        CROSS JOIN ways_vertices_pgr v
    ) AS t

it returns:

| osm_id            |
| ----------------- |
| 2338524511        |

When I displayed this point on a map, it was far away from the original point, and after I changed the point in the subquery, the result stayed the same. I also know there are many points between the displayed point and the original point that the query should have returned instead. Then I tried running the following query:

    SELECT t.*, t.osm_id
    FROM (
        SELECT DISTINCT ON (a.points)
               a.points,
               v.osm_id AS osm_id,
               MIN(ST_DISTANCE(v.the_geom, a.points))
                   OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
        FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
        CROSS JOIN ways_vertices_pgr v
    ) AS t

and it returns:

| points                                             | osm_id   | min                  | osm_id     |
| -------------------------------------------------- | -------- | -------------------- | --------   |
| 0101000020E6100000010000C0D71A3140FFC3A1EC53134840 | 33169309 | 0.000124886435658481 | 33169309   |

The whole query except the SELECT list is the same, but the result is different, and now it is correct. Can anyone suggest how to change the query so that it works properly?

When you use DISTINCT ON, you need an ORDER BY. I think this is the logic you want for the first query:

    SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id,ST_DISTANCE(v.the_geom, a.points) as dist
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a CROSS JOIN
         ways_vertices_pgr v
    ORDER BY a.points, dist;
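Since the original goal was to return several nearest points rather than just one, the same ordering idea can be extended with ROW_NUMBER(). This is only a sketch: it assumes the same ways_vertices_pgr table and columns as in the question, the limit of 5 is an arbitrary value chosen for illustration, and osm_id is used as a tie-breaker on the assumption that it is unique per vertex.

```sql
-- Sketch: the 5 nearest vertices for each input point.
-- Assumptions: table/columns as in the question; 5 is an arbitrary limit;
-- osm_id is assumed unique per vertex so it can break distance ties.
SELECT ranked.points, ranked.osm_id, ranked.dist
FROM (
    SELECT a.points,
           v.osm_id,
           ST_DISTANCE(v.the_geom, a.points) AS dist,
           -- Number the vertices of each input point from nearest to farthest.
           ROW_NUMBER() OVER (
               PARTITION BY a.points
               ORDER BY ST_DISTANCE(v.the_geom, a.points), v.osm_id
           ) AS rn
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
    CROSS JOIN ways_vertices_pgr v
) AS ranked
WHERE ranked.rn <= 5
ORDER BY ranked.points, ranked.rn;
```

With a single input point, a plain `ORDER BY dist, osm_id LIMIT 5` on the cross join should do the same job; the window form is only needed when the subquery `a` yields several points.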

Check the output of EXPLAIN ANALYZE for your query to see exactly why the results change when you add the columns. It is likely using a slightly different execution plan, which affects the order of the rows.
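As a sketch of that check, the first query from the question can be prefixed with EXPLAIN ANALYZE as shown here. The plan output depends on your data, indexes, and settings, so none is reproduced.

```sql
-- Prefix the query with EXPLAIN ANALYZE to execute it and print the actual plan.
EXPLAIN ANALYZE
SELECT t.osm_id
FROM (
    SELECT DISTINCT ON (a.points)
           a.points,
           v.osm_id AS osm_id,
           MIN(ST_DISTANCE(v.the_geom, a.points))
               OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
    CROSS JOIN ways_vertices_pgr v
) AS t;
```

Running the same statement with `SELECT t.*, t.osm_id` as the outer select list and comparing the two plans should show whether the window and sort steps are still present in both.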

Without a sufficient ORDER BY, DISTINCT ON is non-deterministic, meaning the results can change between executions. From the PostgreSQL 9.6 manual:

SELECT DISTINCT ON ... Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)

Adding an ORDER BY as Gordon suggested should give you repeatable results.
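One caveat follows from the manual's wording about sorting on "enough columns": if two vertices happen to lie at exactly the same distance, ordering by distance alone still leaves the first row open. A possible variant of Gordon's query adds a tie-breaker. It assumes osm_id is unique per vertex, which the question does not confirm.

```sql
-- Variant of Gordon's query with a tie-breaker column.
-- Assumption: osm_id is unique per vertex, so (points, dist, osm_id)
-- gives a unique ordering of the rows reaching the DISTINCT filter.
SELECT DISTINCT ON (a.points)
       a.points,
       v.osm_id AS osm_id,
       ST_DISTANCE(v.the_geom, a.points) AS dist
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN ways_vertices_pgr v
ORDER BY a.points, dist, v.osm_id;
```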
