
Unstable query behavior in windowing function in PostgreSQL 9.6

I downloaded OSM data for my country from geofabrik.de, imported it into PostgreSQL 9.6 on Ubuntu 16.04, and have used it several times. I also built a web application on top of it, which works correctly. I then decided to add a feature that returns the nearest special points (e.g. restaurants) to a given set of points. It works for a single nearest point, but not when I want to return an array of them. I broke the problem down and found some strange behavior. When I execute the following query:

    SELECT t.osm_id
    FROM (
      SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, MIN(ST_DISTANCE(v.the_geom, a.points)) OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
      FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
      CROSS JOIN ways_vertices_pgr v
    ) AS t

it returns:

| osm_id            |
| ----------------- |
| 2338524511        |

When I display this point on a map, it lies far away from the original point, and when I change the point in the subquery, the result stays the same. I also know there are many points between the displayed point and the original one that the query should return instead. Then I tried running the following query:

    SELECT t.*, t.osm_id
    FROM (
      SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, MIN(ST_DISTANCE(v.the_geom, a.points)) OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
      FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
      CROSS JOIN ways_vertices_pgr v
    ) AS t

and it returns:

| points                                             | osm_id   | min                  | osm_id     |
| -------------------------------------------------- | -------- | -------------------- | --------   |
| 0101000020E6100000010000C0D71A3140FFC3A1EC53134840 | 33169309 | 0.000124886435658481 | 33169309   |

Everything except the SELECT list is the same, but the result is different, and now it is correct. Can anyone suggest how to change the query so that it works properly?

When you use DISTINCT ON, you need an ORDER BY. I think this is the logic you want for the first query:

    SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, ST_DISTANCE(v.the_geom, a.points) AS dist
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
    CROSS JOIN ways_vertices_pgr v
    ORDER BY a.points, dist;
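
If the end goal is an array of the nearest points rather than a single one, something along these lines might work. This is only a sketch: the limit of 5 and the nearest_osm_ids alias are my own placeholders, and it assumes the same ways_vertices_pgr table and a single input point as in the question.

```sql
-- Sketch: collect the 5 nearest vertices into one array, closest first.
-- The LIMIT value and the nearest_osm_ids alias are illustrative assumptions.
SELECT array_agg(n.osm_id ORDER BY n.dist, n.osm_id) AS nearest_osm_ids
FROM (
  SELECT v.osm_id, ST_DISTANCE(v.the_geom, a.points) AS dist
  FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
  CROSS JOIN ways_vertices_pgr v
  ORDER BY dist, v.osm_id   -- osm_id breaks ties between equal distances
  LIMIT 5
) AS n;
```

The explicit ORDER BY inside the subquery decides which rows survive the LIMIT, and the ORDER BY inside array_agg fixes the order of the elements in the result.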

Check the output of EXPLAIN ANALYZE with your query to see exactly why the results are changing when you add the columns. Likely it's using a slightly different execution plan which affects the ordering of rows.
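
As a rough sketch of that check, you can prefix the question's first query with EXPLAIN ANALYZE and compare the output with the plan for the second query. One thing worth looking for (a guess on my part, not something confirmed in the question) is whether a WindowAgg node and the Sort feeding it appear in both plans. If the planner drops the unused window column in the first query, nothing orders the rows before DISTINCT ON picks one.

```sql
-- Assumes the ways_vertices_pgr table from the question.
-- Run this, then run it again with "SELECT t.*, t.osm_id" and compare the plans.
EXPLAIN ANALYZE
SELECT t.osm_id
FROM (
  SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id,
         MIN(ST_DISTANCE(v.the_geom, a.points))
           OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
  FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
  CROSS JOIN ways_vertices_pgr v
) AS t;
```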

Without a sufficient ORDER BY, DISTINCT ON is non-deterministic: the row it keeps for each group can change between executions. From the PostgreSQL 9.6 manual:

SELECT DISTINCT ON ... Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)

Adding an ORDER BY as Gordon suggested should give you repeatable results.
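
To see the rule in isolation, here is a small self-contained sketch that needs no PostGIS. The candidates data is made up: grp, id and dist are hypothetical stand-ins for a.points, v.osm_id and the computed distance.

```sql
-- Toy data: three candidate rows in the same group.
WITH candidates(grp, id, dist) AS (
  VALUES ('p1', 101, 0.50),
         ('p1', 102, 0.10),
         ('p1', 103, 0.30)
)
SELECT DISTINCT ON (grp) grp, id, dist
FROM candidates
ORDER BY grp, dist, id;  -- grp must come first; id breaks ties so the order is unique
```

With this ordering the row kept for p1 is always the one with id 102, because it has the smallest dist. Drop the ORDER BY and PostgreSQL is free to keep any of the three rows.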
