Unstable query behavior in windowing function in PostgreSQL 9.6
I downloaded OSM data for my country from geofabrik.de, imported it successfully into PostgreSQL 9.6 on Ubuntu 16.04, and have used it several times. I also built a web application on top of it, which works correctly. I then decided to add a feature that returns the nearest special points (e.g. restaurants) to a given point. It works for a single nearest point, but not when I want to return an array of them. So I broke the problem down and found some strange behavior. When I executed the following query:
SELECT t.osm_id
FROM (
SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, MIN(ST_DISTANCE(v.the_geom, a.points)) OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN ways_vertices_pgr v
) AS t
it returns:
| osm_id |
| ----------------- |
| 2338524511 |
When I displayed this point on a map, it was far away from the original point, and after I changed the point in the subquery, the result remained the same. I also know there are many points between the displayed point and the original point that the query should have returned instead. Then I tried running the following query:
SELECT t.*, t.osm_id
FROM (
SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, MIN(ST_DISTANCE(v.the_geom, a.points)) OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN ways_vertices_pgr v
) AS t
and it returns:
| points | osm_id | min | osm_id |
| -------------------------------------------------- | -------- | -------------------- | -------- |
| 0101000020E6100000010000C0D71A3140FFC3A1EC53134840 | 33169309 | 0.000124886435658481 | 33169309 |
The whole query except the SELECT list is unchanged, but the result is different, and now it is correct. Can anyone suggest how to change the query so that it works properly?
When you use DISTINCT ON, you need an ORDER BY; otherwise the row kept for each group is arbitrary. I think this is the logic you want for the first query:
SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id, ST_DISTANCE(v.the_geom, a.points) AS dist
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN ways_vertices_pgr v
ORDER BY a.points, dist;
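
Since the original goal was an array of the nearest points rather than a single one, the same ordering idea can be extended with a LATERAL subquery and LIMIT. This is only a sketch under assumptions: the limit of 5 and the output names `n` and `nearest_osm_ids` are illustrative and not from the question, and `osm_id` is assumed to be a reasonable tie-breaker.

```sql
-- Sketch: the 5 nearest vertices to each input point, returned as one array per point.
-- The limit (5) and the aliases n / nearest_osm_ids are illustrative assumptions.
SELECT a.points,
       array_agg(n.osm_id ORDER BY n.dist, n.osm_id) AS nearest_osm_ids
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN LATERAL (
    -- For each input point, sort all vertices by distance and keep the closest ones.
    SELECT v.osm_id, ST_DISTANCE(v.the_geom, a.points) AS dist
    FROM ways_vertices_pgr v
    ORDER BY dist, v.osm_id
    LIMIT 5
) n
GROUP BY a.points;
```

The explicit ORDER BY inside both the subquery and array_agg is what makes the result repeatable. On a large table, sorting every vertex by ST_DISTANCE may be slow; if the geometry column has a spatial index, ordering by PostGIS's `<->` distance operator instead may help, though that is untested here.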
Check the output of EXPLAIN ANALYZE for your query to see exactly why the results change when you add the columns. It is likely using a slightly different execution plan, which affects the ordering of the rows.
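
As a rough sketch of that comparison, both variants from the question can be run under EXPLAIN. The VERBOSE option is an addition here, not part of the suggestion above; it prints each plan node's output columns, which makes it easier to see whether a sort or window-aggregate step appears in one plan but not in the other.

```sql
-- Variant 1: only osm_id is selected (this one returned the distant point).
EXPLAIN (ANALYZE, VERBOSE)
SELECT t.osm_id
FROM (
    SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id,
           MIN(ST_DISTANCE(v.the_geom, a.points))
               OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
    CROSS JOIN ways_vertices_pgr v
) AS t;

-- Variant 2: all columns are selected (this one returned the expected point).
EXPLAIN (ANALYZE, VERBOSE)
SELECT t.*, t.osm_id
FROM (
    SELECT DISTINCT ON (a.points) a.points, v.osm_id AS osm_id,
           MIN(ST_DISTANCE(v.the_geom, a.points))
               OVER (PARTITION BY a.points ORDER BY ST_DISTANCE(v.the_geom, a.points))
    FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
    CROSS JOIN ways_vertices_pgr v
) AS t;
```

Note that EXPLAIN ANALYZE actually executes the query, so each statement takes about as long as the query itself.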
Without an ORDER BY that fixes the row order, DISTINCT ON is non-deterministic, meaning the results can change between executions. From the PostgreSQL 9.6 manual:
SELECT DISTINCT ON ... Note that the "first row" of a set is unpredictable unless the query is sorted on enough columns to guarantee a unique ordering of the rows arriving at the DISTINCT filter. (DISTINCT ON processing occurs after ORDER BY sorting.)
Adding an ORDER BY, as Gordon suggested, should give you repeatable results.
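
The manual's wording about "enough columns to guarantee a unique ordering" matters when two vertices are exactly the same distance from the point. A minimal sketch, assuming `osm_id` is unique enough within `ways_vertices_pgr` to serve as a tie-breaker (the question does not confirm this):

```sql
-- Sketch: nearest vertex per point, with osm_id as an assumed tie-breaker
-- so that equal distances still produce the same row on every run.
SELECT DISTINCT ON (a.points)
       a.points,
       v.osm_id,
       ST_DISTANCE(v.the_geom, a.points) AS dist
FROM (SELECT ST_GEOMFROMEWKT('SRID=4326;POINT(17.104854583740238 48.15099866770469)') AS points) a
CROSS JOIN ways_vertices_pgr v
ORDER BY a.points, dist, v.osm_id;
```

The ORDER BY has to start with the DISTINCT ON expression (`a.points`); the remaining columns decide which row within each group counts as the first one.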