
Is there any way to speed up this postgres query?

I have a postgres table ("dist_mx") that indicates the distances between two points (geographic space). The points are defined in the "hex_0" and "hex_1" columns. The table will eventually be 10^7 to 10^8 rows. The table is structured as such:

[image: table structure]

One of the purposes of this table is to query the shortest distance from a list of points (1000s) to the points that correspond to locations of interest. For example, I want to know the shortest distance from each point to a grocery store (we know how each grocery store corresponds to point ids).

I'm using UNION to combine one query per point. The OR is needed because the order of the points within a pair is arbitrary (i.e., pairs aren't repeated in reverse order). See below:

SELECT MIN(distances) FROM dist_mx
WHERE (point_id_0 = '8829abb139fffff' AND point_id_1 IN ('8829abb555fffff', ...))
   OR (point_id_1 = '8829abb139fffff' AND point_id_0 IN ('8829abb555fffff', ...))
UNION
SELECT MIN(distances) FROM dist_mx
WHERE (point_id_0 = '8829abb469fffff' AND point_id_1 IN ('8829abb555fffff', ...))
   OR (point_id_1 = '8829abb469fffff' AND point_id_0 IN ('8829abb555fffff', ...))
...

The query seems to be working as intended, but it is slow: it takes 20 minutes to run on a list of ~4500 points. I have tried chunking the query so that I only include 500 queries at a time (i.e., connected by UNION), but this does not significantly change performance.

I'm relatively new to postgres so I am hoping that there is a fairly simple speedup (or a not fairly simple speedup)?

EDIT: adding schema

[image: table schema]

Without seeing the EXPLAIN ANALYZE output for your query, and also the whole query, I can't give specific advice. There's also probably a better way to write your query, but it's unclear what you're doing.
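To get that plan, one branch of the UNION can be run on its own under EXPLAIN. This is a minimal sketch using the table and column names from the question; the single id in each IN list is only a placeholder for the list the question elides.

```sql
-- Run one branch of the UNION by itself and capture the actual plan.
-- ANALYZE executes the query; BUFFERS adds buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MIN(distances)
FROM dist_mx
WHERE (point_id_0 = '8829abb139fffff' AND point_id_1 IN ('8829abb555fffff'))
   OR (point_id_1 = '8829abb139fffff' AND point_id_0 IN ('8829abb555fffff'));
```

A "Seq Scan on dist_mx" node in the output would mean every row of the table is being read for this one branch.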

Here's some general advice.


The basic performance tool is indexes. Without indexes, Postgres must scan the whole table, probably repeatedly. See Use The Index, Luke for more.

A multi-column index on (point_id_0, point_id_1) will allow Postgres to quickly find the matching rows without having to scan the whole table.

create index dist_mx_points_idx on dist_mx(point_id_0, point_id_1);

That should help significantly.
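Because each id can appear in either column, a second index with the columns reversed may also be worth testing, and the thousands of UNION branches can probably be collapsed into one statement. The sketch below is one possible rewrite, not necessarily what you need. It assumes the id columns are text; `origins` and `targets` are hypothetical temporary tables holding the query points and the points of interest, and `dist_mx_points_rev_idx` is a made-up index name. Unlike a UNION over bare MIN values, which drops duplicate minimums and loses track of which point each value belongs to, it returns one labeled row per origin.

```sql
-- Hypothetical second index for lookups that start from point_id_1.
CREATE INDEX dist_mx_points_rev_idx ON dist_mx (point_id_1, point_id_0);

-- Hypothetical helper tables; load them with the real id lists.
CREATE TEMP TABLE origins (point_id text PRIMARY KEY);
CREATE TEMP TABLE targets (point_id text PRIMARY KEY);
INSERT INTO origins VALUES ('8829abb139fffff'), ('8829abb469fffff');
INSERT INTO targets VALUES ('8829abb555fffff');
ANALYZE origins;
ANALYZE targets;

-- One row per origin: the shortest distance to any target,
-- checking both column orders of each stored pair.
SELECT origin_id, MIN(distances) AS min_distance
FROM (
    SELECT d.point_id_0 AS origin_id, d.distances
    FROM dist_mx d
    JOIN origins o ON o.point_id = d.point_id_0
    JOIN targets t ON t.point_id = d.point_id_1
    UNION ALL
    SELECT d.point_id_1 AS origin_id, d.distances
    FROM dist_mx d
    JOIN origins o ON o.point_id = d.point_id_1
    JOIN targets t ON t.point_id = d.point_id_0
) AS pairs
GROUP BY origin_id;
```

Origins with no stored pair to any target simply do not appear in the result, so a LEFT JOIN back to `origins` would be needed if those should show up as NULL.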

"One of the purposes of this table is to query the shortest distance from a list of points (1000s) to the points that correspond to locations of interest. For example, I want to know the shortest distance from each point to a grocery store (we know how each grocery store corresponds to point ids)."

Use PostGIS.
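If straight-line distance is what is needed (the question does not say how the stored distances were computed), PostGIS can find the nearest location directly from coordinates, with no precomputed pair table. This is a minimal sketch under assumptions: `points` and `stores` are hypothetical tables, and `geom` is assumed to hold points in a projected coordinate system whose units are meters.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical tables; geom is assumed to use a projected CRS in meters.
CREATE TABLE points (point_id text PRIMARY KEY, geom geometry(Point) NOT NULL);
CREATE TABLE stores (store_id text PRIMARY KEY, geom geometry(Point) NOT NULL);

-- A GiST index lets the <-> ordering below use an
-- index-assisted nearest-neighbor search.
CREATE INDEX stores_geom_idx ON stores USING gist (geom);

-- For each point, pick the single nearest store and report the distance to it.
SELECT p.point_id,
       s.store_id,
       ST_Distance(p.geom, s.geom) AS distance
FROM points p
CROSS JOIN LATERAL (
    SELECT st.store_id, st.geom
    FROM stores st
    ORDER BY st.geom <-> p.geom
    LIMIT 1
) AS s;
```

The LATERAL subquery runs once per point, so the work grows with the number of query points rather than with the number of point pairs.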


Other notes.

  • Don't store hex as a string, store it as a bigint and convert. This will take less space and is faster.
  • Don't store numbers as text, use an integer.
  • Don't store your points as two columns, use a single point column. Then you can use geometric operators. However, these are 2D calculations and only accurate for GIS over short distances.
  • Since you're doing GIS, don't do this by hand. Use PostGIS.
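The conversion in the first bullet can be sketched as follows. This assumes the ids are hexadecimal strings of at most 16 digits, as the sample values in the question are; the text-to-bit cast is a long-standing PostgreSQL idiom rather than a dedicated conversion function.

```sql
-- Hex string -> bigint: left-pad to 16 hex digits, read as 64 bits, cast to bigint.
SELECT ('x' || lpad('8829abb139fffff', 16, '0'))::bit(64)::bigint AS point_id;

-- bigint -> hex string, for display or for comparing against the original ids.
SELECT to_hex(('x' || lpad('8829abb139fffff', 16, '0'))::bit(64)::bigint) AS hex_id;
```

A bigint takes 8 bytes per value, compared with 15 characters plus overhead for the text form, which also makes the indexes on those columns smaller.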
