I have a Postgres table ("dist_mx") that stores the distances between pairs of points in geographic space. The points are defined in the "hex_0" and "hex_1" columns. The table will eventually hold 10^7 to 10^8 rows. The table is structured as such:
One of the purposes of this table is to query the shortest distance from a list of points (1000s) to the points that correspond to locations of interest. For example, I want to know the shortest distance from each point to a grocery store (we know which point ids correspond to grocery stores).
I'm using UNION to combine one query per point. The OR is needed because the order of the points within a pair is arbitrary (i.e., pairs aren't repeated in reverse order). See below:
SELECT MIN(distances) FROM dist_mx
WHERE (point_id_0 = '8829abb139fffff' AND point_id_1 IN ('8829abb555fffff', ...))
OR (point_id_1 = '8829abb139fffff' AND point_id_0 IN ('8829abb555fffff', ...))
UNION
SELECT MIN(distances) FROM dist_mx
WHERE (point_id_0 = '8829abb469fffff' AND point_id_1 IN ('8829abb555fffff', ...))
OR (point_id_1 = '8829abb469fffff' AND point_id_0 IN ('8829abb555fffff', ...))
...
The query seems to be working as intended, but it is slow: it takes 20 minutes to run on a list of ~4500 points. I have tried chunking the query so that only 500 SELECTs at a time are connected by UNION, but this does not significantly change performance.
I'm relatively new to Postgres, so I am hoping that there is a fairly simple speedup (or even a not-so-simple one)?
Without seeing an EXPLAIN ANALYZE for your query, and also the whole query, I can't give specific advice. There's also probably a better way to write your query, but it's unclear what you're doing.
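For reference, the plan for a single branch of the UNION could be captured roughly as sketched here. The column names and ids are copied from the question; the IN list is cut down to the one id the question actually shows, so substitute the real list before drawing conclusions.

```sql
-- Shows the plan Postgres chose, with real timings and buffer usage.
-- EXPLAIN ANALYZE actually executes the statement, so start with one branch
-- rather than the full 4500-branch UNION.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MIN(distances) FROM dist_mx
WHERE (point_id_0 = '8829abb139fffff' AND point_id_1 IN ('8829abb555fffff'))
   OR (point_id_1 = '8829abb139fffff' AND point_id_0 IN ('8829abb555fffff'));
```

If the output contains a "Seq Scan on dist_mx" node, Postgres is reading the whole table for that branch, which usually means no suitable index exists.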
Here's some general advice.
The basic performance tool is indexes. Without indexes, Postgres must scan the whole table, probably repeatedly. See Use The Index, Luke for more.
A multi-column index on (point_id_0, point_id_1) will allow Postgres to quickly find the matching rows without having to scan the whole table.
create index dist_mx_points_idx on dist_mx(point_id_0, point_id_1);
That should help significantly.
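As for writing the query differently, one possible shape is a single statement that joins the list of points against dist_mx and groups by point, instead of thousands of UNION branches. This is only a sketch under assumptions: the CTE names origins, targets and pairs are made up for illustration, the ids are the ones shown in the question, and the real lists would have to be filled in. It also returns the point id next to each minimum, which the UNION version does not, and UNION additionally removes duplicate rows, so two points with the same minimum distance would collapse into one row there.

```sql
-- Hypothetical rewrite; origins/targets/pairs are illustrative names only.
WITH origins(id) AS (
    VALUES ('8829abb139fffff'), ('8829abb469fffff')  -- the points being queried
),
targets(id) AS (
    VALUES ('8829abb555fffff')                       -- the locations of interest
),
pairs AS (
    -- Case 1: origin stored in point_id_0, target in point_id_1.
    SELECT o.id AS origin, d.distances
    FROM origins o
    JOIN dist_mx d ON d.point_id_0 = o.id
    JOIN targets t ON t.id = d.point_id_1
    UNION ALL
    -- Case 2: origin stored in point_id_1, target in point_id_0.
    SELECT o.id AS origin, d.distances
    FROM origins o
    JOIN dist_mx d ON d.point_id_1 = o.id
    JOIN targets t ON t.id = d.point_id_0
)
-- One row per origin that has at least one matching pair.
SELECT origin, MIN(distances) AS min_distance
FROM pairs
GROUP BY origin;
```

Origins with no matching row in dist_mx are simply absent from the result. Because the second case looks rows up by point_id_1 first, a second index with the columns reversed, on (point_id_1, point_id_0), may help that half; checking the plan with EXPLAIN ANALYZE is the way to confirm.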
You wrote: "One of the purposes of this table is to query the shortest distance from a list of points (1000s) to the points that correspond to locations of interest. For example, I want to know the shortest distance from each point to a grocery store (we know how each grocery store corresponds to point ids)."
For this, use PostGIS, the geospatial extension for Postgres.
Other notes.