Optimizing MySQL query to select all points within a polygon using spatial indexes

Firstly, I admit that my experience with spatial functions is minimal. I have a table in MySQL with 20 fields and 23549187 records that contain geographical data. One of the fields is 'point', which is of the point data type and has a spatial index on it. I have a query that selects all points within a polygon, which looks like this:

select * from `table_name` where ST_CONTAINS(ST_GEOMFROMTEXT('POLYGON((151.186 -23.497,151.207 -23.505,151.178 -23.496,151.174 -23.49800000000001,151.176 -23.496,151.179 -23.49500000000002,151.186 -23.497))'), `point`)

This works well when the polygon is small. However, if the polygon gets massive, execution gets really slow; the slowest query so far ran for 15 minutes. Adding the index really helped bring it down to 15 minutes from what would otherwise have been close to an hour. Is there anything I can do here for further improvement? This query will be run by a PHP script that runs as a daemon, and I am worried that these slow queries will bring the MySQL server down.

All suggestions to make it better are welcome. Thanks.

EDIT:

show create table;

CREATE TABLE `table_name` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `lat` float(12,6) DEFAULT NULL,
  `long` float(12,6) DEFAULT NULL,
  `point` point NOT NULL,
  PRIMARY KEY (`id`),
  KEY `lat` (`lat`,`long`),
  SPATIAL KEY `sp_index` (`point`)
) ENGINE=MyISAM AUTO_INCREMENT=47222773 DEFAULT CHARSET=utf8mb4

There are a few more fields that I am not supposed to disclose here, however the filter won

Explain sql output for the slow query:

+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | table_name | ALL  | NULL          | NULL | NULL    | NULL | 23549187 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+----------+-------------+

Explain sql output for the query with smaller polygons:

+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table      | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | table_name | range | sp_index      | sp_index | 34      | NULL |    1 | Using where |
+----+-------------+------------+-------+---------------+----------+---------+------+------+-------------+

Looks like the biggest polygon does not use the index.

MySQL uses R-Trees for indexing spatial data. Like B-Tree indexes, these are optimal for queries targeting a small subset of the total number of rows. As your bounding polygon gets larger, the number of possible matches increases and, at some point, the optimizer decides it is more efficient to switch to a full table scan. That appears to be the scenario here, and I see three options:

First, try adding a LIMIT to your query. Normally, MySQL ignores the index if the optimizer concludes fewer I/O seeks would occur in a full table scan. But, with B-Tree indexes at least, MySQL will short-circuit that logic and always perform the B-Tree dive when LIMIT is present. I hypothesize R-Trees have a similar short-circuit.

Second, and similar in spirit to the first, try forcing MySQL to use the index. This tells MySQL that the table scan is more expensive than the optimizer thinks. Understand that the optimizer only has heuristics and doesn't really know how "expensive" things are beyond what its internal statistics conclude. We humans have intuition, which sometimes - sometimes - knows better.

select * from `table_name` force index (`sp_index`) where ST_CONTAINS(ST_GEOMFROMTEXT('POLYGON((151.186 -23.497,151.207 -23.505,151.178 -23.496,151.174 -23.49800000000001,151.176 -23.496,151.179 -23.49500000000002,151.186 -23.497))'), `point`)
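To compare the first two options, it may help to run EXPLAIN on each variant and check whether `type` changes from ALL to range. MySQL expects the index hint directly after the table name, which is how the FORCE INDEX variant is built here. The sketch below only builds the statements; the helper name `explain_variants` and the default limit of 1000 are assumptions for illustration, and the polygon WKT is left as a bound parameter (`%s`).

```python
TABLE = "table_name"   # table name from the question
COLUMN = "point"       # POINT column from the question
INDEX = "sp_index"     # spatial index from SHOW CREATE TABLE


def explain_variants(limit=1000):
    """Build EXPLAIN statements for the plain, LIMIT and FORCE INDEX variants.

    Each statement takes the polygon WKT as one bound parameter (%s).
    The default limit is an arbitrary placeholder, not a recommendation.
    """
    where = f"WHERE ST_CONTAINS(ST_GEOMFROMTEXT(%s), `{COLUMN}`)"
    return {
        "plain": f"EXPLAIN SELECT * FROM `{TABLE}` {where}",
        "limit": f"EXPLAIN SELECT * FROM `{TABLE}` {where} LIMIT {int(limit)}",
        # The index hint goes after the table name, before WHERE.
        "force_index": (
            f"EXPLAIN SELECT * FROM `{TABLE}` FORCE INDEX (`{INDEX}`) {where}"
        ),
    }


if __name__ == "__main__":
    for name, sql in explain_variants().items():
        print(name, "->", sql)
```

Whether either variant actually makes the optimizer choose `sp_index` for a large polygon is something only the EXPLAIN output on the real table can confirm.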

Finally, if those don't work, then what you need to do is break up your bounding polygon into smaller polygons. For example, if your bounding polygon is a square 500km per side, break it up into 4 squares 250km on each side, or 16 squares 125km per side, etc. Then UNION all of these together. The index will be used on each one, and the cumulative result may be faster. (Note it's important to UNION them together: MySQL cannot apply multiple range scans on a spatial query.)
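A minimal sketch of the splitting idea, under some assumptions: the bounding box of the large polygon is cut into an n x n grid of rectangular tiles, and each UNION branch filters on one tile with MBRContains while keeping the original ST_Contains test for the exact shape. Pairing the two predicates is an illustrative choice for non-rectangular polygons, not something the answer prescribes, and the helper names `tile_edges`, `tile_bbox` and `build_union_query` are hypothetical. How points lying exactly on a tile edge are treated depends on the MySQL version, so that case is worth testing.

```python
def tile_edges(lo, hi, n):
    """Return n + 1 evenly spaced edges from lo to hi (hi is kept exact)."""
    lo, hi = float(lo), float(hi)
    step = (hi - lo) / n
    return [lo + i * step for i in range(n)] + [hi]


def tile_bbox(min_x, min_y, max_x, max_y, n):
    """Split a bounding box into n * n rectangular tiles as WKT polygons."""
    xs = tile_edges(min_x, max_x, n)
    ys = tile_edges(min_y, max_y, n)
    tiles = []
    for j in range(n):
        for i in range(n):
            x0, x1, y0, y1 = xs[i], xs[i + 1], ys[j], ys[j + 1]
            # Closed ring: the first and last vertex are the same.
            tiles.append(
                f"POLYGON(({x0} {y0},{x1} {y0},{x1} {y1},{x0} {y1},{x0} {y0}))"
            )
    return tiles


def build_union_query(polygon_wkt, tiles, table="table_name", column="point"):
    """Build one SELECT per tile, joined with UNION, plus its bound parameters."""
    branch = (
        f"SELECT * FROM `{table}` "
        f"WHERE MBRContains(ST_GeomFromText(%s), `{column}`) "
        f"AND ST_Contains(ST_GeomFromText(%s), `{column}`)"
    )
    sql = "\nUNION\n".join([branch] * len(tiles))
    params = []
    for tile in tiles:
        params.extend([tile, polygon_wkt])  # tile prefilter, then exact polygon
    return sql, params


if __name__ == "__main__":
    tiles = tile_bbox(151.0, -24.0, 152.0, -23.0, 2)
    sql, params = build_union_query("POLYGON((...))", tiles)  # WKT elided
    print(sql)
```

The returned `sql` and `params` are meant for a driver that uses `%s` placeholders. Running EXPLAIN on the generated statement shows whether each branch really picks up `sp_index`.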
