
Database/SQL: How to store longitude/latitude data?

Performance question...

I have a database of houses that have geolocation data (longitude & latitude).

What I want to do is find the best way to store the location data in MySQL (v5.0.24a) using the InnoDB storage engine, so that I can run a lot of queries that return all the home records between latitudes x1 and x2 and longitudes y1 and y2.

Right now, my database schema is:

---------------------
Homes   
---------------------
geolat - Float (10,6)
geolng - Float (10,6)
---------------------

And my query is:

SELECT ... 
WHERE geolat BETWEEN x1 AND x2
AND geolng BETWEEN y1 AND y2
  • Is what I described above, using Float (10,6) and separate longitude/latitude columns, the best way to store latitude and longitude data in MySQL? If not, what is? Float, Decimal and even Spatial exist as data types.
  • Is this the best way to write the SQL from a performance standpoint? If not, what is?
  • Does using a different MySQL storage engine make sense?

UPDATE: Still Unanswered

I have 3 different answers below. One person says to use Float. One person says to use INT. One person says to use Spatial.

So I used the MySQL "EXPLAIN" statement to measure the SQL execution speed. It appears that there is absolutely no difference in SQL execution (result set fetching) whether INT or FLOAT is used for the longitude and latitude data type.

It also appears that using the "BETWEEN" operator is SIGNIFICANTLY faster than using the ">" and "<" operators: nearly 3x faster.

With that being said, I am still uncertain what the performance impact of using Spatial would be, since it's unclear to me whether it's supported by the version of MySQL I'm running (v5.0.24) ... and how I would enable it if it is supported.

Any help would be greatly appreciated.

float(10,6) is just fine.

Any other convoluted storage scheme will require more translation in and out, and floating-point math is plenty fast.

I know you're asking about MySQL, but if spatial data is important to your business, you might want to reconsider. PostgreSQL + PostGIS are also free software, and they have a great reputation for managing spatial and geographic data efficiently. Many people use PostgreSQL only because of PostGIS.

I don't know much about the MySQL spatial system though, so perhaps it works well enough for your use case.

The problem with using any data type other than "spatial" here is that your kind of "rectangular selection" can (usually; this depends on how bright your DBMS is, and MySQL certainly isn't generally the brightest) only be optimised in one single dimension.

The system can pick either the longitude index or the latitude index, and use that to reduce the set of rows to inspect. But after it has done that, there is a choice of: (a) fetching all found rows, scanning over them and testing for the "other dimension", or (b) doing the same process on the "other dimension" and then matching the two result sets to see which rows appear in both. This latter option may not be implemented as such in your particular DBMS engine.
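Option (a) can be sketched in Python, as a rough illustration only. A sorted list stands in for a single-column latitude index; the sample rows and the homes_in_box helper are made up for the example and do not reflect how any particular DBMS implements it.

```python
import bisect

# Hypothetical sample data: (latitude, longitude, home_id) tuples.
homes = [
    (40.10, -75.20, "a"),
    (40.50, -74.00, "b"),
    (41.00, -73.90, "c"),
    (41.20, -80.00, "d"),
    (42.30, -71.10, "e"),
]

# Stand-in for an index on latitude only: rows kept sorted by latitude.
by_lat = sorted(homes)
lat_keys = [row[0] for row in by_lat]


def homes_in_box(lat_min, lat_max, lng_min, lng_max):
    """Range-scan the latitude 'index', then filter the survivors on longitude."""
    lo = bisect.bisect_left(lat_keys, lat_min)
    hi = bisect.bisect_right(lat_keys, lat_max)
    candidates = by_lat[lo:hi]  # only this slice was narrowed by the index
    return [h for h in candidates if lng_min <= h[1] <= lng_max]


result = homes_in_box(40.4, 41.5, -75.0, -73.0)
print([h[2] for h in result])
```

The longitude test runs over every candidate the latitude range lets through, which is why a wide latitude band with a narrow longitude band can still be slow.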

Spatial indexes sort of do the latter "automatically", so I think it's safe to say that a spatial index will give the best performance in any case. It may also be the case that it doesn't significantly outperform the other solutions, and that it's just not worth the bother. This depends on all sorts of things, like the volume and the distribution of your actual data.

It is certainly true that float (tree) indexes are by necessity slower than integer indexes, because it usually takes longer to execute '>' on floats than on integers. But I would be surprised if this effect were actually noticeable.

Google uses float(10,6) in their "Store locator" example. That's enough for me to go with that.

https://stackoverflow.com/a/5994082/1094271

Also, starting with MySQL 5.6.x, spatial extension support is much better and comparable to PostGIS in features and performance.

I would store it as integers (int, 4 bytes) representing 1/1,000,000ths of a degree. That would give you a resolution of a few inches.
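A minimal Python sketch of that integer encoding follows. The helper names and the sample coordinates are hypothetical, and the metres-per-degree figure is a rough approximation used only to estimate the resolution.

```python
SCALE = 1_000_000            # integer units (microdegrees) per degree
INT32_MAX = 2**31 - 1        # largest value of a signed 4-byte int
METERS_PER_DEGREE_LAT = 111_320  # rough approximation, assumed for illustration


def to_microdegrees(degrees):
    """Convert decimal degrees to an integer number of microdegrees."""
    return round(degrees * SCALE)


def from_microdegrees(value):
    """Convert stored microdegrees back to decimal degrees."""
    return value / SCALE


# The full longitude range fits comfortably in a signed 32-bit integer.
assert to_microdegrees(180.0) <= INT32_MAX

lat = to_microdegrees(40.712776)
lng = to_microdegrees(-74.005974)
print(lat, lng)

# Approximate ground distance covered by one stored unit of latitude.
resolution_m = METERS_PER_DEGREE_LAT / SCALE
print(resolution_m)
```

One unit works out to roughly 0.11 m of latitude, which matches the "few inches" estimate.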

I don't think there is any intrinsic spatial datatype in MySQL.

Float (10,6)

Where is latitude or longitude 5555.123456?

Don't you mean Float(9,6) instead?
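The digit arithmetic behind that objection can be checked with a short Python sketch. It assumes the column is a 4-byte single-precision float, as MySQL's FLOAT normally is; the helper names and the sample longitude are made up for the example.

```python
import struct


def max_value(m, d):
    """Largest value with m total digits, d of them after the decimal point."""
    return (10**m - 1) / 10**d


def as_float32(value):
    """Round-trip a Python float through a 4-byte IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# (10,6) leaves four integer digits; (9,6) leaves three, enough for +/-180.
print(max_value(10, 6))
print(max_value(9, 6))

# A single-precision float holds about 7 significant digits, so the
# sixth decimal of a three-digit longitude is not stored exactly.
lng = 123.456789
stored = as_float32(lng)
error = abs(stored - lng)
print(stored, error)
```

So the declared (M,D) only bounds the displayed digits; with a three-digit integer part the last decimal places of a 4-byte float are already approximate.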

I have the exact same schema (float(10,6)) and query (selecting inside a rectangle), and I found that switching the db engine from InnoDB to MyISAM doubled the speed of a "point in rectangle" look-up in a table with 780,000 records.

Additionally, I converted all lng/lat values to Cartesian integers (x,y) and created a two-column index on x,y, and my speed went from ~27 ms to 1.3 ms for the same look-up.
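A small sketch of the integer columns plus two-column index, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The answer does not say which conversion it used, so the plain scaling by SCALE is an assumption, and the table name, index name and sample points are hypothetical.

```python
import sqlite3

SCALE = 1_000_000  # assumed integer units per degree

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homes (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
# Two-column (composite) index on the integer coordinates.
conn.execute("CREATE INDEX idx_homes_xy ON homes (x, y)")

# Hypothetical sample points as (longitude, latitude).
points = [
    (-74.005974, 40.712776),
    (-73.985130, 40.758896),
    (-118.243683, 34.052235),
]
conn.executemany(
    "INSERT INTO homes (x, y) VALUES (?, ?)",
    [(round(lng * SCALE), round(lat * SCALE)) for lng, lat in points],
)


def homes_in_box(lng_min, lng_max, lat_min, lat_max):
    """Return ids of homes inside the rectangle, using integer bounds."""
    bounds = [round(v * SCALE) for v in (lng_min, lng_max, lat_min, lat_max)]
    rows = conn.execute(
        "SELECT id FROM homes "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
        bounds,
    )
    return [row[0] for row in rows]


print(homes_in_box(-75.0, -73.0, 40.0, 41.0))
```

The timings quoted above come from the answerer's MySQL table; this sketch only shows the shape of the schema and query, not the speed-up.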

It really depends on how you are using the data. But in a gross over-simplification of the facts, decimal is faster but less accurate in approximations. More info here:

http://msdn.microsoft.com/en-us/library/aa223970(SQL.80).aspx

Also, the standard for GPS coordinates is specified in ISO 6709:

http://en.wikipedia.org/wiki/ISO_6709

I know you have probably moved past this problem. I just wanted to add another approach to this question, in case someone is looking to store geolocation data. You could encode the latitude and longitude into a geohash, since geohashes are prefix-searchable to a required degree of precision. It seems you can convert your query to a start and end prefix and do a prefix search with a LIKE query.
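For readers unfamiliar with geohashes, here is a minimal Python sketch of the standard encoding (interleaved longitude/latitude bisection, 5 bits per base-32 character). The function name and the sample points are hypothetical, and str.startswith stands in for a SQL LIKE 'prefix%' search.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet


def geohash_encode(lat, lng, precision=9):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


# Hypothetical sample points: name -> (latitude, longitude).
points = {"a": (42.6, -5.6), "b": (42.61, -5.59), "c": (48.85, 2.35)}
hashes = {name: geohash_encode(lat, lng) for name, (lat, lng) in points.items()}

# Stand-in for: SELECT ... WHERE geohash LIKE 'ezs4%'
matches = [name for name, h in hashes.items() if h.startswith("ezs4")]
print(matches)
```

A shorter prefix covers a larger cell. Two points that are close together but sit on opposite sides of a cell boundary can have different prefixes, so a real query may need to search neighbouring cells as well.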
