
DBSCAN with potentially imprecise lat/long coordinates

I've been running scikit-learn's DBSCAN implementation to cluster a set of geotagged photos by lat/long. For the most part it works pretty well, but I came across a few puzzling cases. For instance, there were two sets of photos for which the user-entered text field said the photo was taken at Central Park, but the lat/longs for those photos were not clustered together. The photos themselves confirmed that both sets of observations were from Central Park, but the lat/longs were in fact further apart than epsilon.

After a little investigation, I discovered that the lat/long geotags (generated from the phone's GPS) are pretty imprecise. When I looked at the location accuracy of each photo, I found that it varied widely (I've seen a margin of error of up to 600 meters), and that once you take the location accuracy into account, these two sets of photos are close to each other in terms of lat/long.

Is there any way to account for the margin of error in lat/long when running DBSCAN?

(Note: I'm not sure this question is as articulate as it should be, so if there's anything I can do to make it clearer, please let me know.)

Note that DBSCAN doesn't actually need the distances.

Look up Generalized DBSCAN: all it really uses is an "is a neighbor of" relationship.
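One way to approximate such a neighbor relation with scikit-learn is to hand DBSCAN a precomputed matrix of "optimistic" distances, where each pairwise haversine distance is reduced by the two points' accuracy radii. This is only a sketch under assumptions: the function name `cluster_with_accuracy`, the `accuracy_m` array (per-photo GPS error in meters), and the subtract-both-radii rule are illustrative choices, not part of Generalized DBSCAN itself or of the question's data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def cluster_with_accuracy(latlon_deg, accuracy_m, eps_m, min_samples=2):
    """Hypothetical helper: DBSCAN on accuracy-adjusted distances.

    latlon_deg : (n, 2) array of [lat, lon] in degrees
    accuracy_m : (n,) array of per-point error radii in meters (assumed input)
    eps_m      : neighborhood radius in meters
    """
    latlon_rad = np.radians(np.asarray(latlon_deg, dtype=float))
    # haversine_distances expects [lat, lon] in radians and returns radians
    dist_m = haversine_distances(latlon_rad) * EARTH_RADIUS_M

    acc = np.asarray(accuracy_m, dtype=float)
    # Smallest distance still consistent with both error radii (never below 0)
    adjusted = np.clip(dist_m - acc[:, None] - acc[None, :], 0.0, None)
    np.fill_diagonal(adjusted, 0.0)

    model = DBSCAN(eps=eps_m, min_samples=min_samples, metric="precomputed")
    return model.fit_predict(adjusted)
```

Two points then count as neighbors whenever their error circles come within `eps_m` of each other. The full matrix needs O(n²) memory, so this only suits moderately sized photo sets.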

If you really need to incorporate uncertainty, look up the various DBSCAN variations and extensions that handle imprecise data explicitly. However, you may get pretty much the same results just by choosing a reasonable threshold for epsilon. There is room for choosing a larger epsilon than the one you deem adequate: if you want to use epsilon = 1 km, and you assume your data is imprecise on the order of 100 m, then use 1100 m as epsilon instead.
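The padded-epsilon idea can be sketched with scikit-learn's haversine metric, which works on [lat, lon] in radians and therefore needs epsilon converted from meters to radians. The helper name `cluster_padded_eps`, the `error_m` parameter, and the Earth-radius constant are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def cluster_padded_eps(latlon_deg, eps_m=1000.0, error_m=100.0, min_samples=2):
    """Hypothetical helper: DBSCAN with epsilon enlarged by the assumed error."""
    # Pad epsilon by the assumed imprecision, then convert meters to radians
    eps_rad = (eps_m + error_m) / EARTH_RADIUS_M
    latlon_rad = np.radians(np.asarray(latlon_deg, dtype=float))
    model = DBSCAN(
        eps=eps_rad,
        min_samples=min_samples,
        algorithm="ball_tree",
        metric="haversine",
    )
    return model.fit_predict(latlon_rad)
```

With `eps_m=1000` and `error_m=100`, two photos roughly 1045 m apart end up in the same cluster, whereas the unpadded 1 km threshold would leave them separate.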
