
Memory Error when calculating Euclidean distance between points

I want to find a suitable eps for DBSCAN. I have a set of points, stored in an array of shape (2267436, 2), and need to calculate the distance from each point to every other point, then find the nearest neighbors and the min points value. Here is my data:

xy= [[  177963.16728699  2506663.75713195]
 [  176147.50406716  2502422.34894945]
 [  178480.33178874  2507299.83467826]
 ..., 
 [  231205.88139267  2684014.30324774]
 [  231207.81085397  2684014.52219471]
 [  231214.870296    2684054.8263628 ]]

I have tried the following methods:

dist = scipy.spatial.distance.cdist(xy, xy,'euclidean')

or

np.sqrt((np.square(npxy[:,np.newaxis]-npxy).sum(axis=2)))

or

dist=scipy.spatial.distance.pdist(npxy)
d_matrix = scipy.spatial.distance.squareform(dist)

I get a MemoryError with all of them. Is there any way to solve this?

With some simple arithmetic you can see that you cannot store all O(n²) pairwise distances in memory.
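
To make that arithmetic concrete, here is a rough estimate for the array shape given in the question. It assumes each distance is stored as an 8-byte float64, which is NumPy's default but is not stated in the question.

```python
n = 2_267_436          # number of points, taken from the array shape in the question
bytes_per_value = 8    # assumption: each distance is a float64

# Full n x n matrix, as produced by cdist(xy, xy) or squareform(pdist(xy)).
full_matrix_bytes = n * n * bytes_per_value
# Condensed form returned by pdist alone: n * (n - 1) / 2 values.
condensed_bytes = n * (n - 1) // 2 * bytes_per_value
# Distances from a single point to all others.
one_row_bytes = n * bytes_per_value

print(f"full matrix:      {full_matrix_bytes / 1e12:.1f} TB")
print(f"condensed matrix: {condensed_bytes / 1e12:.1f} TB")
print(f"one row:          {one_row_bytes / 1e6:.1f} MB")
```

The full matrix comes out at tens of terabytes, while a single row of distances is only a few tens of megabytes.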

If you compute the distances from only one point at a time, you will be fine.
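
A minimal sketch of that idea, processing a small block of points at a time and keeping only one number per point (the distance to its k-th nearest neighbor) instead of the whole row. The function name `kth_neighbor_distances`, the default `k=4`, and the default `chunk_size=8` are illustrative assumptions, not something from the question.

```python
import numpy as np

def kth_neighbor_distances(points, k=4, chunk_size=8):
    """Distance from each point to its k-th nearest neighbor, block by block.

    Only a (chunk_size, n) slice of the distance matrix exists at any time.
    Hypothetical helper; k and chunk_size are illustrative defaults.
    """
    points = np.asarray(points, dtype=np.float64)
    n = len(points)
    result = np.empty(n)
    for start in range(0, n, chunk_size):
        block = points[start:start + chunk_size]                    # shape (b, 2)
        diff = block[:, np.newaxis, :] - points[np.newaxis, :, :]   # shape (b, n, 2)
        dist = np.sqrt((diff ** 2).sum(axis=2))                     # shape (b, n)
        # The smallest value in each row is the point's distance to itself (0),
        # so position k in sorted order is the k-th nearest neighbor.
        result[start:start + len(block)] = np.partition(dist, k, axis=1)[:, k]
    return result

# Tiny demonstration on four points on a line.
demo = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0]])
print(kth_neighbor_distances(demo, k=1, chunk_size=2))
```

This keeps memory bounded, but the runtime is still O(n²), so on the full data set it would be very slow.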

Also, try to use a spatial index (for example, a k-d tree) to reduce the runtime from O(n²) to a manageable scale.
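
As one possible way to do this, the sketch below uses `scipy.spatial.cKDTree` as the index and returns the sorted k-th-nearest-neighbor distances, which is a common heuristic input for picking eps (look for the "knee" in the sorted curve). The helper name `sorted_k_distances`, the choice `k=4`, and the random stand-in data are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial import cKDTree

def sorted_k_distances(points, k=4):
    """Sorted distance from each point to its k-th nearest neighbor.

    Hypothetical helper: uses a k-d tree so no n x n matrix is ever built.
    """
    points = np.asarray(points, dtype=np.float64)
    tree = cKDTree(points)
    # Ask for k + 1 neighbors because each point's nearest hit is itself
    # (distance 0); column k is then the k-th real neighbor.
    dist, _ = tree.query(points, k=k + 1)
    return np.sort(dist[:, k])

# Random stand-in data; the real xy array from the question would go here.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1000.0, size=(10_000, 2))
k_dist = sorted_k_distances(sample, k=4)
print(k_dist[:3], k_dist[-3:])
```

The result needs only one value per point, so memory stays linear in the number of points.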

Or use a more modern algorithm such as OPTICS.
