I have a set of points . Their geometry (SRID: 4326) is stored in a Database. I have been given a code that aims to cluster this points with DBSCAN. The parameters have been set as follow: eps=1000, min_points=1.
I obtain clusters that are less distant than 1000 meters. I believed that two points less distant than 1000 meters would belong to the same cluster. Is epsilon really in meters?
The code is the following:
self.algorithm='DBSCAN'
X=self.data[:,[2,3]]
if self.debug==True:
print 'Nbr of Points: %d'% len(X)
# print X.shape
# print dist_matrix.shape
D = distance.squareform(distance.pdist(X,'euclidean'))
# print dist_matrix
# S = 1 - (D / np.max(D))
db = DBSCAN(eps, min_samples).fit(D)
self.core_samples = db.core_sample_indices_
self.labels = db.labels
the aim is not to find another way to run it but really to understand the value of eps. What it represents in term of distance. Min_sample is set to one because I accept to have clusters with a size of 1 sample indeed.
This depends on your implementation .
Your distance function could return anything; including meters, millimeters, yards, km, miles, degrees... but you did not share what function you use for computing distance! If I'm not mistaken, SRID: 4326
does not imply anything on distance computations.
The "haversine"
used by sklearn seems to use degrees , not meters.
Either way, min_points=1
is nonsensical. The query point is included, so every point itself is a cluster. With min_points <= 2
, the result of DBSCAN will be a single-linkage clustering. To get a density based clustering, you need to choose a higher value to get real density.
You may want to use ELKI 's DBSCAN. According to their Java sources, their distance function uses meters, but also their R*-tree index allows accelerated range queries with this distance, which will yield a substantial speed-up ( O(n log n)
instead of O(n^2)
).
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.