简体   繁体   中英

Parameter eps of DBSCAN, python

I have a set of points . Their geometry (SRID: 4326) is stored in a Database. I have been given a code that aims to cluster this points with DBSCAN. The parameters have been set as follow: eps=1000, min_points=1.

I obtain clusters that are less distant than 1000 meters. I believed that two points less distant than 1000 meters would belong to the same cluster. Is epsilon really in meters?

The code is the following:

    self.algorithm='DBSCAN'
    X=self.data[:,[2,3]]
    if self.debug==True:
        print 'Nbr of Points: %d'% len(X)
    # print X.shape
    # print dist_matrix.shape
    D = distance.squareform(distance.pdist(X,'euclidean'))
    # print dist_matrix
    # S = 1 - (D / np.max(D))
    db = DBSCAN(eps, min_samples).fit(D)
    self.core_samples = db.core_sample_indices_
    self.labels = db.labels

the aim is not to find another way to run it but really to understand the value of eps. What it represents in term of distance. Min_sample is set to one because I accept to have clusters with a size of 1 sample indeed.

This depends on your implementation .

Your distance function could return anything; including meters, millimeters, yards, km, miles, degrees... but you did not share what function you use for computing distance! If I'm not mistaken, SRID: 4326 does not imply anything on distance computations.

The "haversine" used by sklearn seems to use degrees , not meters.

Either way, min_points=1 is nonsensical. The query point is included, so every point itself is a cluster. With min_points <= 2 , the result of DBSCAN will be a single-linkage clustering. To get a density based clustering, you need to choose a higher value to get real density.

You may want to use ELKI 's DBSCAN. According to their Java sources, their distance function uses meters, but also their R*-tree index allows accelerated range queries with this distance, which will yield a substantial speed-up ( O(n log n) instead of O(n^2) ).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM