
DBSCAN eps incorrect behaviour

from sklearn.cluster import DBSCAN
import numpy as np
X=np.array([1,9,11,13,14,15,19]).reshape(-1, 1)
db=DBSCAN(eps=3, min_samples=1).fit(X)
print(db.labels_)

prints:

[0 1 1 1 1 1 2]

while the documentation says:

 eps : float, optional
     The maximum distance between two samples for them to be considered as in the same neighborhood.

Here 9 and 15 are in the same cluster, even though the Euclidean distance between them is 6, which is > 3.

What am I missing?

To show why they end up in the same cluster, here is a high-level explanation of what DBSCAN does.

  1. Construct a graph by connecting data points
  2. Measure the size of each connected component
  3. Discard the components smaller than a threshold, which in sklearn is the min_samples parameter.
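For the min_samples=1 setting used in the question, every point is a core point, so steps 1 and 2 alone determine the clusters. The following is a minimal pure-Python sketch of that special case (the helper name eps_components is hypothetical, not part of sklearn): it labels the connected components of the eps-graph by breadth-first search. It assumes, as in sklearn's radius-neighbor search, that a point lying exactly at distance eps still counts as connected. For this data its labels should match the db.labels_ printed in the question.

```python
from collections import deque


def eps_components(points, eps):
    """Label connected components of the graph linking 1-D points within eps.

    Hypothetical helper: only equivalent to DBSCAN when min_samples=1,
    because then every point is a core point. Labels are numbered in the
    order in which components are first encountered.
    """
    n = len(points)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier component
        labels[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # A distance exactly equal to eps still counts as a connection.
                if labels[j] == -1 and abs(points[i] - points[j]) <= eps:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels


print(eps_components([1, 9, 11, 13, 14, 15, 19], eps=3))
# [0, 1, 1, 1, 1, 1, 2]
```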

eps controls the maximum distance at which two data points are connected. For your dataset, with [a-b] denoting a connection, you have the following chain:

[9-11], [11-13], [13-14], [14-15]
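The chain above is only part of the graph. A short sketch that lists every pair of points at most eps apart (treating a distance equal to eps as a connection) makes it visible that 9 and 15 are never directly connected; they are only linked through intermediate points.

```python
from itertools import combinations

points = [1, 9, 11, 13, 14, 15, 19]
eps = 3

# Every pair of points whose distance is at most eps is an edge in the graph.
edges = [(a, b) for a, b in combinations(points, 2) if abs(a - b) <= eps]
print(edges)
# [(9, 11), (11, 13), (11, 14), (13, 14), (13, 15), (14, 15)]

# 9 and 15 are 6 apart, so there is no direct edge between them.
print((9, 15) in edges)  # False
```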

So these points are all in the same component, and that component is at least as large as your min_samples parameter, so it is considered a valid cluster.
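Step 3 of the high-level description is a simplification. According to the sklearn documentation, min_samples is the number of samples in a point's eps-neighborhood, including the point itself, required for that point to be a core point; clusters grow from core points, and non-core points within eps of a core point join as border points. The sketch below (the helper core_points is hypothetical) computes those neighbor counts for the question's data. With min_samples=1 every point is a core point, which is why the connected-component picture is exact here; with min_samples=3, the point 9 would no longer be a core point, although it is within eps of the core point 11.

```python
def core_points(points, eps, min_samples):
    """Return (neighbor_counts, is_core) for 1-D points.

    Hypothetical helper. Each count includes the point itself, and a point
    is a core point when its count is at least min_samples.
    """
    counts = [sum(1 for q in points if abs(p - q) <= eps) for p in points]
    is_core = [c >= min_samples for c in counts]
    return counts, is_core


points = [1, 9, 11, 13, 14, 15, 19]
print(core_points(points, eps=3, min_samples=1))
# ([1, 2, 4, 4, 4, 3, 1], [True, True, True, True, True, True, True])
print(core_points(points, eps=3, min_samples=3)[1])
# [False, False, True, True, True, True, False]
```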

This is because the points chain together: 15 is within eps of 14, so it is included in that cluster.
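In one dimension the chaining is easy to check by hand: with min_samples=1, two points share a cluster exactly when every gap between consecutive sorted values separating them is at most eps, since no connection can jump over a larger gap. A minimal sketch under that assumption (the function name chain_clusters_1d is hypothetical) groups the values by those gaps; it reproduces the question's grouping and shows what happens when 17 is added.

```python
def chain_clusters_1d(values, eps):
    """Group 1-D values into chains whose consecutive sorted gaps are <= eps.

    Hypothetical helper; matches DBSCAN's clusters only for 1-D data with
    min_samples=1.
    """
    if not values:
        return []
    ordered = sorted(values)
    chains = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= eps:
            chains[-1].append(cur)  # gap small enough: extend the current chain
        else:
            chains.append([cur])  # gap too large: start a new chain
    return chains


print(chain_clusters_1d([1, 9, 11, 13, 14, 15, 19], eps=3))
# [[1], [9, 11, 13, 14, 15], [19]]
print(chain_clusters_1d([1, 9, 11, 13, 14, 15, 17, 19], eps=3))
# [[1], [9, 11, 13, 14, 15, 17, 19]]
```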

You can see this chaining behavior by adding a point at 17, which bridges the gap between 15 and 19:

from sklearn.cluster import DBSCAN
import numpy as np
X = np.array([1, 9, 11, 13, 14, 15, 17, 19]).reshape(-1, 1)
db = DBSCAN(eps=3, min_samples=1).fit(X)
print(db.labels_)

which prints [0 1 1 1 1 1 1 1]: 9 and 19, which are 10 apart, now end up in the same cluster.

A neighborhood is not the same thing as a cluster.

A cluster is the union of many neighborhoods. Epsilon is the maximum distance from the center of a single neighborhood, but when multiple neighborhoods are merged, the distances within the resulting cluster can become arbitrarily large if your data is dense.
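A quick arithmetic check of that point, assuming one-dimensional data and min_samples=1: points spaced exactly eps apart are all chained into one cluster, yet the distance between the two ends of that cluster grows linearly with the number of points. The helper chain_diameter is hypothetical and only illustrates the arithmetic.

```python
def chain_diameter(n_points, eps):
    """Return (forms_one_chain, diameter) for n_points spaced eps apart on a line.

    Hypothetical helper for 1-D data with min_samples=1.
    """
    points = [i * eps for i in range(n_points)]
    # Each consecutive gap equals eps, so every point is linked to the next.
    one_chain = all(b - a <= eps for a, b in zip(points, points[1:]))
    diameter = points[-1] - points[0]
    return one_chain, diameter


for n in (2, 10, 100):
    print(n, chain_diameter(n, eps=3))
# 2 (True, 3)
# 10 (True, 27)
# 100 (True, 297)
```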

But the sklearn description of the parameter is not just misleading; it is wrong. When the triangle inequality holds, two points in the same neighborhood can be up to two eps apart (and further apart if the distance function is not a metric).
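A small numeric illustration, using two points on opposite sides of a neighborhood center: under the Euclidean metric both points are within eps of the center but 2 * eps from each other. Squared Euclidean distance, which is not a metric because it violates the triangle inequality, is used here only as an example of a non-metric; with it, the same two points end up 4 * eps apart.

```python
import math


def euclidean(p, q):
    return math.dist(p, q)


def squared_euclidean(p, q):
    # Not a metric: it violates the triangle inequality.
    return math.dist(p, q) ** 2


center, a, b = (0.0, 0.0), (3.0, 0.0), (-3.0, 0.0)

eps = 3.0
# Both a and b lie in the eps-neighborhood of center...
print(euclidean(center, a) <= eps, euclidean(center, b) <= eps)  # True True
# ...yet they are 2 * eps apart.
print(euclidean(a, b))  # 6.0

eps_sq = 9.0
# Under squared Euclidean distance they are also within eps_sq of center...
print(squared_euclidean(center, a) <= eps_sq, squared_euclidean(center, b) <= eps_sq)  # True True
# ...but 4 * eps_sq apart, beyond the 2 * eps bound that a metric guarantees.
print(squared_euclidean(a, b))  # 36.0
```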
