sklearn.cluster.DBSCAN gives unexpected result

Question

I'm using the DBSCAN method to cluster images, but it gives an unexpected result. Let's assume I have 10 images.

First, I read the images in a loop using cv2.imread. Then I compute the structural similarity index between each pair of images. After that, I have a matrix like this:

[[ 1.         -0.00893619  0.          0.          0.          0.50148778  0.47921832  0.          0.          0.        ]
 [-0.00893619  1.          0.          0.          0.          0.00996088 -0.01873205  0.          0.          0.        ]
 [ 0.          0.          1.          0.57884212  0.          0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.57884212  1.          0.          0.          0.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          1.          0.          0.          0.          0.          0.        ]
 [ 0.50148778  0.00996088  0.          0.          0.          1.          0.63224396  0.          0.          0.        ]
 [ 0.47921832 -0.01873205  0.          0.          0.          0.63224396  1.          0.          0.          0.        ]
 [ 0.          0.          0.          0.          0.          0.          0.          1.          0.77507487  0.69697053]
 [ 0.          0.          0.          0.          0.          0.          0.          0.77507487  1.          0.74861881]
 [ 0.          0.          0.          0.          0.          0.          0.          0.69697053  0.74861881  1.        ]]

Looks good. Then I have a simple invocation of DBSCAN:

from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.4, min_samples=3, metric='precomputed').fit(distances)
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

And the result is:

[0 0 0 0 0 0 0 0 0 0]

What am I doing wrong? Why does it put all images into one cluster?

Answer 1

DBSCAN usually assumes a dissimilarity (distance), not a similarity. It can also be implemented with a similarity threshold (see Generalized DBSCAN).
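A minimal sketch of turning the similarities into distances before clustering. It assumes the similarity values lie in [-1, 1] with 1 meaning identical, so that 1 - similarity can serve as a dissimilarity; that choice is an assumption, not something stated in the question. The pair values are copied from the matrix in the question, and the names similarity, pairs and distances are only illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Non-zero off-diagonal similarities from the question, as (i, j, value).
pairs = [
    (0, 1, -0.00893619), (0, 5, 0.50148778), (0, 6, 0.47921832),
    (1, 5, 0.00996088), (1, 6, -0.01873205), (2, 3, 0.57884212),
    (5, 6, 0.63224396), (7, 8, 0.77507487), (7, 9, 0.69697053),
    (8, 9, 0.74861881),
]

similarity = np.eye(10)  # each image is fully similar to itself
for i, j, value in pairs:
    similarity[i, j] = similarity[j, i] = value

# Assumption: 1 - similarity is an acceptable dissimilarity for this data.
# The diagonal becomes 0, so every image is at distance 0 from itself.
distances = 1.0 - similarity

db = DBSCAN(eps=0.4, min_samples=3, metric="precomputed").fit(distances)
labels = db.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(labels, n_clusters)
```

With this conversion, eps=0.4 means that only pairs with a similarity of at least 0.6 count as neighbours, so eps and min_samples will probably need tuning for the data at hand.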

Answer 2

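The problem was that I had computed the distance matrix incorrectly. In a distance matrix, the entries on the main diagonal are all zero.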
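A quick sanity check along these lines can catch the mistake before the matrix is passed to DBSCAN. This is only a sketch: check_distance_matrix is a hypothetical helper, not part of scikit-learn, and the 3x3 values are made up for illustration.

```python
import numpy as np

def check_distance_matrix(matrix, tol=1e-12):
    """Return a list of reasons why `matrix` does not look like a distance matrix."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return ["matrix is not square"]
    problems = []
    if np.any(np.abs(np.diag(matrix)) > tol):
        problems.append("main diagonal is not all zeros")
    if np.any(matrix < -tol):
        problems.append("negative entries")
    if not np.allclose(matrix, matrix.T):
        problems.append("matrix is not symmetric")
    return problems

# Made-up similarity matrix: ones on the diagonal, as in the question.
similarity = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.0],
                       [0.1, 0.0, 1.0]])

print(check_distance_matrix(similarity))        # the diagonal of ones is flagged
print(check_distance_matrix(1.0 - similarity))  # no problems reported
```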


Answer 1
1 2016-08-29 15:16:01

Answer 2
0 2016-08-29 14:02:26
