Clustering angular data with DBSCAN

Question

I need to cluster data points of the form X, Y, Phi. Right now I use DBSCAN (sklearn). The clustering works except for one thing: Phi is angular data, defined modulo 2*Pi. As a result, the clustering near Phi = 0 is incorrect. Is there a trick to fix this? I could not find one, or come up with one myself, that worked.

Thank you.

Answer 1

Circular boundary conditions are not easy to implement in practice outside a Fourier framework.

You could try reparametrizing by replacing X, Y, Phi with X, Y, a * cos(Phi), a * sin(Phi), where a > 0 is a scale factor that needs to be chosen so that this projection of the angle into 2D space behaves the way you need for clustering. Start by checking a = 1 (if it works well on Phi alone, it is a good candidate), and then try a on the order of magnitude of your remaining data X, Y.
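A minimal sketch of this reparametrization, assuming NumPy is available. The helper name embed_angles and the sample values are made up for illustration; only the mapping itself comes from the answer.

```python
import numpy as np

def embed_angles(x, y, phi, a=1.0):
    """Map (x, y, phi) rows to (x, y, a*cos(phi), a*sin(phi)) rows."""
    x, y, phi = (np.asarray(v, dtype=float) for v in (x, y, phi))
    return np.column_stack([x, y, a * np.cos(phi), a * np.sin(phi)])

# Two points at the same (x, y) whose angles sit on either side of the wrap.
features = embed_angles([0.0, 0.0], [0.0, 0.0], [0.01, 2 * np.pi - 0.01])

# Distance in the embedded space versus the raw difference in Phi.
gap = np.linalg.norm(features[0] - features[1])
raw_gap = abs(0.01 - (2 * np.pi - 0.01))
print(gap, raw_gap)
```

The embedded gap stays small although the raw Phi values differ by almost 2*Pi, which is the behaviour the clustering needs near Phi = 0.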

The idea behind this is to replace Phi by its "true" complex meaning as a phase, i.e. exp(1j * Phi), while keeping everything real.

You then proceed to calculate a distance based on this reparametrization, e.g. Euclidean:
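To make the phase picture concrete, the following check (a sketch using random test angles, not part of the original answer) compares three ways of measuring the separation of two angles: the modulus of the difference of the complex phases, the Euclidean distance between the (cos, sin) pairs, and the closed form 2 * |sin((Phi1 - Phi2) / 2)|.

```python
import numpy as np

rng = np.random.default_rng(0)
phi1, phi2 = rng.uniform(0.0, 2.0 * np.pi, size=(2, 1000))

# Distance between the two phases as complex numbers on the unit circle.
chord_complex = np.abs(np.exp(1j * phi1) - np.exp(1j * phi2))

# The same distance using only real quantities (cos and sin components).
chord_real = np.hypot(np.cos(phi1) - np.cos(phi2), np.sin(phi1) - np.sin(phi2))

# Closed form: the chord length depends only on the wrapped angle difference.
chord_closed = 2.0 * np.abs(np.sin((phi1 - phi2) / 2.0))

print(np.allclose(chord_complex, chord_real))
print(np.allclose(chord_real, chord_closed))
```

All three agree, so working with the real pair (cos(Phi), sin(Phi)) loses nothing relative to the complex phase, and the resulting distance is insensitive to where the angle wraps.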

import numpy as np
p1 = np.array([X1, Y1, a * np.cos(Phi1), a * np.sin(Phi1)])
p2 = np.array([X2, Y2, a * np.cos(Phi2), a * np.sin(Phi2)])
dist = np.sqrt(((p1 - p2) ** 2).sum())  # Euclidean distance in the reparametrized space

Do this for every pair of points before feeding the result to your DBSCAN object.
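One possible way to wire this together is sketched below, assuming scikit-learn's DBSCAN with metric="precomputed". The synthetic data, eps, min_samples and a = 1 are all assumptions chosen for illustration and will need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
n = 50
a = 1.0  # scale factor from the answer; an assumed starting value

# Synthetic data: one (X, Y) location, one group around Phi = 0 (so it
# straddles the 0 / 2*pi wrap) and one group around Phi = pi.
x = rng.normal(0.0, 0.05, 2 * n)
y = rng.normal(0.0, 0.05, 2 * n)
phi = np.concatenate([rng.normal(0.0, 0.05, n),
                      rng.normal(np.pi, 0.05, n)]) % (2 * np.pi)

features = np.column_stack([x, y, a * np.cos(phi), a * np.sin(phi)])

# Euclidean distance for every pair of points, via broadcasting.
diff = features[:, None, :] - features[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

# Hand the full pairwise distance matrix to DBSCAN.
labels = DBSCAN(eps=0.3, min_samples=5, metric="precomputed").fit_predict(dist_matrix)
n_clusters = len(set(labels.tolist()) - {-1})  # label -1 marks noise
print(n_clusters)
```

Because the distance here is plain Euclidean in the four reparametrized columns, passing features directly to DBSCAN with its default metric should give the same grouping; the explicit matrix is shown only because it mirrors the pairwise description above.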

Answer 2

DBSCAN can work with arbitrary distances.

So first define a distance function (which will likely involve some trigonometric functions), then plug this into DBSCAN as its distance measure.

You could probably use something like this:

distance = (x1 - x2)**2 + (y1 - y2)**2 + factor * sin((phi1 - phi2) / 2)**2

but you need to carefully choose your weight factor, as the angular difference is on a different scale than your X and Y axes, I guess.

Don't use plain Euclidean distance on the raw X, Y, Phi values: it treats Phi = 0 and Phi = 2*Pi as far apart even though they are the same angle.
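A sketch of plugging such a function into scikit-learn's DBSCAN as a callable metric. The names angular_distance and FACTOR, the sample points, and the eps and min_samples values are hypothetical. The angular term uses the half-angle form sin((phi1 - phi2) / 2)**2, which is zero only when the two angles coincide modulo 2*Pi. Since the expression is a squared quantity that need not satisfy the triangle inequality, the sketch requests algorithm="brute" rather than a tree-based neighbour search.

```python
import numpy as np
from sklearn.cluster import DBSCAN

FACTOR = 4.0  # hypothetical weight for the angular term; tune for your data

def angular_distance(p, q):
    """Dissimilarity between two rows of the form (x, y, phi)."""
    dx, dy, dphi = p - q
    return dx ** 2 + dy ** 2 + FACTOR * np.sin(dphi / 2.0) ** 2

points = np.array([
    [0.0, 0.0, 0.05],   # group with angles on both sides of the wrap
    [0.1, 0.0, 6.25],
    [0.0, 0.1, 0.00],
    [0.1, 0.1, 6.20],
    [0.0, 0.0, 3.10],   # group with angles near pi
    [0.1, 0.0, 3.20],
    [0.0, 0.1, 3.15],
    [0.1, 0.1, 3.05],
])

model = DBSCAN(eps=0.1, min_samples=2, metric=angular_distance, algorithm="brute")
labels = model.fit_predict(points)
print(labels)
```

Note that eps is compared against the squared-style value returned by angular_distance, not against a length. With FACTOR = 4 * a**2 the angular term equals the squared distance between the (a * cos(Phi), a * sin(Phi)) points of Answer 1, so the two answers can be made to agree.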

Answer 1
2 2014-06-03 18:57:22

Answer 2
1 2014-06-04 09:44:40
