Clustering with angular data using DBSCAN

I need to cluster data points of the form X, Y, Phi. Right now I use DBSCAN (sklearn). The clustering works except for one thing: Phi is angular data, defined modulo 2*Pi. As a result, the clustering near Phi = 0 is incorrect. Is there a trick to fix this? I could not find one, or come up with one myself that worked.

Thank you.

Circular boundary conditions are not easy to implement in practice outside a Fourier framework.

You could try reparametrizing by replacing X, Y, Phi with X, Y, a * cos(Phi), a * sin(Phi), where a > 0 is a scale factor that has to be chosen so that this mapping of Phi onto a circle in 2D space behaves the way you need in clustering. Start by checking a = 1 (if it works for Phi alone, it is a good candidate), then try a on the order of magnitude of your remaining data X, Y.

The idea behind this is to replace Phi by its 'true' complex 'meaning' as a phase, i.e. exp(1j * Phi), while keeping everything real-valued.
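To make the phase idea concrete, here is a minimal sketch using only the standard library. The helper names `naive_gap` and `chord_gap` and the two sample angles are made up for illustration. It compares the raw difference of two angles on either side of the wrap-around with the distance between their images a * exp(1j * Phi), which equals 2 * a * |sin((Phi1 - Phi2) / 2)|.

```python
import cmath
import math


def naive_gap(phi1, phi2):
    """Plain absolute difference; ignores that angles wrap at 2*pi."""
    return abs(phi1 - phi2)


def chord_gap(phi1, phi2, a=1.0):
    """Distance between a*exp(1j*phi1) and a*exp(1j*phi2) in the complex plane.

    This is the same number as the Euclidean distance between the real pairs
    (a*cos(phi), a*sin(phi)).
    """
    return abs(a * cmath.exp(1j * phi1) - a * cmath.exp(1j * phi2))


# Two angles that are close on the circle but far apart as raw numbers.
phi_low = 0.05
phi_high = 2 * math.pi - 0.05

print("naive difference:", naive_gap(phi_low, phi_high))
print("chord distance:  ", chord_gap(phi_low, phi_high))
```

The naive difference is close to 2*Pi, while the chord distance stays small, which is the behaviour the clustering needs near Phi = 0.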

You then proceed to calculate a distance based on this reparametrization, e.g. Euclidean:

dist = np.sqrt(((np.array([X1, Y1, a * np.cos(Phi1), a * np.sin(Phi1)]) - np.array([X2, Y2, a * np.cos(Phi2), a * np.sin(Phi2)])) ** 2).sum())

Do this for every pair of points and pass the resulting distance matrix to your DBSCAN object (in sklearn, with metric="precomputed").
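Because the distance above is plain Euclidean in the reparametrized coordinates, one option is to hand the 4-column array straight to sklearn's DBSCAN and let it compute the pairwise distances itself. The sketch below does that on synthetic data. The helper `embed_angles`, the random test data, and the values a = 1, eps = 0.3, min_samples = 5 are assumptions chosen for illustration, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def embed_angles(points, a=1.0):
    """Map rows (X, Y, Phi) to rows (X, Y, a*cos(Phi), a*sin(Phi))."""
    points = np.asarray(points, dtype=float)
    x, y, phi = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack([x, y, a * np.cos(phi), a * np.sin(phi)])


rng = np.random.default_rng(0)
n = 50

# Group 1: angles scattered around 0, so after wrapping into [0, 2*pi)
# some end up near 0 and others near 2*pi.
group1 = np.column_stack([
    rng.normal(0.0, 0.05, n),
    rng.normal(0.0, 0.05, n),
    rng.normal(0.0, 0.05, n) % (2 * np.pi),
])
# Group 2: same X, Y region, but angles around pi.
group2 = np.column_stack([
    rng.normal(0.0, 0.05, n),
    rng.normal(0.0, 0.05, n),
    rng.normal(np.pi, 0.05, n),
])
points = np.vstack([group1, group2])

# Clustering on the raw (X, Y, Phi) columns tends to split group 1 at the wrap.
naive_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(points)

# Clustering on the embedded columns sees group 1 as one connected blob.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(embed_angles(points, a=1.0))

print("distinct labels on raw data:     ", sorted(set(naive_labels)))
print("distinct labels on embedded data:", sorted(set(labels)))
```

With a = 1 the angular part contributes at most 2 to the distance, so eps has to be read on that scale; a larger a gives the angle more weight relative to X and Y.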

DBSCAN can work with arbitrary distances.

So first define a distance function (which will likely involve some trigonometric functions), then plug this into DBSCAN as the distance metric.

You could probably use something like this:

distance = (x1-x2)**2 + (y1-y2)**2 + factor * sin(phi1-phi2)**2

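but you need to choose the weight factor carefully, as the angular difference is on a different scale than your X and Y axes, I guess. Note that sin(phi1-phi2)**2 is also zero when the two angles differ by Pi; if opposite directions should count as far apart, use sin((phi1-phi2)/2)**2 instead.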
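A minimal sketch of plugging such a distance into sklearn's DBSCAN through a precomputed distance matrix. It uses half the angle difference inside the sine, so that angles differing by Pi are not treated as equal, and takes a square root so the result is an actual distance. Those choices, the helper name `pairwise_distance`, factor = 4.0, eps = 0.3, min_samples = 3, and the sample points are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def pairwise_distance(points, factor=1.0):
    """Distance matrix for rows (X, Y, Phi) with a wrap-aware angular term.

    d_ij = sqrt(dx^2 + dy^2 + factor * sin(dphi / 2)^2)
    With factor = 4 * a**2 this equals the Euclidean distance between the
    points (X, Y, a*cos(Phi), a*sin(Phi)).
    """
    points = np.asarray(points, dtype=float)
    dx = points[:, None, 0] - points[None, :, 0]
    dy = points[:, None, 1] - points[None, :, 1]
    dphi = points[:, None, 2] - points[None, :, 2]
    return np.sqrt(dx ** 2 + dy ** 2 + factor * np.sin(dphi / 2.0) ** 2)


# Four points with angles straddling 0 / 2*pi, then four with angles near pi.
points = np.array([
    [0.00, 0.00, 0.02],
    [0.10, 0.00, 6.25],
    [0.00, 0.10, 0.05],
    [0.05, 0.05, 6.20],
    [0.00, 0.00, 3.10],
    [0.10, 0.00, 3.20],
    [0.00, 0.10, 3.15],
    [0.05, 0.05, 3.00],
])

dist_matrix = pairwise_distance(points, factor=4.0)

# metric="precomputed" tells DBSCAN that the input is already a distance matrix.
labels = DBSCAN(eps=0.3, min_samples=3, metric="precomputed").fit_predict(dist_matrix)
print(labels)
```

A precomputed matrix needs memory quadratic in the number of points, so for large data sets the reparametrized coordinates with the default Euclidean metric may be the more practical route.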

Don't use plain Euclidean distance on the raw X, Y, Phi values: it treats Phi = 0 and Phi = 2*Pi as far apart even though they are the same angle.
