Clustering with angular data using DBSCAN
I need to cluster data points of the form (X, Y, Phi). Right now I use DBSCAN (sklearn). The clustering works except for one thing: Phi is angular data, which is modulo 2*Pi. As a result, the clustering near Phi=0 is incorrect. Is there a trick to fix this? I could not find one, or come up with one myself that worked.
Thank you.
Circular boundary conditions are not easy to implement in practice outside a Fourier framework. You could try reparametrizing by replacing X, Y, Phi with X, Y, a * cos(Phi), a * sin(Phi), where a > 0, a sort of scale factor, needs to be chosen correctly for this projection of the angle into 2D space to act the way you need in clustering. Start by checking a = 1 (if it worked OK for pure Phi, then this is a good candidate) and then try a on the order of magnitude of your remaining data X, Y.
The idea behind this is to replace Phi by its 'true' complex 'meaning' as a phase, i.e. exp(1j * Phi), while keeping everything real-valued.
You then proceed to calculate a distance based on this reparametrization, e.g. Euclidean:
dist = np.sqrt(((np.array([X1, Y1, a * np.cos(Phi1), a * np.sin(Phi1)]) - np.array([X2, Y2, a * np.cos(Phi2), a * np.sin(Phi2)])) ** 2).sum())
Do this for every pair of points, then feed the resulting distances to your DBSCAN object.
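As a rough illustration of the reparametrization described above, here is a minimal sketch. It assumes the data is an array with rows (X, Y, Phi) and that Phi is in radians; the helper name `embed_angle`, the toy data, and the `eps` and `min_samples` values are made up for the example and would need tuning on real data. Because plain Euclidean distance in the embedded space is the distance described above, the embedded array can be passed to DBSCAN directly instead of computing every pair by hand.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def embed_angle(points, a=1.0):
    """Map rows of (X, Y, Phi) to (X, Y, a*cos(Phi), a*sin(Phi)).

    `a` is the scale factor from the answer; a=1.0 is only a starting guess.
    """
    points = np.asarray(points, dtype=float)
    x, y, phi = points[:, 0], points[:, 1], points[:, 2]
    return np.column_stack([x, y, a * np.cos(phi), a * np.sin(phi)])


# Toy data (hypothetical): the first four points straddle Phi = 0 / 2*pi,
# the last four sit around Phi = pi.
two_pi = 2 * np.pi
data = np.array([
    [0.0, 0.0, 0.05],
    [0.1, 0.0, two_pi - 0.05],
    [0.0, 0.1, 0.10],
    [0.1, 0.1, two_pi - 0.10],
    [0.0, 0.0, np.pi],
    [0.1, 0.0, np.pi + 0.05],
    [0.0, 0.1, np.pi - 0.05],
    [0.1, 0.1, np.pi + 0.10],
])

# eps and min_samples are illustrative values, not recommendations.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(embed_angle(data, a=1.0))
print(labels)
```

With this embedding, points just above Phi = 0 and just below 2*Pi end up close together, so they should land in the same cluster.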
DBSCAN can work with arbitrary distances. So first define a distance function (which will likely involve some trigonometric functions), then plug it into DBSCAN as the distance (dissimilarity) measure.
You could probably use something like this:
distance = (x1-x2)**2 + (y1-y2)**2 + factor * sin((phi1-phi2)/2)**2
but you need to choose the weight factor carefully, as the angular difference is on a different scale from your X and Y axes, I guess. Note the half angle: sin(phi1-phi2)**2 on its own would also be zero for two angles that are Pi apart.
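One possible way to plug such a distance into sklearn's DBSCAN is to precompute the full pairwise matrix and pass `metric="precomputed"`. The sketch below is only that, a sketch: the function name `angular_distance_matrix`, the toy data, and the `factor`, `eps` and `min_samples` values are assumptions for illustration. It also assumes a half-angle sine term, so that two angles Pi apart are not treated as identical, and takes a square root so that `eps` is on the same scale as X and Y.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def angular_distance_matrix(points, factor=1.0):
    """Pairwise distances for rows of (X, Y, Phi), with Phi in radians.

    Uses sqrt(dx^2 + dy^2 + factor * sin(dphi / 2)^2); the sine term is
    periodic in 2*pi, so the wrap-around at Phi = 0 is handled.
    """
    points = np.asarray(points, dtype=float)
    x, y, phi = points[:, 0], points[:, 1], points[:, 2]
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dphi = phi[:, None] - phi[None, :]
    return np.sqrt(dx ** 2 + dy ** 2 + factor * np.sin(dphi / 2.0) ** 2)


# Toy data (hypothetical): one group straddling Phi = 0, one around Phi = pi.
two_pi = 2 * np.pi
samples = np.array([
    [0.0, 0.0, 0.05],
    [0.1, 0.0, two_pi - 0.05],
    [0.0, 0.1, 0.10],
    [0.1, 0.1, two_pi - 0.10],
    [0.0, 0.0, np.pi],
    [0.1, 0.0, np.pi + 0.05],
    [0.0, 0.1, np.pi - 0.05],
    [0.1, 0.1, np.pi + 0.10],
])

# factor=4.0 makes the angular term equal to the squared chord length on the
# unit circle; like eps, it is an illustrative choice that needs tuning.
D = angular_distance_matrix(samples, factor=4.0)
cluster_ids = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit_predict(D)
print(cluster_ids)
```

The precomputed matrix needs memory quadratic in the number of points, so for large data sets embedding the angle as extra coordinates may be the more practical route.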
Don't use Euclidean distance on the raw (X, Y, Phi) values, for the obvious reason: it ignores the wrap-around of Phi at 2*Pi.
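To make the wrap-around problem concrete, here is a small sketch comparing the naive difference of two nearly identical angles with a wrapped difference. The helper name `wrapped_angle_diff` and the sample angles are made up for the illustration.

```python
import math


def wrapped_angle_diff(phi1, phi2):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (phi1 - phi2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


# Two directions that are almost the same, on either side of Phi = 0.
phi_a = 0.05
phi_b = 2 * math.pi - 0.05

naive = abs(phi_a - phi_b)                  # close to 2*pi: looks far apart
wrapped = wrapped_angle_diff(phi_a, phi_b)  # close to 0.1: actually nearby
print(naive, wrapped)
```

A Euclidean distance on raw Phi sees the first number, which is why clusters get split near Phi = 0.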