
Alternate approach for pdist() from scipy in Julia?

My objective is to replicate the functionality of pdist() from SciPy in Julia. I tried using the Distances.jl package to compute pairwise distances between observations. However, the results are not the same, as the example below shows.

Python example:

from scipy.spatial.distance import pdist
a = [[1,2], [3,4], [5,6], [7,8]]
b = pdist(a)
print(b)

output --> array([2.82842712, 5.65685425, 8.48528137, 2.82842712, 5.65685425, 2.82842712])

Julia example:

using Distances
a = [1 2; 3 4; 5 6; 7 8]
dist_function(x)  = pairwise(Euclidean(), x, dims = 1)
dist_function(a)

output --> 
4×4 Array{Float64,2}:
 0.0      2.82843  5.65685  8.48528
 2.82843  0.0      2.82843  5.65685
 5.65685  2.82843  0.0      2.82843
 8.48528  5.65685  2.82843  0.0

With reference to the above examples:

  1. Does pdist() from SciPy in Python have its metric set to Euclidean by default?
  2. How can I replicate the SciPy result in Julia?

Please suggest a solution to resolve this problem.

Documentation reference for pdist(): https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

Thanks in advance!!

According to the documentation page you linked, to get the same form as Julia from Python (yes, I know, this is the reverse of your question), you can pass the result to squareform. I.e., in your example, add:

from scipy.spatial.distance import squareform
squareform(b)
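
To make the relationship between the two layouts concrete, here is a minimal standard-library sketch of the expansion that squareform performs on a condensed vector. The helper name `to_square` is hypothetical and only illustrates the idea; it is not SciPy's implementation.

```python
import math

def to_square(condensed):
    """Hypothetical helper: expand a pdist-style condensed vector
    into a full symmetric distance matrix (list of lists)."""
    m = len(condensed)
    # A condensed vector for n points has n*(n-1)/2 entries; solve for n.
    n = (1 + math.isqrt(1 + 8 * m)) // 2
    if n * (n - 1) // 2 != m:
        raise ValueError("length is not n*(n-1)/2 for any integer n")
    square = [[0.0] * n for _ in range(n)]
    k = 0
    # Entries are stored pair by pair: (0,1), (0,2), ..., (1,2), ...
    for i in range(n):
        for j in range(i + 1, n):
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square

# The pdist output from the question.
b = [2.82842712, 5.65685425, 8.48528137, 2.82842712, 5.65685425, 2.82842712]
square = to_square(b)
for row in square:
    print(row)
```

The rows printed this way line up with the 4×4 matrix that the Julia example returns.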

Also, yes: the same documentation page shows that the 'metric' parameter defaults to 'euclidean' if not explicitly specified.

For the reverse situation, note that the Python vector is just the elements above the main diagonal of the distance matrix, read row by row (the elements below the diagonal repeat them, since for a 'proper' distance metric the resulting distance matrix is symmetric).

So you can simply collect the elements above the diagonal into a vector.
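
As a rough illustration of that collection step, the sketch below walks the strict upper triangle of a square matrix row by row, which is the order pdist uses. It is written in plain Python so the loop order is easy to see; the helper name `to_condensed` is hypothetical, and the matrix is the Julia output from the question.

```python
def to_condensed(square):
    """Hypothetical helper: collect the entries above the diagonal,
    row by row, into a flat list (the pdist ordering)."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]

# The 4x4 matrix printed by the Julia example in the question.
d = [
    [0.0,     2.82843, 5.65685, 8.48528],
    [2.82843, 0.0,     2.82843, 5.65685],
    [5.65685, 2.82843, 0.0,     2.82843],
    [8.48528, 5.65685, 2.82843, 0.0],
]
condensed = to_condensed(d)
print(condensed)
```

The same traversal carries over to Julia if the row index is kept in the outer loop. Because Julia arrays are column-major, selecting the upper triangle with a boolean mask would visit the same entries in a different order than pdist does.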

For (1), the answer is yes, as per the documentation you linked, which gives the signature at the top:

scipy.spatial.distance.pdist(X, metric='euclidean', *args, **kwargs)

indicating that the metric argument is indeed set to 'euclidean' by default.
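
One way to convince yourself of this without SciPy is to compute plain Euclidean distances over all pairs of rows and compare them with the pdist output in the question. This is only a standard-library sketch (it assumes Python 3.8+ for `math.dist`), not how SciPy computes the result.

```python
from itertools import combinations
from math import dist

a = [[1, 2], [3, 4], [5, 6], [7, 8]]

# combinations() yields row pairs in the order (0,1), (0,2), (0,3), (1,2), ...
# which is the same pair order as the condensed pdist output.
b = [dist(p, q) for p, q in combinations(a, 2)]
print([round(x, 8) for x in b])
```

The rounded values agree with the array printed by pdist(a) in the question, which is consistent with Euclidean being the default metric.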

I'm not sure I understand your second question: aren't the results the same? The only difference, as far as I can see, is that scipy returns the upper triangle as a vector, so if it's just about doing that, have a look at: https://discourse.julialang.org/t/vector-of-upper-triangle/7764

