
Vectorizing radial basis function's euclidean distance calculation for multidimensional features

I suspect an existing SO post already answers this question, but I have not been able to find it yet, so I apologize in advance if this is a duplicate.

I'm attempting to implement a radial basis function kernel from scratch using NumPy for my own learning purposes. For one-dimensional inputs, the calculation is pretty easy:

import numpy as np

def kernel(x, y):
    # RBF kernel evaluated on every pair of 1-D inputs
    return np.exp( -0.5 * np.subtract.outer(x, y)**2)

The above is from a blog post on Gaussian Processes.

But I'm trying to extend this to multiple dimensions. I have an implementation below that works fine:

x = np.array([[4,3,5], [1,3,9], [0,1,0], [4,3,5]])
distances = []
γ = -.5
for i in x:
    for j in x:
        distances.append(np.exp(γ * np.linalg.norm(i - j) ** 2))
np.array(distances).reshape(len(x),len(x))

[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
 [3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
 [1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
 [1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]

I am checking the result against sklearn.metrics.pairwise.rbf_kernel:

from sklearn.metrics.pairwise import rbf_kernel
print(rbf_kernel(x, gamma= .5))

[[1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]
 [3.72665317e-06 1.00000000e+00 2.11513104e-19 3.72665317e-06]
 [1.69189792e-10 2.11513104e-19 1.00000000e+00 1.69189792e-10]
 [1.00000000e+00 3.72665317e-06 1.69189792e-10 1.00000000e+00]]

But clearly the double for loop is not the most efficient way to compute this. What's the best way to vectorize this operation?

This SO post provides an efficient way of calculating distances, but does not provide the vectorization I need.

We can use SciPy's cdist to get the pairwise squared Euclidean distances, then scale them and take the exponential:

from scipy.spatial.distance import cdist

lam = -.5
out = np.exp(lam* cdist(x,x,'sqeuclidean'))

We can also leverage matrix multiplication:

def sqcdist_own(x):
    row_sum = (x**2).sum(1) # or np.einsum('ij,ij->i',x,x)
    sqeucdist = row_sum - 2*x.dot(x.T)
    sqeucdist += row_sum[:,None]
    return sqeucdist

out = np.exp(lam* sqcdist_own(x))
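One caveat with the matrix-multiplication route: because it subtracts large, nearly equal numbers, floating-point rounding can leave tiny negative values where the true squared distance is zero. Below is a minimal sketch of one way to guard against that and to cross-check the result against a plain broadcasting version. The names `sqdist_matmul` and `sqdist_broadcast` are made up for this illustration, and clipping at zero is just one reasonable choice.

```python
import numpy as np

def sqdist_matmul(x):
    """Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2<a, b>."""
    x = np.asarray(x, dtype=float)
    row_sum = np.einsum('ij,ij->i', x, x)            # squared norm of each row
    sq = row_sum[:, None] + row_sum[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq, 0.0, out=sq)                      # clip small negative rounding errors
    return sq

def sqdist_broadcast(x):
    """Same quantity via explicit differences; uses O(n^2 * d) memory."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]             # shape (n, n, d)
    return (diff ** 2).sum(axis=-1)

x = np.array([[4, 3, 5], [1, 3, 9], [0, 1, 0], [4, 3, 5]])
lam = -0.5

out_matmul = np.exp(lam * sqdist_matmul(x))
out_broadcast = np.exp(lam * sqdist_broadcast(x))
print(np.allclose(out_matmul, out_broadcast))
```

The broadcasting version is easier to read, but it materializes an (n, n, d) array, so the matrix-multiplication form is usually the better fit for larger inputs.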

To use these approaches on both 2D and 1D inputs, reshape x to 2D as a pre-processing step: X = x.reshape(len(x),-1), and then pass X instead of x to these solutions.
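As a rough illustration of that reshape step, the sketch below wraps it in a helper (the name `rbf_kernel_matrix` is hypothetical, not from either library) and checks that a 1D input reproduces the one-dimensional `np.subtract.outer` kernel from the question.

```python
import numpy as np

def rbf_kernel_matrix(x, lam=-0.5):
    """RBF kernel matrix for 1D or 2D input; lam is the (negative) scale factor."""
    X = np.asarray(x, dtype=float)
    X = X.reshape(len(X), -1)                        # 1D (n,) becomes (n, 1); 2D is unchanged
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(lam * np.maximum(sq_dists, 0.0))   # clip rounding noise before exp

x1d = np.array([0.0, 1.0, 2.5])
k_1d = rbf_kernel_matrix(x1d)
k_outer = np.exp(-0.5 * np.subtract.outer(x1d, x1d) ** 2)
print(np.allclose(k_1d, k_outer))
```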

You can utilize the following observation to solve the problem:

||a - b|| ** 2 = ||a|| ** 2 + ||b|| ** 2 - 2 * <a, b>

In code, it looks as follows:

x_norm = np.linalg.norm(x, axis=1) ** 2
output = np.exp(- 0.5 * (x_norm.reshape(-1, 1) + x_norm.reshape(1, -1) - 2 * np.dot(x, x.T)))
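The same identity also covers the case where the two point sets differ, which is what the original one-dimensional `kernel(x, y)` handles. Here is a minimal sketch under that assumption; the function name `rbf_kernel_xy`, the `gamma` parameter, and the second array `y` are all introduced here for illustration, and the loop is only there as a reference check.

```python
import numpy as np

def rbf_kernel_xy(x, y, gamma=0.5):
    """exp(-gamma * ||x_i - y_j||^2) for every row pair of x (n, d) and y (m, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_norm = (x ** 2).sum(axis=1)                    # shape (n,)
    y_norm = (y ** 2).sum(axis=1)                    # shape (m,)
    sq = x_norm[:, None] + y_norm[None, :] - 2.0 * (x @ y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))      # clip rounding noise before exp

x = np.array([[4, 3, 5], [1, 3, 9], [0, 1, 0], [4, 3, 5]])
y = np.array([[4, 3, 5], [0, 0, 0]])                 # hypothetical second point set

K = rbf_kernel_xy(x, y)

# Reference result from the double loop used in the question
K_loop = np.array([[np.exp(-0.5 * np.linalg.norm(i - j) ** 2) for j in y] for i in x])
print(K.shape, np.allclose(K, K_loop))
```

Calling `rbf_kernel_xy(x, x)` should give the same matrix as the snippet above, up to floating-point rounding.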
