

NumPy: how to quickly normalize many vectors?

How can a list of vectors be elegantly normalized in NumPy?

Here is an example that does not work:

from numpy import *

vectors = array([arange(10), arange(10)])  # All x's, then all y's
norms = apply_along_axis(linalg.norm, 0, vectors)

# Now, what I was expecting would work:
print(vectors.T / norms)  # vectors.T has 10 rows, as norms has 10 elements, but this does not work

The last operation yields "shape mismatch: objects cannot be broadcast to a single shape".

How can the 2D vectors in vectors be elegantly normalized with NumPy?

Edit: Why does the above not work, while adding a dimension to norms (as per my answer below) does work?

Computing the magnitude

I came across this question and became curious about your method for normalizing. I use a different method to compute the magnitudes. Note: I also typically compute norms across the last index (rows in this case, not columns).

magnitudes = np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

Typically, however, I just normalize like so:

vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
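As a quick sanity check (a minimal sketch, not part of the original answer), the in-place version can be verified to produce unit-length rows:

```python
import numpy as np

vectors = np.random.rand(5, 3)  # 5 vectors of dimension 3
vectors /= np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]

# Every row should now have unit magnitude.
print(np.allclose(np.sqrt((vectors ** 2).sum(-1)), 1.0))  # True
```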

A time comparison

I ran a test to compare the times, and found that my method is faster by quite a bit, but Freddie Witherden's suggestion is even faster.

import numpy as np    
vectors = np.random.rand(100, 25)

# OP's
%timeit np.apply_along_axis(np.linalg.norm, 1, vectors)
# Output: 100 loops, best of 3: 2.39 ms per loop

# Mine
%timeit np.sqrt((vectors ** 2).sum(-1))[..., np.newaxis]
# Output: 10000 loops, best of 3: 13.8 us per loop

# Freddie's (from comment below)
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 10000 loops, best of 3: 6.45 us per loop

Beware though: as this StackOverflow answer notes, some safety checks do not happen with einsum, so you should be sure that the dtype of vectors is sufficient to store the square of the magnitudes accurately enough.
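One defensive option (an illustrative sketch, not from the original answer) is to upcast to float64 before the squared sum, since einsum keeps the input dtype by default:

```python
import numpy as np

vectors32 = np.random.rand(100, 25).astype(np.float32)

# einsum works in the input dtype, so cast up first if float32
# might not hold the squared magnitudes accurately enough.
v64 = vectors32.astype(np.float64)
magnitudes = np.sqrt(np.einsum('...i,...i', v64, v64))

print(magnitudes.dtype, magnitudes.shape)  # float64 (100,)
```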

Well, unless I missed something, this does work:

vectors / norms

The problem in your suggestion is the broadcasting rules.

vectors  # shape 2, 10
norms  # shape 10

The shapes do not have the same length! So the rule is to first extend the smaller shape by ones on the left:

norms  # shape 1,10

You can do that manually by calling:

vectors / norms.reshape(1,-1)  # same as vectors/norms

If you wanted to compute vectors.T/norms , you would have to do the reshaping manually, as follows:

vectors.T / norms.reshape(-1,1)  # this works
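Putting the shapes together, a quick check (a sketch using arange(1, 11) so no vector has zero length) confirms both routes agree:

```python
import numpy as np

vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)  # shape (2, 10)
norms = np.sqrt((vectors ** 2).sum(axis=0))                            # shape (10,)

# Broadcasting pads norms on the left to (1, 10), dividing each column:
a = vectors / norms.reshape(1, -1)   # same as vectors / norms

# For vectors.T (shape (10, 2)), norms must become a column, shape (10, 1):
b = vectors.T / norms.reshape(-1, 1)

print(np.allclose(a.T, b))  # True
```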

Alright: NumPy's array-shape broadcasting adds dimensions to the left of the array shape, not to its right. NumPy can, however, be instructed to add a dimension to the right of the norms array:

print(vectors.T / norms[:, newaxis])

does work!

There is already a function for this in scikit-learn:

import sklearn.preprocessing as preprocessing
norm = preprocessing.normalize(m, norm='l2')  # m: the array of row vectors

More info at:

http://scikit-learn.org/stable/modules/preprocessing.html
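Applied to the arrays from the question, this might look like the following sketch (note that normalize operates on rows by default, so the (10, 2) transpose is passed; the variable names are illustrative):

```python
import numpy as np
from sklearn.preprocessing import normalize

vectors = np.array([np.arange(1, 11), np.arange(1, 11)], dtype=float)

# normalize() scales each row to unit L2 norm, so hand it the
# (10, 2) array of (x, y) pairs.
unit = normalize(vectors.T, norm='l2')

print(np.allclose(np.linalg.norm(unit, axis=1), 1.0))  # True
```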

My preferred way to normalize vectors is to use NumPy's inner1d to calculate their magnitudes. Here's everything suggested so far, compared against inner1d:

import numpy as np
from numpy.core.umath_tests import inner1d
COUNT = 10**6 # 1 million points

points = np.random.random_sample((COUNT,3,))
A      = np.sqrt(np.einsum('...i,...i', points, points))
B      = np.apply_along_axis(np.linalg.norm, 1, points)   
C      = np.sqrt((points ** 2).sum(-1))
D      = np.sqrt((points*points).sum(axis=1))
E      = np.sqrt(inner1d(points,points))

print([np.allclose(E, x) for x in [A, B, C, D]]) # [True, True, True, True]

Testing performance with cProfile:

import cProfile
cProfile.run("np.sqrt(np.einsum('...i,...i', points, points))")      # 3 function calls in 0.013 seconds
cProfile.run('np.apply_along_axis(np.linalg.norm, 1, points)')       # 9000018 function calls in 10.977 seconds
cProfile.run('np.sqrt((points ** 2).sum(-1))')                       # 5 function calls in 0.028 seconds
cProfile.run('np.sqrt((points*points).sum(axis=1))')                 # 5 function calls in 0.027 seconds
cProfile.run('np.sqrt(inner1d(points,points))')                      # 2 function calls in 0.009 seconds

inner1d computed the magnitudes a hair faster than einsum. So, using inner1d to normalize:

n = points/np.sqrt(inner1d(points,points))[:,None]
cProfile.run('points/np.sqrt(inner1d(points,points))[:,None]') # 2 function calls in 0.026 seconds

Testing against scikit-learn:

import sklearn.preprocessing as preprocessing
n_ = preprocessing.normalize(points, norm='l2')
cProfile.run("preprocessing.normalize(points, norm='l2')") # 47 function calls in 0.047 seconds
np.allclose(n,n_) # True

Conclusion: using inner1d seems to be the best option.
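One caveat worth adding: numpy.core.umath_tests is a private module, and inner1d has been deprecated in newer NumPy releases, so on those versions the public einsum call is the drop-in replacement (a sketch):

```python
import numpy as np

points = np.random.random_sample((1000, 3))

# Public-API equivalent of inner1d(points, points): row-wise dot products.
sq = np.einsum('ij,ij->i', points, points)
n = points / np.sqrt(sq)[:, None]

print(np.allclose(np.linalg.norm(n, axis=1), 1.0))  # True
```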

For the two-dimensional case, using np.hypot(vectors[:,0], vectors[:,1]) looks to be faster than Freddie Witherden's np.sqrt(np.einsum('...i,...i', vectors, vectors)) for calculating the magnitudes. (Referencing the answer by Geoff.)

import numpy as np

# Generate array of 2D vectors.
vectors = np.random.random((1000,2))

# Using Freddie's
%timeit np.sqrt(np.einsum('...i,...i', vectors, vectors))
# Output: 11.1 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# Using numpy.hypot()
%timeit np.hypot(vectors[:,0], vectors[:,1])
# Output: 6.81 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

To get the normalised vectors, do:

vectors /= np.hypot(vectors[:,0], vectors[:,1])[:, np.newaxis]  # add an axis so the (1000,) norms broadcast against (1000, 2)
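End to end, the hypot route can be sketched and checked as follows (a trailing axis is added so the (1000,) magnitudes broadcast against the (1000, 2) array):

```python
import numpy as np

vectors = np.random.random((1000, 2))
vectors /= np.hypot(vectors[:, 0], vectors[:, 1])[:, np.newaxis]

# All magnitudes are now 1.
print(np.allclose(np.hypot(vectors[:, 0], vectors[:, 1]), 1.0))  # True
```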
