
"Cloning" row or column vectors

Sometimes it is useful to "clone" a row or column vector to a matrix. By cloning I mean converting a row vector such as

[1, 2, 3]

into a matrix

[[1, 2, 3],
 [1, 2, 3],
 [1, 2, 3]]

or a column vector such as

[[1],
 [2],
 [3]]

into

[[1, 1, 1],
 [2, 2, 2],
 [3, 3, 3]]

In MATLAB or Octave this is done pretty easily:

 x = [1, 2, 3]
 a = ones(3, 1) * x
 a =

    1   2   3
    1   2   3
    1   2   3
    
 b = (x') * ones(1, 3)
 b =

    1   1   1
    2   2   2
    3   3   3

I want to repeat this in NumPy, but without success:

In [14]: x = array([1, 2, 3])
In [14]: ones((3, 1)) * x
Out[14]:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
# so far so good
In [16]: x.transpose() * ones((1, 3))
Out[16]: array([[ 1.,  2.,  3.]])
# DAMN
# I end up with 
In [17]: (ones((3, 1)) * x).transpose()
Out[17]:
array([[ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])

Why wasn't the first method (In [16]) working? Is there a way to achieve this task in Python in a more elegant way?

Use numpy.tile:

>>> tile(array([1,2,3]), (3, 1))
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

or for repeating columns:

>>> tile(array([[1,2,3]]).transpose(), (1, 3))
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

Here's an elegant, Pythonic way to do it:

>>> array([[1,2,3],]*3)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

>>> array([[1,2,3],]*3).transpose()
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

The problem with [16] seems to be that the transpose has no effect on a 1-D array. You probably want a matrix instead:

>>> x = array([1,2,3])
>>> x
array([1, 2, 3])
>>> x.transpose()
array([1, 2, 3])
>>> matrix([1,2,3])
matrix([[1, 2, 3]])
>>> matrix([1,2,3]).transpose()
matrix([[1],
        [2],
        [3]])

First note that with NumPy's broadcasting operations it's usually not necessary to duplicate rows and columns. See the NumPy broadcasting documentation for descriptions.

But to do this, repeat and newaxis are probably the best way:

In [12]: x = array([1,2,3])

In [13]: repeat(x[:,newaxis], 3, 1)
Out[13]: 
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [14]: repeat(x[newaxis,:], 3, 0)
Out[14]: 
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

This example is for a row vector, but applying this to a column vector is hopefully obvious. repeat seems to spell this well, but you can also do it via multiplication as in your example:

In [15]: x = array([[1, 2, 3]])  # note the double brackets

In [16]: (ones((3,1))*x).transpose()
Out[16]: 
array([[ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])
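
The broadcasting remark at the top of this answer can be illustrated with a small sketch. The matrix `a` below is a made-up example, not something from the question; the point is only that combining `x` with a 2-D array gives the same result whether or not `x` is cloned first.

```python
import numpy as np

x = np.array([1, 2, 3])
a = np.arange(9).reshape(3, 3)   # hypothetical 3x3 matrix to combine with x

# x (shape (3,)) is treated as a row and applied to every row of a
row_result = a + x

# x[:, np.newaxis] (shape (3, 1)) is applied to every column of a
col_result = a + x[:, np.newaxis]

# Explicitly cloning x first gives the same answers
row_cloned = a + np.repeat(x[np.newaxis, :], 3, 0)
col_cloned = a + np.repeat(x[:, np.newaxis], 3, 1)
```

If the cloned matrix is only ever an operand in arithmetic like this, the explicit repeat can usually be dropped.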

Let:

>>> n = 1000
>>> x = np.arange(n)
>>> reps = 10000

Zero-cost allocations

A view does not take any additional memory. Thus, these declarations are instantaneous:

# New axis
x[np.newaxis, ...]

# Broadcast to specific shape
np.broadcast_to(x, (reps, n))
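
One way to check that the broadcast result really is a view is to look at its strides and flags. This is a rough sketch using much smaller sizes than the `n` and `reps` above (the small values are my own choice, to keep it quick to run):

```python
import numpy as np

n, reps = 4, 3                 # small illustrative sizes, not the ones timed above
x = np.arange(n)

view = np.broadcast_to(x, (reps, n))

shares = np.shares_memory(x, view)   # the view reuses x's buffer
row_stride = view.strides[0]         # 0 bytes between "rows": they are all the same data
writeable = view.flags.writeable     # broadcast_to returns a read-only view

copy = np.array(view)                # forces a real (reps, n) allocation
```

Because every row of the view aliases the same memory, it has to be copied (as in the last line) before it can be modified row by row.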

Forced allocation

If you want to force the contents to reside in memory:

>>> %timeit np.array(np.broadcast_to(x, (reps, n)))
10.2 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit np.repeat(x[np.newaxis, :], reps, axis=0)
9.88 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit np.tile(x, (reps, 1))
9.97 ms ± 77.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

All three methods are roughly the same speed.

Computation

>>> a = np.arange(reps * n).reshape(reps, n)
>>> x_tiled = np.tile(x, (reps, 1))

>>> %timeit np.broadcast_to(x, (reps, n)) * a
17.1 ms ± 284 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit x[np.newaxis, :] * a
17.5 ms ± 300 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit x_tiled * a
17.6 ms ± 240 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

All three methods are roughly the same speed.


Conclusion

If you want to replicate before a computation, consider using one of the "zero-cost allocation" methods. You won't suffer the performance penalty of "forced allocation".

I think using the broadcast in NumPy is the best, and faster.

I did a comparison as follows:

import numpy as np
b = np.random.randn(1000)
In [105]: %timeit c = np.tile(b[:, np.newaxis], (1,100))
1000 loops, best of 3: 354 µs per loop

In [106]: %timeit c = np.repeat(b[:, np.newaxis], 100, axis=1)
1000 loops, best of 3: 347 µs per loop

In [107]: %timeit c = np.array([b,]*100).transpose()
100 loops, best of 3: 5.56 ms per loop

np.tile and np.repeat are about 15 times faster than building the array from a repeated Python list.

One clean solution is to use NumPy's outer-product function with a vector of ones:

np.outer(np.ones(n), x)

gives n repeating rows. Switch the argument order to get repeating columns. To get an equal number of rows and columns you might do

np.outer(np.ones_like(x), x)
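
A short sketch of both argument orders, with `n = 4` picked arbitrarily for illustration. Note that `np.ones(n)` is a float array, so the result is float even when `x` holds integers.

```python
import numpy as np

x = np.array([1, 2, 3])
n = 4  # hypothetical number of copies

rows = np.outer(np.ones(n), x)   # shape (4, 3): every row is x
cols = np.outer(x, np.ones(n))   # shape (3, 4): every column is x

# Passing a dtype to np.ones keeps the integer type of x
rows_int = np.outer(np.ones(n, dtype=x.dtype), x)
```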

You can use

np.tile(x, 3).reshape((3, 3))

tile will generate the repetitions of the vector

and reshape will give it the shape you want
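
As a sketch of how the reshape target relates to the number of repetitions: tiling a 1-D `x` `reps` times yields `reps * x.size` elements laid out as `x, x, x, ...`, so the row-cloned shape is `(reps, x.size)`. The name `reps` and its value are my own choices for the example.

```python
import numpy as np

x = np.array([1, 2, 3])
reps = 4  # hypothetical number of copies

flat = np.tile(x, reps)               # 1-D, length reps * x.size
rows = flat.reshape((reps, x.size))   # shape (4, 3): every row is x
cols = rows.T                         # shape (3, 4): every column is x
```

Reshaping to `(x.size, reps)` instead would not give cloned columns, because the flat data is still ordered as repeated copies of `x`; transposing the row-cloned result does.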

If you have a pandas DataFrame and want to preserve the dtypes, even the categoricals, this is a fast way to do it:

import numpy as np
import pandas as pd
df = pd.DataFrame({1: [1, 2, 3], 2: [4, 5, 6]})
number_repeats = 50
new_df = df.reindex(np.tile(df.index, number_repeats))

Returning to the original question

In MATLAB or Octave this is done pretty easily:

x = [1, 2, 3]
a = ones(3, 1) * x ...

In NumPy it's pretty much the same (and easy to memorize too):

x = [1, 2, 3]
a = np.tile(x, (3, 1))

Another solution

>>> x = np.array([1,2,3])
>>> y = x[None, :] * np.ones((3,))[:, None]
>>> y
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

Why? Sure, repeat and tile are the correct way to do this. But None indexing is a powerful tool that has many times let me quickly vectorize an operation (though it can quickly become very memory expensive!).

An example from my own code:

# trajectory is a sequence of xy coordinates [n_points, 2]
# xy_obstacles is a list of obstacles' xy coordinates [n_obstacles, 2]
# to compute dx, dy distance between every obstacle and every pose in the trajectory
deltas = trajectory[:, None, :2] - xy_obstacles[None, :, :2]
# we can easily convert x-y distance to a norm
distances = np.linalg.norm(deltas, axis=-1)
# distances is now [timesteps, obstacles]. Now we can for example find the closest obstacle at every point in the trajectory by doing
closest_obstacles = np.argmin(distances, axis=1)
# we could also find how safe the trajectory is, by finding the smallest distance over the entire trajectory
danger = np.min(distances)
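
To make the snippet above runnable, here is the same computation on made-up data (three trajectory points and two obstacles, all values invented for illustration):

```python
import numpy as np

# Made-up data: xy coordinates only
trajectory = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # [n_points, 2]
xy_obstacles = np.array([[0.0, 1.0], [2.0, 2.0]])             # [n_obstacles, 2]

# None inserts length-1 axes so the subtraction broadcasts to [n_points, n_obstacles, 2]
deltas = trajectory[:, None, :2] - xy_obstacles[None, :, :2]
distances = np.linalg.norm(deltas, axis=-1)        # [n_points, n_obstacles]

closest_obstacles = np.argmin(distances, axis=1)   # index of nearest obstacle per point
danger = np.min(distances)                         # smallest distance anywhere
```

The intermediate `deltas` array has `n_points * n_obstacles * 2` elements, which is where the memory cost mentioned above comes from.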

To answer the actual question, now that nearly a dozen approaches to working around a solution have been posted: x.transpose reverses the shape of x. One of the interesting side-effects is that if x.ndim == 1, the transpose does nothing.

This is especially confusing for people coming from MATLAB, where all arrays implicitly have at least two dimensions. The correct way to transpose a 1D NumPy array is not x.transpose() or x.T, but rather

x[:, None]

or

x.reshape(-1, 1)

From here, you can multiply by a matrix of ones, or use any of the other suggested approaches, as long as you respect the (subtle) differences between MATLAB and NumPy.
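
Putting this together, a minimal sketch of the column case from the question, using only the calls already mentioned above:

```python
import numpy as np

x = np.array([1, 2, 3])

t_shape = x.T.shape        # still (3,): transposing a 1-D array changes nothing
col = x[:, None]           # shape (3, 1), equivalent to x.reshape(-1, 1)

# The NumPy counterpart of MATLAB's (x') * ones(1, 3)
b = col * np.ones((1, 3))
```

Here `*` is elementwise with broadcasting rather than a matrix product, but for a (3, 1) column times a (1, 3) row of ones the result is the same cloned-column matrix.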

import numpy as np
x=np.array([1,2,3])
y=np.multiply(np.ones((len(x),len(x))),x).T
print(y)

yields:

[[ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 3.  3.  3.]]
