Create a block-diagonal numpy array from a given numpy array

Question

I have a 2-dimensional numpy array with an equal number of rows and columns. I would like to place copies of it along the diagonal of a bigger array, with zeros everywhere else. It should be possible to specify how many times the starting matrix appears on the diagonal. For example:

a = numpy.array([[5, 7], 
                 [6, 3]])

So if I wanted this array 2 times on the diagonal, the desired output would be:

array([[5, 7, 0, 0], 
       [6, 3, 0, 0], 
       [0, 0, 5, 7], 
       [0, 0, 6, 3]])

For 3 times:

array([[5, 7, 0, 0, 0, 0], 
       [6, 3, 0, 0, 0, 0], 
       [0, 0, 5, 7, 0, 0], 
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]])

Is there a fast way to implement this with numpy methods, for arbitrary sizes of the starting array (still assuming the starting array has the same number of rows and columns)?

Answer 1

Approach #1

Classic case of numpy.kron:

np.kron(np.eye(r,dtype=int),a) # r is number of repeats

Sample run:

In [184]: a
Out[184]: 
array([[1, 2, 3],
       [3, 4, 5]])

In [185]: r = 3 # number of repeats

In [186]: np.kron(np.eye(r,dtype=int),a)
Out[186]: 
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])
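As a small sketch of why this works, the same call can be applied to the 2x2 array from the question. np.kron(I, a) replaces each entry I[i, j] with the block I[i, j] * a, so the ones on the diagonal of the identity become copies of a and the zeros become zero blocks. The variable names here are only illustrative.

```python
import numpy as np

a = np.array([[5, 7],
              [6, 3]])
r = 2  # number of repeats

# Each 1 on the diagonal of the identity turns into a copy of a,
# each 0 off the diagonal turns into a zero block of a's shape.
out = np.kron(np.eye(r, dtype=int), a)
print(out)
# [[5 7 0 0]
#  [6 3 0 0]
#  [0 0 5 7]
#  [0 0 6 3]]
```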

Approach #2

Another efficient one, assigning into a diagonal view of the output array:

def repeat_along_diag(a, r):
    m,n = a.shape
    out = np.zeros((r,m,r,n), dtype=a.dtype)
    diag = np.einsum('ijik->ijk',out)
    diag[:] = a
    return out.reshape(-1,n*r)

Sample run:

In [188]: repeat_along_diag(a,3)
Out[188]: 
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])
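The approach depends on np.einsum('ijik->ijk', out) returning a writable view into out rather than a copy, so that assigning to the view fills the diagonal blocks in place. A minimal sketch to check this, using the 2x2 array from the question and illustrative variable names:

```python
import numpy as np

a = np.array([[5, 7],
              [6, 3]])
r = 3
m, n = a.shape

out = np.zeros((r, m, r, n), dtype=a.dtype)  # axes: (block row, row, block col, col)
diag = np.einsum('ijik->ijk', out)           # blocks where block row == block col

print(diag.shape)                   # (3, 2, 2)
print(np.shares_memory(diag, out))  # True: diag is a view, not a copy

diag[:] = a                          # broadcast a into every diagonal block
block = out.reshape(m * r, n * r)    # collapse to the final 2D layout
print(block)
```

Writing through such a view requires a NumPy version in which einsum views are writable (1.10 or later, as far as I know).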

Answer 2

import numpy as np
from scipy.linalg import block_diag

a = np.array([[5, 7], 
              [6, 3]])

n = 3

d = block_diag(*([a] * n))

d

array([[5, 7, 0, 0, 0, 0],
       [6, 3, 0, 0, 0, 0],
       [0, 0, 5, 7, 0, 0],
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]], dtype=int32)

But it looks like the np.kron solution is a little bit faster for small n.

%timeit np.kron(np.eye(n), a)
12.4 µs ± 95.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit block_diag(*([a] * n))
19.2 µs ± 34.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

However, for n = 300, for example, block_diag is much faster:

%timeit block_diag(*([a] * n))
1.11 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.kron(np.eye(n), a)
4.87 ms ± 31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
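One detail worth keeping in mind when reading these timings: np.eye(n) defaults to float64, so np.kron(np.eye(n), a) returns a float array, whereas block_diag keeps the integer dtype of a (as in the output shown earlier). A small sketch of the difference, using only numpy:

```python
import numpy as np

a = np.array([[5, 7],
              [6, 3]])
n = 3

as_float = np.kron(np.eye(n), a)           # np.eye defaults to float64
as_int = np.kron(np.eye(n, dtype=int), a)  # integer identity keeps an integer result

print(as_float.dtype)                           # float64
print(np.issubdtype(as_int.dtype, np.integer))  # True
print(np.array_equal(as_float, as_int))         # True: same values, different dtype
```

Passing dtype=int (or dtype=a.dtype) to np.eye is probably the fairer comparison if the integer dtype matters.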

Answer 3

For the special case of 2D matrices, simple slicing is much faster than numpy.kron() (the slowest) and mostly on par with the numpy.einsum()-based approach (from @Divakar's answer). Compared to scipy.linalg.block_diag(), it performs better for smaller arr, largely independently of the number of block repetitions.

Note that the performance of block_diag_view() on smaller inputs can easily be improved further with Numba, but performance gets worse for larger inputs.

With Numba, full explicit looping, and parallelization (block_diag_loop_jit()), one again gets results similar to block_diag_einsum() if the number of repetitions is small.

Overall, the best-performing solutions are block_diag_einsum() and block_diag_view().

import numpy as np
import scipy as sp
import numba as nb

import scipy.linalg


NUM = 4
M = 9


def block_diag_kron(arr, num=NUM):
    return np.kron(np.eye(num), arr)


def block_diag_einsum(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num, rows, num, cols), dtype=arr.dtype)
    diag = np.einsum('ijik->ijk', result)
    diag[:] = arr
    return result.reshape(rows * num, cols * num)


def block_diag_scipy(arr, num=NUM):
    return sp.linalg.block_diag(*([arr] * num))


def block_diag_view(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result


@nb.jit
def block_diag_view_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result


@nb.jit(parallel=True)
def block_diag_loop_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in nb.prange(num):
        for i in nb.prange(rows):
            for j in nb.prange(cols):
                result[i + (rows * k), j + (cols * k)] = arr[i, j]
    return result

Benchmarks for NUM = 4: [plot not shown]

Benchmarks for NUM = 400: [plot not shown]

Plots were produced from this template using the following additional code:

def gen_input(n):
    return np.random.randint(1, M, (n, n))


def equal_output(a, b):
    return np.all(a == b)


funcs = (
    block_diag_kron, block_diag_einsum, block_diag_scipy,
    block_diag_view, block_diag_view_jit, block_diag_loop_jit)


input_sizes = tuple(int(2 ** (2 + (3 * i) / 4)) for i in range(13))
print('Input Sizes:\n', input_sizes, '\n')


runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)


plot_benchmarks(runtimes, input_sizes, labels, units='ms')
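The benchmark() and plot_benchmarks() helpers come from the template mentioned above and are not defined in this answer. As a rough stand-in, a timing loop can be sketched with the standard timeit module. The function simple_benchmark() and the shortened *_version names below are hypothetical and only approximate what the template does; only the numpy-based variants are included so the sketch has no extra dependencies.

```python
import timeit
import numpy as np


def kron_version(arr, num):
    return np.kron(np.eye(num, dtype=arr.dtype), arr)


def einsum_version(arr, num):
    rows, cols = arr.shape
    result = np.zeros((num, rows, num, cols), dtype=arr.dtype)
    np.einsum('ijik->ijk', result)[:] = arr  # write through the diagonal view
    return result.reshape(rows * num, cols * num)


def view_version(arr, num):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result


def simple_benchmark(funcs, sizes, num=4, repeat=3, number=20):
    """Hypothetical stand-in for the template's benchmark():
    best time per call (in seconds) for each function and input size."""
    timings = {}
    for func in funcs:
        per_size = []
        for n in sizes:
            arr = np.random.randint(1, 9, (n, n))
            best = min(timeit.repeat(
                lambda: func(arr, num), repeat=repeat, number=number))
            per_size.append(best / number)
        timings[func.__name__] = per_size
    return timings


sizes = (4, 16, 64)
timings = simple_benchmark((kron_version, einsum_version, view_version), sizes)
for name, values in timings.items():
    print(name, ['%.1f µs' % (v * 1e6) for v in values])
```

The absolute numbers depend heavily on the machine and library versions, so only the relative ordering is meaningful.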

(EDITED to include the np.einsum()-based approach and another Numba version with explicit looping.)
