简体   繁体   中英

Create block diagonal numpy array from a given numpy array

I have a 2-dimensional numpy array with an equal number of columns and rows. I would like to arrange them into a bigger array having the smaller ones on the diagonal. It should be possible to specify how often the starting matrix should be on the diagonal. For example:

a = numpy.array([[5, 7], 
                 [6, 3]])

So if I wanted this array 2 times on the diagonal the desired output would be:

array([[5, 7, 0, 0], 
       [6, 3, 0, 0], 
       [0, 0, 5, 7], 
       [0, 0, 6, 3]])

For 3 times:

array([[5, 7, 0, 0, 0, 0], 
       [6, 3, 0, 0, 0, 0], 
       [0, 0, 5, 7, 0, 0], 
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]])

Is there a fast way to implement this with numpy methods and for arbitrary sizes of the starting array (still considering the starting array to have the same number of rows and columns)?

Approach #1

Classic case of numpy.kron -

np.kron(np.eye(r,dtype=int),a) # r is number of repeats

Sample run -

In [184]: a
Out[184]: 
array([[1, 2, 3],
       [3, 4, 5]])

In [185]: r = 3 # number of repeats

In [186]: np.kron(np.eye(r,dtype=int),a)
Out[186]: 
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])

Approach #2

Another efficient one with diagonal-viewed-array-assignment -

def repeat_along_diag(a, r):
    m,n = a.shape
    out = np.zeros((r,m,r,n), dtype=a.dtype)
    diag = np.einsum('ijik->ijk',out)
    diag[:] = a
    return out.reshape(-1,n*r)

Sample run -

In [188]: repeat_along_diag(a,3)
Out[188]: 
array([[1, 2, 3, 0, 0, 0, 0, 0, 0],
       [3, 4, 5, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 2, 3, 0, 0, 0],
       [0, 0, 0, 3, 4, 5, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 2, 3],
       [0, 0, 0, 0, 0, 0, 3, 4, 5]])
import numpy as np
from scipy.linalg import block_diag

a = np.array([[5, 7], 
              [6, 3]])

n = 3

d = block_diag(*([a] * n))

d

array([[5, 7, 0, 0, 0, 0],
       [6, 3, 0, 0, 0, 0],
       [0, 0, 5, 7, 0, 0],
       [0, 0, 6, 3, 0, 0],
       [0, 0, 0, 0, 5, 7],
       [0, 0, 0, 0, 6, 3]], dtype=int32)

But it looks like np.kron solution is a little bit faster for small n.

%timeit np.kron(np.eye(n), a)
12.4 µs ± 95.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit block_diag(*([a] * n))
19.2 µs ± 34.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

However for n = 300, for example, the block_diag is much faster:

%timeit block_diag(*([a] * n))
1.11 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.kron(np.eye(n), a)
4.87 ms ± 31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For the specialized case of matrices, a simple slicing is WAY faster then numpy.kron() (the slowest) and mostly on par with numpy.einsum() -based approach (from @Divakar answer). Compared to scipy.linalg.block_diag() , it performs better for smaller arr , somewhat independently of number of block repetitions.

Note that the performances of block_diag_view() on smaller inputs can be easily further improved with Numba, but one would get worse performances for larger inputs.

With Numba, full explicit looping and parallelization ( block_diag_loop_jit() ) one would get again similar results as block_diag_einsum() if the number of repetitions is small.

Overall, the most performing solutions are block_diag_einsum() and block_diag_view() .

import numpy as np
import scipy as sp
import numba as nb

import scipy.linalg


NUM = 4
M = 9


def block_diag_kron(arr, num=NUM):
    return np.kron(np.eye(num), arr)


def block_diag_einsum(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num, rows, num, cols), dtype=arr.dtype)
    diag = np.einsum('ijik->ijk', result)
    diag[:] = arr
    return result.reshape(rows * num, cols * num)


def block_diag_scipy(arr, num=NUM):
    return sp.linalg.block_diag(*([arr] * num))


def block_diag_view(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result


@nb.jit
def block_diag_view_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in range(num):
        result[k * rows:(k + 1) * rows, k * cols:(k + 1) * cols] = arr
    return result


@nb.jit(parallel=True)
def block_diag_loop_jit(arr, num=NUM):
    rows, cols = arr.shape
    result = np.zeros((num * rows, num * cols), dtype=arr.dtype)
    for k in nb.prange(num):
        for i in nb.prange(rows):
            for j in nb.prange(cols):
                result[i + (rows * k), j + (cols * k)] = arr[i, j]
    return result

Benchmarks for NUM = 4 :

bm_4

Benchmarks for NUM = 400 :

bm_400


Plots were produced from this template using the following additional code:

def gen_input(n):
    return np.random.randint(1, M, (n, n))


def equal_output(a, b):
    return np.all(a == b)


funcs = block_diag_kron, block_diag_scipy, block_diag_view, block_diag_jit


input_sizes = tuple(int(2 ** (2 + (3 * i) / 4)) for i in range(13))
print('Input Sizes:\n', input_sizes, '\n')


runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=gen_input, equal_output=equal_output,
    input_sizes=input_sizes)


plot_benchmarks(runtimes, input_sizes, labels, units='ms')

(EDITED to include np.einsum() -based approach and another Numba version with explicit looping.)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM