I have an array y with len(y) = M that contains values from 0 to N - 1. For example, with N = 3:
y = [0, 2, 0, 1, 2, 1, 0, 2]
The M x M incidence matrix A is defined as follows:
A(i,j) = 1 if y(i) == y(j)
A(i,j) = 0 if y(i) != y(j)
A simple algorithm would be:
import numpy as np

def incidence(y):
    M = len(y)
    A = np.zeros((M, M))
    # Compare every pair (i, j) and mark the matches
    for i in range(M):
        for j in range(M):
            if y[i] == y[j]:
                A[i, j] = 1
    return A
But this is very slow. Is there a faster way to do this, for example with a list comprehension or vectorization?
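You can take advantage of NumPy broadcasting to gain efficiency over pure Python by comparing y with itself reshaped into a column. A 1-D array has no distinct transpose, so reshape(-1, 1) produces the (M, 1) column, and comparing it with the (M,) array broadcasts to an (M, M) result: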
import numpy as np

y = np.array([1, 2, 1, 0, 0, 1, 2])

def mat_me(y):
    return (y == y.reshape(-1, 1)).astype(int)

mat_me(y)
which produces:
array([[1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 1],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 0, 0, 1, 1, 0, 0],
       [0, 0, 0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0, 1]])
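As a sanity check, here is a minimal sketch that compares the broadcast result against the double loop from the question, using the question's example array. The names incidence_loop and incidence_broadcast are made up for this illustration, and y[:, None] is just another way of writing y.reshape(-1, 1).

```python
import numpy as np

def incidence_loop(y):
    """Reference implementation: the double loop from the question."""
    M = len(y)
    A = np.zeros((M, M), dtype=int)
    for i in range(M):
        for j in range(M):
            if y[i] == y[j]:
                A[i, j] = 1
    return A

def incidence_broadcast(y):
    """Compare a column of shape (M, 1) with a row of shape (1, M).

    Broadcasting expands both to (M, M), so entry (i, j) is y[i] == y[j].
    """
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(int)

y = np.array([0, 2, 0, 1, 2, 1, 0, 2])  # the example from the question
A = incidence_broadcast(y)

print(A.shape)                              # (8, 8)
print(np.array_equal(A, incidence_loop(y)))  # True
```

If memory matters, dropping the .astype(int) call keeps the result as a boolean array, which uses one byte per entry instead of the platform's integer size.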
For comparison, the NumPy version on an array of 3000 elements:
y = np.random.choice([1, 2, 3], size=3000)

def mat_me(y):
    return (y == y.reshape(-1, 1)).astype(int)

%timeit mat_me(y)
# 28.6 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
versus a pure-Python nested list comprehension on the same data:
y = np.random.choice([1, 2, 3], size=3000)
y = list(y)

def mat_me_py(y):
    return [[int(a == b) for a in y] for b in y]

%timeit mat_me_py(y)
# 4.16 s ± 213 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
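Both versions perform M * M comparisons, but NumPy runs them in compiled code, which is where the roughly 145x gap in the timings above comes from. The difference becomes very pronounced on larger lists.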
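If you are not working in IPython, %timeit is not available. The following is a rough sketch of the same comparison using the standard-library timeit module. The function names with_numpy and with_lists, the seed, and the smaller size of 500 are choices made for this illustration, and the numbers you get will depend on your machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)         # arbitrary seed, for repeatability only
y_arr = rng.integers(1, 4, size=500)   # values 1..3; smaller than the 3000 used above
y_list = y_arr.tolist()                # plain Python ints for the list version

def with_numpy(y):
    # Broadcast comparison of the (M,) array against its (M, 1) column
    return (y == y.reshape(-1, 1)).astype(int)

def with_lists(y):
    # Nested list comprehension over plain Python values
    return [[int(a == b) for a in y] for b in y]

runs = 5
t_np = timeit.timeit(lambda: with_numpy(y_arr), number=runs) / runs
t_py = timeit.timeit(lambda: with_lists(y_list), number=runs) / runs

print(f"numpy: {t_np:.6f} s per call")
print(f"lists: {t_py:.6f} s per call")
```

Both functions return the same matrix, so the comparison only measures how the work is carried out.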