I have a NumPy array of integers:
import numpy as np
x = np.array([1, 0, 2, 1, 4, 1, 4, 1, 0, 1, 4, 3, 0, 1, 0, 2, 1, 4, 3, 1, 4, 1, 0])
and another array of indices that references the array above:
indices = np.array([22, 12, 8, 1, 14, 21, 7, 0, 13, 19, 5, 3, 9, 16, 2, 15, 11, 18, 20, 6, 4, 10, 17])
For every pair of neighboring indices, we need to count how many consecutive values in x match when we start at each of the two neighboring indices. For example, for indices[2] and indices[3], we have indices 8 and 1, respectively, and they both reference positions in x. Then, starting at x[8] and x[1], we count how many consecutive values are the same (i.e. overlapping), but we stop checking the overlap under specific conditions (see below). In other words, we check if:
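x[8] == x[1]
x[9] == x[2]  # increment each index by one

and so on, where i and j are the two running positions. We stop as soon as any of the following holds:

stop if i >= x.shape[0]
stop if j >= x.shape[0]
stop if x[i] == 0
stop if x[j] == 0
stop if x[i] != x[j]

(The two x[i] == 0 and x[j] == 0 conditions are disabled in the loop below, where they appear only in a commented-out line.)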
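To make the checks above concrete, here is a minimal sketch for a single pair of start positions. The helper name `overlap_length` and its `stop_at_zero` flag are illustrative assumptions, not part of the original code; the flag switches the two zero-based stop conditions on or off.

```python
import numpy as np

x = np.array([1, 0, 2, 1, 4, 1, 4, 1, 0, 1, 4, 3, 0, 1, 0, 2, 1, 4, 3, 1, 4, 1, 0])

def overlap_length(x, i, j, stop_at_zero=False):
    """Count matching consecutive values of x starting at positions i and j."""
    n = x.shape[0]
    k = 0
    # Stop when either position runs off the end or the values differ.
    while i + k < n and j + k < n and x[i + k] == x[j + k]:
        # The values are equal here, so x[i+k] == 0 implies x[j+k] == 0 too.
        if stop_at_zero and x[i + k] == 0:
            break
        k += 1
    return k

print(overlap_length(x, 8, 1))                     # 1
print(overlap_length(x, 8, 1, stop_at_zero=True))  # 0
```

For the pair (8, 1), x[8] and x[1] are both 0, so the first comparison matches, while x[9] = 1 and x[2] = 2 differ. That gives 1 without the zero conditions and 0 with them.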
In reality, we do this for all neighboring index pairs:
out = np.zeros(indices.shape[0], dtype=int)
for idx in range(indices.shape[0] - 1):
    i = indices[idx]
    j = indices[idx + 1]
    k = 0  # number of matching consecutive values found so far
    # Old condition, which also stopped at zeros:
    # while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] != 0 and x[j+k] != 0 and x[i+k] == x[j+k]:
    while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] == x[j+k]:
        k += 1
    out[idx] = k
And the output is:
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 2, 3, 0, 3, 0, 1, 0, 2, 2, 1, 2, 0] # This is the old output if x[i] == 0 and x[j] == 0 are included
[1 2 1 4 0 2 2 5 1 4 3 2 3 0 3 0 1 0 3 2 1 2 0]
I'm looking for a vectorized way to do this in NumPy.
This should do the trick (I am ignoring the two conditions x[i] == 0 and x[j] == 0):
out = np.zeros(indices.shape[0], dtype=int)
for idx in range(indices.shape[0] - 1):
    i = indices[idx]
    j = indices[idx + 1]
    # Longest window that fits inside x from both start positions
    l = len(x) - max(i, j)
    x1 = x[i:i+l]
    x2 = x[j:j+l]
    # Add False at the end to handle the case in which arrays are exactly the same
    x0 = np.append(x1 == x2, False)
    out[idx] = np.argmin(x0)
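The `np.append(..., False)` plus `np.argmin` step in the loop above can be checked in isolation. This is only a small sketch; the helper name `first_false` is a hypothetical name chosen for illustration.

```python
import numpy as np

def first_false(mask):
    """Index of the first False in a boolean array, or len(mask) if all are True."""
    # The appended False is a sentinel: for an all-True mask it becomes the
    # only minimum, so argmin returns the length of the original mask.
    return int(np.argmin(np.append(mask, False)))

print(first_false(np.array([True, True, False, True])))  # 2
print(first_false(np.array([True, True, True])))         # 3
print(first_false(np.array([False, True])))              # 0
```

Without the sentinel, an all-True comparison would make `np.argmin` return 0, which would be indistinguishable from a mismatch at the very first position.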
Notice that with np.argmin I am exploiting the following two facts:
1. False < True
2. np.argmin only returns the first instance of the min in the array

Regarding time performance, I tested with N=10**5 and N=10**6, and as suggested in the comments, this cannot compete with numba jit.
def f(x, indices):
    out = np.zeros(indices.shape[0], dtype=int)
    for idx in range(indices.shape[0] - 1):
        i = indices[idx]
        j = indices[idx + 1]
        l = len(x) - max(i, j)
        x1 = x[i:i+l]
        x2 = x[j:j+l]
        x0 = np.append(x1 == x2, False)
        out[idx] = np.argmin(x0)
    return out
N=100_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%timeit f(x, indices)
3.67 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
N=1_000_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%time f(x, indices)
Wall time: 8min 20s
(I did not have the patience to let %timeit finish.)
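One possible alternative, which is not part of the answer above and has not been benchmarked here, is to vectorize over the pairs instead of over the offsets: all pairs that are still matching are advanced one step at a time with boolean masks. The function name `overlap_counts` is an assumption made for this sketch, and like the code above it ignores the two zero conditions.

```python
import numpy as np

def overlap_counts(x, indices):
    """Advance all still-matching neighboring pairs together, one offset per pass."""
    n = x.shape[0]
    out = np.zeros(indices.shape[0], dtype=int)
    pairs = np.arange(indices.shape[0] - 1)  # pair numbers still being extended
    i = indices[:-1].astype(np.int64)        # running position of the first index
    j = indices[1:].astype(np.int64)         # running position of the second index
    while pairs.size:
        # Drop pairs where either running position has left the array.
        keep = (i < n) & (j < n)
        pairs, i, j = pairs[keep], i[keep], j[keep]
        # Drop pairs whose current values differ; advance the rest by one.
        keep = x[i] == x[j]
        pairs, i, j = pairs[keep], i[keep] + 1, j[keep] + 1
        out[pairs] += 1
    return out

x = np.array([1, 0, 2, 1, 4, 1, 4, 1, 0, 1, 4, 3, 0, 1, 0, 2, 1, 4, 3, 1, 4, 1, 0])
indices = np.array([22, 12, 8, 1, 14, 21, 7, 0, 13, 19, 5, 3, 9, 16, 2, 15, 11, 18, 20, 6, 4, 10, 17])
print(overlap_counts(x, indices))
# [1 2 1 4 0 2 2 5 1 4 3 2 3 0 3 0 1 0 3 2 1 2 0]
```

The number of passes through the `while` loop equals the longest matching run plus one, so this only pays off when runs are short, as they tend to be for random data like the test arrays above. With long identical stretches in x it degrades toward one pass per element.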