Accumulate rows of NumPy array based on the last column

Question

I have an array of coordinate rows: the first three entries of each row are the x, y, z coordinates, and the fourth entry is the id of the track. I want to add a drift to the tracks, starting after the first time point of each track. Is there a simple way to add the drift to the whole array at once, per track id, given that the tracks can have different lengths? (As you can see below, the track with id 0 has only 3 coordinate entries, and the track with id 3 has 6.)

import numpy as np
drift=np.array([1,1,0])
a = np.array([[1,1,1,0],[1,1,1,0],[1,1,1,0],
              [1,1,1,2],[1,1,1,2],[1,1,1,3],
              [1,1,1,3],[1,1,1,3],[1,1,1,3],
              [1,1,1,3],[1,1,1,3]])

Desired output:

output = np.array([[1,1,1,0],[2,2,1,0],[3,3,1,0],
                   [1,1,1,2],[2,2,1,2],[1,1,1,3],
                   [2,2,1,3],[3,3,1,3],[4,4,1,3],
                   [5,5,1,3],[6,6,1,3]])

Answer 1

Here is an example of how it can be done in a vectorized manner:

import numpy as np


drift = np.array([1, 1, 0])
a = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 2], 
              [1, 1, 1, 2], [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3], 
              [1, 1, 1, 3], [1, 1, 1, 3], [1, 1, 1, 3]])


def multirange(counts: np.ndarray) -> np.ndarray:
    """
    Calculates concatenated ranges. Code was taken at:
    https://stackoverflow.com/questions/20027936/how-to-efficiently-concatenate-many-arange-calls-in-numpy
    """
    counts = counts[counts != 0]
    counts1 = counts[:-1]
    reset_index = np.cumsum(counts1)
    incr = np.ones(counts.sum(), dtype=int)
    incr[0] = 0
    incr[reset_index] = 1 - counts1
    incr.cumsum(out=incr)
    return incr


def drifts(ids: np.ndarray,
           drift: np.ndarray) -> np.ndarray:
    diffs = np.diff(ids)
    max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1
    max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]
    multipliers = multirange(max_drifts_per_id)
    drifts = np.tile(drift, (len(ids), 1))
    return drifts * multipliers[:, np.newaxis]


a[:, :-1] += drifts(a[:, -1], drift)
print(a)

Output:

array([[1, 1, 1, 0],
       [2, 2, 1, 0],
       [3, 3, 1, 0],
       [1, 1, 1, 2],
       [2, 2, 1, 2],
       [1, 1, 1, 3],
       [2, 2, 1, 3],
       [3, 3, 1, 3],
       [4, 4, 1, 3],
       [5, 5, 1, 3],
       [6, 6, 1, 3]])

Explanation:

The idea of the drifts function is to take an array of ids (which in our case we can obtain as a[:, -1]: array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])) and the drift (np.array([1, 1, 0])) and build the following array of per-row offsets, which can then be added to the coordinate columns of the original array:

array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])

Line by line:

diffs = np.diff(ids)

Here we get an array that is non-zero exactly where the id changes from one row to the next. The index of each non-zero element is therefore the index of the last row of a track in ids:

array([0, 0, 2, 0, 1, 0, 0, 0, 0, 0])

See np.diff for details.

max_drifts_per_id = np.concatenate((np.where(diffs)[0], [len(ids) - 1])) + 1

np.where(diffs)[0] gives the indices of the non-zero elements of the previous array, i.e. the last row of every track except the final one. We append the index of the last row overall and add 1 to all of them, which turns them into exclusive end indices of the tracks. See np.where for details. After concatenation max_drifts_per_id will be:

array([ 3,  5, 11])

max_drifts_per_id[1:] = max_drifts_per_id[1:] - max_drifts_per_id[:-1]

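Subtracting each end index from the next one turns the end indices into the number of rows per track. These lengths are the (exclusive) end values of the ranges we want to generate: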

array([3, 2, 6])
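
The three steps above (np.diff, np.where, subtracting neighbouring end indices) can also be written as one small helper. This is only a sketch: run_lengths is a hypothetical name that does not appear in the answer, and it assumes, like the answer itself, that all rows of a track are adjacent.

```python
import numpy as np

ids = np.array([0, 0, 0, 2, 2, 3, 3, 3, 3, 3, 3])


def run_lengths(ids):
    # Hypothetical helper: number of rows in each run of equal consecutive ids.
    # Indices where the id changes, i.e. the last row of every track but the final one.
    change = np.flatnonzero(np.diff(ids))
    # Exclusive end index of every track, including the final one.
    ends = np.append(change + 1, len(ids))
    # Prepending 0 lets np.diff convert the end indices into lengths.
    return np.diff(ends, prepend=0)


print(run_lengths(ids))  # [3 2 6]
```

np.flatnonzero(x) is equivalent to np.where(x)[0] for a 1-D array, so the intermediate values are the same as in the walkthrough.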

multipliers = multirange(max_drifts_per_id)

We use multirange as an efficient alternative to concatenating many calls of np.arange, one per track. See the Stack Overflow question "How to efficiently concatenate many arange calls in numpy?" (linked in the docstring of multirange) for details. The resulting multipliers will be:

array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])
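
To see what multirange has to produce, it may help to compare it with two simpler formulations. The sketch below is not the answer's code: multirange_naive and multirange_repeat are hypothetical names, the first being the straightforward "one np.arange per track" version and the second a loop-free variant based on np.repeat.

```python
import numpy as np


def multirange_naive(counts):
    # Hypothetical reference version: one np.arange per track, then concatenate.
    return np.concatenate([np.arange(c) for c in counts])


def multirange_repeat(counts):
    # Hypothetical loop-free variant: global row index minus the index
    # at which the row's own track starts.
    counts = np.asarray(counts)
    starts = np.cumsum(counts) - counts          # first row index of each track
    return np.arange(counts.sum()) - np.repeat(starts, counts)


counts = np.array([3, 2, 6])
print(multirange_naive(counts))   # [0 1 2 0 1 0 1 2 3 4 5]
print(multirange_repeat(counts))  # [0 1 2 0 1 0 1 2 3 4 5]
```

The naive version runs a Python-level loop over the tracks, which is usually fine for a handful of tracks but tends to get slow when there are many short ones.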

drifts = np.tile(drift, (len(ids), 1))

With np.tile we repeat drift so that it has the same number of rows as ids:

array([[1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0],
       [1, 1, 0]])

return drifts * multipliers[:, np.newaxis]

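We multiply each row by its multiplier (multipliers[:, np.newaxis] turns the multipliers into a column so that they broadcast across x, y, z) and get: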

array([[0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [0, 0, 0],
       [1, 1, 0],
       [0, 0, 0],
       [1, 1, 0],
       [2, 2, 0],
       [3, 3, 0],
       [4, 4, 0],
       [5, 5, 0]])
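
As a side note, the explicit np.tile step is not strictly required, because NumPy broadcasting can expand drift on its own. The following sketch compares the tiled product with two equivalent formulations; the variable names tiled, broadcast and outer are only illustrative.

```python
import numpy as np

drift = np.array([1, 1, 0])
multipliers = np.array([0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 5])

# The product as computed in the drifts function.
tiled = np.tile(drift, (len(multipliers), 1)) * multipliers[:, np.newaxis]

# Broadcasting: shape (11, 1) times shape (3,) gives shape (11, 3).
broadcast = multipliers[:, np.newaxis] * drift

# The same result expressed as an outer product.
outer = np.outer(multipliers, drift)

print(np.array_equal(tiled, broadcast), np.array_equal(tiled, outer))  # True True
```

Skipping the tile avoids building the intermediate (len(ids), 3) array of repeated drift vectors.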

And finally this returned value can be added to the original array:

a[:, :-1] += drifts(a[:, -1], drift)
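
One caveat about the in-place += above: it works here because both a and drift are integer arrays. If the drift were fractional, NumPy would refuse to write float results into an integer array. The sketch below illustrates this with made-up offsets (the values are an assumption, not taken from the question) and shows one possible workaround, converting a to float first.

```python
import numpy as np

a = np.array([[1, 1, 1, 0], [1, 1, 1, 0]])               # integer dtype, as in the question
offsets = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]])   # made-up fractional drift per row

raised = False
try:
    a[:, :-1] += offsets          # float64 result cannot be cast into an int array
except TypeError as exc:
    raised = True
    print("in-place add refused:", type(exc).__name__)

# Possible workaround: work on a float copy of the array.
a_float = a.astype(float)
a_float[:, :-1] += offsets
print(a_float)
```

Note that the id column becomes float as well in a_float; keeping ids and coordinates in separate arrays would avoid that.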

Answer 2

There is no built-in way of doing this as far as I know, but you can solve it with this simple loop:

import numpy as np
drift=np.array([1,1,0])
a = np.array([[1,1,1,0],[1,1,1,0],[1,1,1,0],
[1,1,1,2],[1,1,1,2],[1,1,1,3],[1,1,1,3],[1,1,1,3],[1,1,1,3],[1,1,1,3],[1,1,1,3]])

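_id = 0  # id of the track currently being processed
n = 0    # number of drift steps to apply to the next row of this track
for i in range(a.shape[0]):
    if a[i, 3] == _id:
        a[i, 0:3] = a[i, 0:3] + n * drift
        n += 1
    else:
        # New track: its first row stays unchanged, the next row gets one drift step
        _id = a[i, 3]
        n = 1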

print(a)
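
The loop resets its counter whenever the id changes, so it relies on all rows of a track being adjacent (the vectorized answer makes the same assumption). If that is not guaranteed, one option is to sort the rows by id with a stable sort first, which keeps the time order within each track. The sketch below assumes this is acceptable for the data; add_drift_loop is a hypothetical wrapper around the loop, and the shuffled array is made up for illustration.

```python
import numpy as np


def add_drift_loop(a, drift):
    # Hypothetical wrapper around the loop above; works on a copy of a.
    out = a.copy()
    current_id, n = None, 0
    for i in range(out.shape[0]):
        if current_id is None or out[i, 3] != current_id:
            # First row of a new track: no drift yet.
            current_id, n = out[i, 3], 0
        out[i, :3] += n * drift
        n += 1
    return out


drift = np.array([1, 1, 0])
shuffled = np.array([[1, 1, 1, 3], [1, 1, 1, 0], [1, 1, 1, 3], [1, 1, 1, 0]])

# A stable sort groups rows by id without reordering rows inside a track.
order = np.argsort(shuffled[:, 3], kind="stable")
result = add_drift_loop(shuffled[order], drift)
print(result)
```

The result is ordered by track id; indexing with the inverse permutation of order would restore the original row order if it matters.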

solution1
2 ACCEPTED 2018-04-03 16:31:44

solution2
0 2018-04-03 13:57:55
