Summing and removing repeated elements of Numpy Arrays

I have four 1D NumPy arrays of equal length. The first three together act as an ID for the corresponding element of the fourth array.

The ID arrays contain repeated combinations. For each repeated combination I need to sum the matching entries of the fourth array and remove the duplicate entries from all four arrays.

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

In this case I need:

x = [1, 2, 4]
y = [1, 1, 4]
z = [1, 2, 2]
data = [6, 7, 3]

The arrays are rather long so loops really won't work. I'm sure there is a fairly simple way to do this, but for the life of me I can't figure it out.

To get started, we can stack the ID vectors into a matrix such that each ID is a row of three values:

XYZ = np.vstack((x,y,z)).T

Now we just need to find the indices of repeated rows. Unfortunately, np.unique doesn't operate on rows (NumPy versions before 1.13 have no axis argument), so we need a few tricks:

order = np.lexsort(XYZ.T)
diff = np.diff(XYZ[order], axis=0)
uniq_mask = np.append(True, (diff != 0).any(axis=1))

This part is borrowed from the np.unique source code, and finds the unique indices as well as the "inverse index" mapping:

uniq_inds = order[uniq_mask]
inv_idx = np.zeros_like(order)
inv_idx[order] = np.cumsum(uniq_mask) - 1
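To make the intermediate arrays less abstract, here is a small trace of the steps above on the example from the question. It is only an illustration of the same lines, with comments added; the printed values are what I get for this particular input.

```python
import numpy as np

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])

XYZ = np.vstack((x, y, z)).T           # one ID per row, shape (4, 3)

order = np.lexsort(XYZ.T)              # indices that sort the rows (last column is the primary key)
diff = np.diff(XYZ[order], axis=0)     # differences between consecutive sorted rows
uniq_mask = np.append(True, (diff != 0).any(axis=1))  # True where a new ID starts

uniq_inds = order[uniq_mask]           # one original index per distinct ID
inv_idx = np.zeros_like(order)
inv_idx[order] = np.cumsum(uniq_mask) - 1  # group number for every original row

print(uniq_mask.tolist())  # [True, False, True, True]
print(inv_idx.tolist())    # [0, 1, 2, 0]
```

Rows 0 and 3 share the ID (1, 1, 1), so they end up in the same group 0, which is what lets np.bincount add their data values together in the next step.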

Finally, sum over the unique indices:

data = np.bincount(inv_idx, weights=data)
x,y,z = XYZ[uniq_inds].T
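If you are on NumPy 1.13 or newer, the same result can probably be had more directly, because np.unique accepts axis=0 there. The following is a minimal sketch under that assumption; the function name sum_duplicates is made up for illustration and is not part of NumPy.

```python
import numpy as np

def sum_duplicates(x, y, z, data):
    """Hypothetical helper: collapse repeated (x, y, z) IDs and sum their data."""
    XYZ = np.column_stack((x, y, z))                  # one ID per row
    uniq, inverse = np.unique(XYZ, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)                       # make sure the inverse is 1-D for np.bincount
    sums = np.bincount(inverse, weights=data)         # float64 sums, one per unique row
    return uniq[:, 0], uniq[:, 1], uniq[:, 2], sums

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

x_out, y_out, z_out, data_out = sum_duplicates(x, y, z, data)
print(x_out.tolist(), y_out.tolist(), z_out.tolist(), data_out.tolist())
```

Note that np.bincount with weights returns floats, so the sums come back as 6.0, 7.0, 3.0; if integer output matters, data_out.astype(data.dtype) should restore it.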

You can use np.unique and a sum, as reptilicus suggested, to do the following:

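import numpy as np

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

# Arithmetic alternative; only safe if every value is non-negative and smaller than N:
# N = len(x)
# ids = x + y*N + z*(N**2)
# Built-in zip works on both Python 2 and 3 (izip exists only in Python 2).
# Note: hash() could in principle collide for different triples.
ids = np.array([hash((a, b, c)) for a, b, c in zip(x, y, z)]) # creates flat ids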

_, idx, idx_rep = np.unique(ids, return_index=True, return_inverse=True)

x_out = x[idx]
y_out = y[idx]
z_out = z[idx]
# Slower equivalent, looping over the group numbers 0..len(idx)-1:
# data_out = np.array([np.sum(data[idx_rep == i]) for i in range(len(idx))])
data_out = np.bincount(idx_rep, weights=data)

print(x_out)
print(y_out)
print(z_out)
print(data_out)
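If the possibility of hash collisions is a worry, one way to build flat ids without hashing is to turn each ID array into compact integer codes first and then combine the codes with np.ravel_multi_index. This is only a sketch of that idea, not something from the answer above; the helper name flat_ids is hypothetical, and it assumes the product of the numbers of distinct values per array fits in a 64-bit integer.

```python
import numpy as np

def flat_ids(*columns):
    """Hypothetical helper: map each combination of column values to one integer."""
    codes, dims = [], []
    for col in columns:
        uniq_vals, code = np.unique(col, return_inverse=True)
        codes.append(code)           # position of each value among the sorted distinct values
        dims.append(len(uniq_vals))  # number of distinct values in this column
    # Distinct code tuples always get distinct flat indices, so no collisions.
    return np.ravel_multi_index(tuple(codes), tuple(dims))

x = np.array([1, 2, 4, 1])
y = np.array([1, 1, 4, 1])
z = np.array([1, 2, 2, 1])
data = np.array([4, 7, 3, 2])

ids = flat_ids(x, y, z)
_, idx, idx_rep = np.unique(ids, return_index=True, return_inverse=True)

x_out, y_out, z_out = x[idx], y[idx], z[idx]
data_out = np.bincount(idx_rep, weights=data)

print(x_out.tolist(), y_out.tolist(), z_out.tolist(), data_out.tolist())
```

The rest of the recipe is unchanged: np.unique on the flat ids gives the first-occurrence indices and the inverse mapping, and np.bincount does the summing.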
