Sum 2D Numpy Array by Multiple Labels

Thanks for answering my questions. Here is my 3rd one.

  1. Each element of the data array is a coordinate (x, y).
  2. Each coordinate has two labels.
  3. Goal: sum the coordinates that share the same pair of labels.

For example, if the inputs are

data = numpy.array( [ [1, 2], [3,8], [4,5], [2,9], [1, 3], [7, 2] ] )
label1 = numpy.array([0,0,1,1,2,2])
label2 = numpy.array([0,1,0,0,1,1])

the result should be:

array([[[ 1 ,  2 ],
        [ 3 ,  8 ]],

       [[ 6 , 14 ],
        [ 0 ,  0 ]],

       [[ 0 ,  0 ],
        [ 8 ,  5 ]]])
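
To make the goal concrete, here is a minimal reference sketch that uses a plain Python loop. It is not meant to be fast, only to pin down the expected result. It assumes both label arrays hold consecutive integers starting at 0; the name expected is introduced here for illustration.

```python
import numpy

data = numpy.array([[1, 2], [3, 8], [4, 5], [2, 9], [1, 3], [7, 2]])
label1 = numpy.array([0, 0, 1, 1, 2, 2])
label2 = numpy.array([0, 1, 0, 0, 1, 1])

# Assumption: labels are consecutive integers starting at 0,
# so max() + 1 gives the number of distinct values of each label.
expected = numpy.zeros((label1.max() + 1, label2.max() + 1, 2), dtype=data.dtype)

# Add every coordinate to the cell selected by its (label1, label2) pair.
for point, a, b in zip(data, label1, label2):
    expected[a, b] += point

print(expected)
```

Any vectorised version can be checked against this loop with numpy.array_equal.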

Here is my current code:

import numpy
from scipy import ndimage

data = numpy.array( [ [1, 2], [3,8], [4,5], [2,9], [1, 3], [7, 2] ] )
label1 = numpy.array([0,0,1,1,2,2])
label2 = numpy.array([0,1,0,0,1,1])

kinds_of_label1 = 3
kinds_of_label2 = 2

# combine both labels into one flat label: 0 .. kinds_of_label1 * kinds_of_label2 - 1
label12 = label1 * kinds_of_label2 + label2
kinds12_range = range(kinds_of_label1 * kinds_of_label2 )

result = numpy.zeros( (kinds_of_label1, kinds_of_label2, 2) )
# transposed view into result: row 0 receives the x sums, row 1 the y sums
result_T = result.reshape( (kinds_of_label1 * kinds_of_label2, 2) ).T
result_T[0] = ndimage.sum( data.T[0], label12, index = kinds12_range )
result_T[1] = ndimage.sum( data.T[1], label12, index = kinds12_range )
# number of coordinates that fall under each combined label
counting = numpy.bincount(label12)

print(result)
print(counting)

This works, but summing the x and y coordinates separately (as in result_T[0] and result_T[1]) seems clumsy. Moreover, the ndimage sum returns floating-point results even though the inputs are integers, and I would expect integer arithmetic to be faster.

Can we make this faster and better?

#### Wrong Answer.  Do Not Use. ####
import numpy
### Input ###
label1 = numpy.array([0,0,1,1,2,2])
kinds_of_label1 = 3

label2 = numpy.array([0,1,0,0,1,1])
kinds_of_label2 = 2

data = numpy.array( [ [1, 2], [3,8], [4,5], [2,9], [1, 3], [7, 2] ] )

### Processing ####
# this assumes the values of label1 and label2 are consecutive integers starting at 0: 0, 1, 2, 3 ...
label1_and_2 = label1*kinds_of_label2 + label2

result = numpy.zeros( (kinds_of_label1 * kinds_of_label2, 2) )
result[ label1_and_2 ] += data

counting = numpy.bincount( label1_and_2 )

### output ###
print( result.view().reshape(kinds_of_label1, kinds_of_label2, 2) )


>>> array([[[ 1.,  2.],
            [ 3.,  8.]],

           [[ 2.,  9.],
            [ 0.,  0.]],

           [[ 0.,  0.],
            [ 7.,  2.]]])
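
The attempt above loses values because result[ label1_and_2 ] += data is a buffered operation: when an index occurs more than once, only the last write survives, which is why [4, 5] and [1, 3] are missing from the output. One possible fix, sketched here with the same inputs, is numpy.add.at, which accumulates unbuffered so that repeated indices add up. Creating result with the dtype of data also keeps the sums as integers. Whether this is faster than the ndimage approach is not claimed here and would need to be measured.

```python
import numpy

label1 = numpy.array([0, 0, 1, 1, 2, 2])
kinds_of_label1 = 3

label2 = numpy.array([0, 1, 0, 0, 1, 1])
kinds_of_label2 = 2

data = numpy.array([[1, 2], [3, 8], [4, 5], [2, 9], [1, 3], [7, 2]])

# Assumption: both labels are consecutive integers starting at 0.
label1_and_2 = label1 * kinds_of_label2 + label2

# Integer accumulator, one row per combined label.
result = numpy.zeros((kinds_of_label1 * kinds_of_label2, 2), dtype=data.dtype)

# Unbuffered in-place add: rows of data with the same label are summed.
numpy.add.at(result, label1_and_2, data)

# minlength keeps a slot for combined labels that never occur.
counting = numpy.bincount(label1_and_2, minlength=kinds_of_label1 * kinds_of_label2)

result = result.reshape(kinds_of_label1, kinds_of_label2, 2)
print(result)
print(counting)
```

With these inputs the printed result matches the expected array from the question, and counting gives the number of coordinates per combined label.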


 