
Numpy: How to calculate sums of array slices using indices?

I have a matrix M:

M = [[10, 1000],
 [11, 200],
 [15, 800],
 [20, 5000],
 [28, 100],
 [32, 3000],
 [35, 3500],
 [38, 100],
 [50, 5000],
 [51, 100],
 [55, 2000],
 [58, 3000],
 [66, 4000],
 [90, 5000]]

And a matrix R:

 [[10 20]
  [32 35]
  [50 66]
  [90 90]]

I want to use the values in column 0 of R as the start of each slice and the values in column 1 as the end.

For each such range, I want the sum of the right column of M over all rows whose column 0 value lies between the start and the end, both bounds included.

Basically doing

  M[0:4][:,1].sum() # rows 0..3; upper index + 1 because the upper bound is inclusive
  M[5:7][:,1].sum() # rows 5..6; upper index + 1 because the upper bound is inclusive

and so on. Here 0 is the index of 10 and 3 is the index of 20 in column 0 of M; 5 is the index of 32 and 6 is the index of 35.

I'm stuck on how to turn the start/end values from R into indices into column 0 of M, and then how to calculate the sum over each index range, including both bounds.

Expected output:

[[10, 20, 7000], # 7000 = 1000+200+800+5000
 [32, 35, 6500], # 6500 = 3000+3500
 [50, 66, 14100], # 14100 = 5000+100+2000+3000+4000
 [90, 90, 5000]] # 5000 = just 5000 as upper=lower boundary

Update: I can now get the indices using searchsorted. I just need to sum column 1 of M between each start and end index.

 start_indices = [0,5,8,13]
 end_indices = [3,6,12,13]

Is there a more efficient way than a for loop?
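For reference, here is a minimal sketch of the loop-based baseline the question describes. It assumes M and R are NumPy arrays, that column 0 of M is sorted, and that every value in R actually occurs in column 0 of M. The names loop_sums and loop_result are only illustrative.

```python
import numpy as np

M = np.array([[10, 1000], [11, 200], [15, 800], [20, 5000], [28, 100],
              [32, 3000], [35, 3500], [38, 100], [50, 5000], [51, 100],
              [55, 2000], [58, 3000], [66, 4000], [90, 5000]])
R = np.array([[10, 20], [32, 35], [50, 66], [90, 90]])

# Assumption: column 0 of M is sorted and contains every value found in R,
# so searchsorted returns the exact row index of each boundary value.
start_indices = np.searchsorted(M[:, 0], R[:, 0])
end_indices = np.searchsorted(M[:, 0], R[:, 1])

# Baseline loop: e + 1 because the upper bound is inclusive.
loop_sums = np.array([M[s:e + 1, 1].sum()
                      for s, e in zip(start_indices, end_indices)])

# Append the sums to R to get the layout of the expected output.
loop_result = np.column_stack((R, loop_sums))
print(loop_result)
```

The loop does one Python-level slice and sum per row of R, which is fine for a few ranges but is the part a vectorized approach would replace.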

EDIT: I found the answer in the question "Numpy sum of values in subarrays between pairs of indices".

Use searchsorted to determine the correct indices and add.reduceat to perform the summation:

>>> idx = M[:, 0].searchsorted(R) + (0, 1)
>>> idx = idx.ravel()[:-1] if idx[-1, 1] == M.shape[0] else idx.ravel()
>>> result = np.add.reduceat(M[:, 1], idx)[::2]
>>> result
array([ 7000,  6500, 14100,  5000])

Details:

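Since you want to include the upper boundaries but Python slices exclude them, we have to add 1 to each end index.

reduceat cannot handle len(arg0) as an index, so we have to special-case that: when the last end index equals M.shape[0] we drop it, which works because the final segment runs to the end of the array anyway.

reduceat computes sums over all stretches between consecutive boundaries, including the gaps between the ranges, so we have to discard every other one.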
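The three points above can be traced step by step on the data from the question. This is a sketch that assumes M and R are plain NumPy arrays (not lists or np.matrix); the intermediate names flat, all_sums and wanted are only illustrative.

```python
import numpy as np

M = np.array([[10, 1000], [11, 200], [15, 800], [20, 5000], [28, 100],
              [32, 3000], [35, 3500], [38, 100], [50, 5000], [51, 100],
              [55, 2000], [58, 3000], [66, 4000], [90, 5000]])
R = np.array([[10, 20], [32, 35], [50, 66], [90, 90]])

# Step 1: look up both columns of R at once; add 1 to the end column only,
# because the upper bound is inclusive.
idx = M[:, 0].searchsorted(R) + (0, 1)
# idx is [[0, 4], [5, 7], [8, 13], [13, 14]]

# Step 2: 14 == len(M) is not a valid reduceat index, so drop it.
# The last segment then simply runs to the end of the array.
flat = idx.ravel()[:-1] if idx[-1, 1] == M.shape[0] else idx.ravel()
# flat is [0, 4, 5, 7, 8, 13, 13]

# Step 3: reduceat sums every stretch between consecutive boundaries,
# so the odd positions hold the unwanted gaps between the ranges.
all_sums = np.add.reduceat(M[:, 1], flat)
# all_sums is [7000, 100, 6500, 100, 14100, 5000, 5000]
wanted = all_sums[::2]

print(np.column_stack((R, wanted)))
```

The final column_stack call appends the sums to R, which gives the three-column layout shown in the expected output of the question.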

It would help to show an example of the output you are expecting. If what you want from M[0:4][:,1].sum() is the sum 1000 + 200 + 800 + 5000, then this code might help:

import numpy as np

M = np.array([[10, 1000],  # np.array, since np.matrix is no longer recommended by NumPy
 [11, 200],
 [15, 800],
 [20, 5000],
 [28, 100],
 [32, 3000],
 [35, 3500],
 [38, 100],
 [50, 5000],
 [51, 100],
 [55, 2000],
 [58, 3000],
 [66, 4000],
 [90, 5000]])

# Rows 0..3 of column 1: 1000 + 200 + 800 + 5000
print(M[0:4][:,1].sum())  # prints 7000
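The single slice sum can be extended to every row of R without a Python loop. The sketch below uses cumulative (prefix) sums, a different technique from the ones shown so far, and assumes that column 0 of M is sorted and that R is a NumPy array of inclusive [start, end] values. The names prefix and range_sums are only illustrative.

```python
import numpy as np

M = np.array([[10, 1000], [11, 200], [15, 800], [20, 5000], [28, 100],
              [32, 3000], [35, 3500], [38, 100], [50, 5000], [51, 100],
              [55, 2000], [58, 3000], [66, 4000], [90, 5000]])
R = np.array([[10, 20], [32, 35], [50, 66], [90, 90]])

# First row of each range, and one past the last row (exclusive end).
# side="right" makes the end exclusive while keeping the bound value itself.
start = np.searchsorted(M[:, 0], R[:, 0], side="left")
stop = np.searchsorted(M[:, 0], R[:, 1], side="right")

# prefix[k] is the sum of the first k values of column 1; prefix[0] is 0.
prefix = np.concatenate(([0], np.cumsum(M[:, 1])))

# Sum over rows start..stop-1 is the difference of two prefix sums.
range_sums = prefix[stop] - prefix[start]

print(np.column_stack((R, range_sums)))
```

Because prefix has one more entry than M has rows, an end index equal to len(M) needs no special handling here.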
