
apply_along_axis with 2 arguments varying for each 1d slice

I'm trying to optimize code that currently uses nested for loops and calls to scipy functions.

Basically, I have a first function that calls scipy's find_peaks(), and then I want to interpolate those data points (the peaks) to find a function that describes them. For example, I first find the peaks. The data is basically a 2D array with 25*30 rows (axis 0) of 1000 elements each (axis 1), reshaped from a 25×30×1000 array:

import numpy as np
from scipy.signal import find_peaks

arr = np.random.rand(25, 30, 1000)
arr = arr.reshape((arr.shape[0] * arr.shape[1], arr.shape[2]))
# we have a 25*30 set of 1000 pts each; find peaks for each row
peaks = np.apply_along_axis(find_peaks, 1, arr, height=0)

find_peaks returns an (indices, properties) pair for each row, so the result can be split up like:

peak_indices = peaks[:, 0]
peak_values = [p["peak_heights"] for p in peaks[:, 1]]  # one array per row

So far so good. That's essentially the (x,y) coordinates of the points I want to interpolate.

Now, I want to interpolate those index-height pairs to obtain some function, using scipy.interpolate.interp1d(...). interp1d's signature is of the form:

interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)

Where x would be my peak_indices, and y my peak_values.
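For a single slice this is straightforward. A minimal sketch (illustrative names, not from my real code):

from scipy.signal import find_peaks
from scipy.interpolate import interp1d
import numpy as np

row = np.random.rand(1000)
indices, props = find_peaks(row, height=0)    # x: the peak positions
f = interp1d(indices, props["peak_heights"])  # y: the peak heights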

The question:

How can I pass this function two arguments that vary with each slice? In other words, my first use of apply_along_axis only needed a single slice-dependent argument (the 1000 points for each of my 25*30 rows along axis 0). Here, however, I need to pass the function TWO slice-dependent arguments: the peak_indices and the peak_values. Can any pythonista think of a clever way to unpack those arguments AFTER I pass them to apply_along_axis as tuples or something? Kind of:

arr = *[peak_indices, peak_values]  # pseudocode, not valid syntax

I cannot really edit the interp1d function itself, which would be my solution if I were calling my own function...
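The obvious fallback is a plain loop that unpacks both per-row results (a sketch; the peak counts differ per row, so the two arguments cannot be packed into one rectangular array):

import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import interp1d

arr = np.random.rand(25, 30, 1000).reshape(750, 1000)

# one interpolator per row; both arguments vary with the slice
interpolators = [
    interp1d(indices, props["peak_heights"])
    for indices, props in (find_peaks(row, height=0) for row in arr)
]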

EDIT: part of the benefit of using apply_along_axis is that I should get performance improvements compared to nested for loops, since numpy should be able to bulk-process those calculations. Ideally any solution should use a notation that still allows those optimisations.

Where do you get the idea that apply_along_axis is a performance tool? Does it actually work faster in this case?

arr = np.random.rand(25, 30, 1000)
arr = arr.reshape((arr.shape[0] * arr.shape[1], arr.shape[2]))
# we have a 25*30 set of 1000 pts each; find peaks for each row
peaks = np.apply_along_axis(find_peaks, 1, arr, height=0)

compared to:

peaks = np.array([find_peaks(x, height=0) for x in arr], dtype=object)

That is a simple iteration over the 25*30 set of 1d arrays.

apply_along_axis does a test calculation to determine the return shape and dtype. It constructs a result array, and then iterates on all axes except the chosen one, calling the function with each 1d slice. There's no compiling, and no "bulk processing" (whatever that is). It just hides a loop in a function call.
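Schematically it boils down to something like this (a sketch, not numpy's actual implementation):

import numpy as np
from scipy.signal import find_peaks

arr_3d = np.random.rand(25, 30, 1000)

# allocate a result, then loop over every axis except the last
out = np.empty(arr_3d.shape[:2], dtype=object)
for i, j in np.ndindex(*arr_3d.shape[:2]):
    out[i, j] = find_peaks(arr_3d[i, j], height=0)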

It does make iteration over 2 axes of a 3d array prettier, but not faster:

You could have used it on the original 3d array, arr_3d, to get a (25,30,2) result:

peaks = np.apply_along_axis(find_peaks, 2, arr_3d, height=0)

I'm guessing find_peaks returns a 2-element tuple of values, and peaks will then be an object dtype array.
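If that guess is right, the pieces can be pulled out of that object array per cell, continuing from the snippet above (a sketch):

peak_indices = peaks[:, :, 0]                  # (25, 30): index array per cell
peak_props = peaks[:, :, 1]                    # (25, 30): properties dict per cell
heights_00 = peak_props[0, 0]["peak_heights"]  # heights for one cell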

Since apply_along_axis does not have any performance advantages, I don't see the point of trying to use it with a more complex array. It's handy when you have a 3d array and a function that takes a 1d input, but beyond that...?
