Vectorizing for loops in Python with numpy multidimensional arrays
I'm trying to improve the performance of the code below. Eventually it will be using much bigger arrays, but I thought I would start off with something simple that works, then look at where it is slow, optimise it, and then try it out at the full size. Here is the original code:
# Minimum example with random variables
import numpy as np
import matplotlib.pyplot as plt
n=4
# Theoretical travel time to each station
ttable=np.array([1,2,3,4])
# Seismic traces, measured at each station
traces=np.random.random((n, 506))
dt=0.1
# Forward problem: add energy to each trace at the desired time from a given origin time
given_origin_time=1
for i in range(n):
    # Energy will arrive at the sample equivalent to origin time + travel time
    arrival_sample=int(round((given_origin_time+ttable[i])/dt))
    traces[i,arrival_sample]=2
# The aim is to find the origin time by trying each possible origin time and adding the energy up.
# Where this "stack" is highest is likely to be the origin time.
# Find the maximum travel time
tmax=ttable.max()
# We pad the traces so that shifting by a travel time never runs off the end of the trace
traces=np.lib.pad(traces,((0,0),(round(tmax/dt),round(tmax/dt))),'constant',constant_values=0)
# Available origin times to search for, relative to the beginning of the trace
origin_times=np.linspace(-tmax,len(traces),len(traces)+round(tmax/dt))
# Create an empty array to fill with our stack
S=np.empty((origin_times.shape[0]))
# Loop over all the potential origin times
for l,otime in enumerate(origin_times):
    # Create some variables which we will sum up over all stations
    sum_point=0
    sqrr_sum_point=0
    # Loop over each station
    for m in range(n):
        # Find the appropriate travel time
        ttime=ttable[m]
        # Grab the point on the trace that corresponds to this travel time + the origin time we are searching for
        point=traces[m,int(round((tmax+otime+ttime)/dt))]
        # Sum up the points
        sum_point+=point
        # Sum of the squares of the points
        sqrr_sum_point+=point**2
    # Create the stack by taking the square of the sums divided by the sum of the squares, normalised by the number of stations
    S[l]=sum_point#**2/(n*sqrr_sum_point)
# Plot the output; the peak should be at given_origin_time
plt.plot(origin_times,S)
plt.show()
I think the problem is that I don't understand the broadcasting and indexing of multidimensional arrays. After this I will extend the dimensions to search for x, y, z, which would be given by increasing the dimensions of ttable. I will probably try to implement either pytables or np.memmap to help with the large arrays.
With some quick profiling, it appears that the line

point=traces[m,int(round((tmax+otime+ttime)/dt))]

is taking ~40% of the total program's runtime. Let's see if we can vectorize it a bit:
ttime_inds = np.around((tmax + otime + ttable) / dt).astype(int)
# Loop over each station
for m in range(n):
    # Grab the point on the trace that corresponds to this travel time + the origin time we are searching for
    point=traces[m,ttime_inds[m]]
We noticed that the only thing changing in the loop (other than m) was ttime, so we pulled it out and vectorized that part using numpy functions.
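To make the indexing step concrete, here is a minimal, self-contained sketch (with made-up toy values, not the question's data) of how pairing a row-index array with a per-row column-index array picks one element per row, which is what replaces the explicit loop over stations:

```python
import numpy as np

# One row per "station", one column per time sample (toy values)
traces = np.array([[10, 11, 12],
                   [20, 21, 22],
                   [30, 31, 32]])
ttime_inds = np.array([2, 0, 1])  # a hypothetical sample index per station

# Advanced indexing pairs each row index with its column index,
# gathering traces[0, 2], traces[1, 0], traces[2, 1] in one shot
points = traces[np.arange(3), ttime_inds]
# points is [12, 20, 31]
```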
That was the biggest hotspot, but we can go a bit further and remove the inner loop entirely:
# Loop over all the potential origin times
for l,otime in enumerate(origin_times):
    ttime_inds = np.around((tmax + otime + ttable) / dt).astype(int)
    points = traces[np.arange(n),ttime_inds]
    sum_point = points.sum()
    sqrr_sum_point = (points**2).sum()
    # Create the stack by taking the square of the sums divided by the sum of the squares, normalised by the number of stations
    S[l]=sum_point#**2/(n*sqrr_sum_point)
EDIT: If you want to remove the outer loop as well, we need to pull otime out:
ttime_inds = np.around((tmax + origin_times[:,None] + ttable) / dt).astype(int)
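As a quick sanity check on that broadcast (the shapes here just mirror the question's n=4 stations and tmax=4; no seismic data involved): a column vector of origin times plus a 1-D travel-time table produces a 2-D grid, one row per origin time and one column per station.

```python
import numpy as np

origin_times = np.linspace(-4.0, 4.0, 44)   # shape (44,)
ttable = np.array([1, 2, 3, 4])             # shape (4,)

# (44, 1) broadcast against (4,) -> (44, 4)
grid = origin_times[:, None] + ttable
```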
Then, we proceed as before, summing over the second axis:
points = traces[np.arange(n),ttime_inds]
sum_points = points.sum(axis=1)
sqrr_sum_points = (points**2).sum(axis=1)
S = sum_points # **2/(n*sqrr_sum_points)
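Putting the pieces together, here is a self-contained sketch of the fully vectorized stack. It follows the question's setup, except that the traces are zeroed (rather than random) so the expected peak is exact: with spikes of amplitude 2 on 4 stations, the stack should reach 8 at an origin time near given_origin_time.

```python
import numpy as np

n = 4
dt = 0.1
ttable = np.array([1, 2, 3, 4])
traces = np.zeros((n, 506))   # zeroed instead of random, so the result is deterministic
given_origin_time = 1

# Forward problem: place a spike on each trace at origin time + travel time
for i in range(n):
    traces[i, int(round((given_origin_time + ttable[i]) / dt))] = 2

tmax = ttable.max()
pad = round(tmax / dt)
traces = np.pad(traces, ((0, 0), (pad, pad)), 'constant')
origin_times = np.linspace(-tmax, len(traces), len(traces) + pad)

# Both loops replaced: broadcasting builds a (len(origin_times), n) index
# grid, and advanced indexing gathers every needed sample at once
ttime_inds = np.around((tmax + origin_times[:, None] + ttable) / dt).astype(int)
points = traces[np.arange(n), ttime_inds]
S = points.sum(axis=1)

best = origin_times[np.argmax(S)]   # nearest grid point to given_origin_time
```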