Fastest way to iterate 2 arrays and perform operations
I have a function which takes as input 2 arrays of zeros and ones, ~8000 elements per array. My function eps calculates a statistic on these arrays and returns the output. The operations are simple: just checking for 0 and noting the index where 0 is found in the array. I tried my best to optimize for speed, but the best I could get is 4.5~5 seconds (for 18k array pairs) using the timeit library. Time is important as I need to run this function on billions of array pairs.
# e.g. inputs
# ts_1 = [0,1,1,0,0,1,1,0,......]
# ts_2 = [1,1,1,1,1,1,1,0,......]
# tau = any integer or float
import math
from itertools import product

import numpy as np

def eps(ts_1, ts_2, tau):
    n1 = 0
    n2 = 0
    Q_tau = 0
    q_tau = 0
    # indices of the events (zeros) in each series, and how many there are
    event_index1 = [index for index, item in enumerate(ts_1) if item == 0]
    n1 = ts_1.count(0)
    event_index2 = [index for index, item in enumerate(ts_2) if item == 0]
    n2 = ts_2.count(0)
    # tried numpy based on @Ram comment below, no improvement
    event_index1, = np.where(np.array(ts_1) == 0)
    n1 = event_index1.shape[0]
    event_index2, = np.where(np.array(ts_2) == 0)
    n2 = event_index2.shape[0]
    # tried numpy based on @Ram comment below, no improvement
    if (n1 == 0 or n2 == 0):
        Q_tau = 0
    else:
        c_ij = 0
        # events at the same index count 0.5
        matching_idx = set(event_index1).intersection(event_index2)
        c_ij = c_ij + (0.5 * len(matching_idx))
        for x, y in product(event_index1, event_index2):
            if x - y > 0 and (x - y) <= tau:
                c_ij = c_ij + 1
        c_ji = 0
        matching_idx_2 = set(event_index2).intersection(event_index1)
        c_ji = c_ji + (0.5 * len(matching_idx_2))
        for x, y in product(event_index2, event_index1):
            if x - y > 0 and (x - y) <= tau:
                c_ji = c_ji + 1
        Q_tau = (c_ij + c_ji) / math.sqrt(n1 * n2)
        q_tau = (c_ij - c_ji) / math.sqrt(n1 * n2)
    return Q_tau, q_tau
Based on my earlier comments, and considering that swapping the two lists in a product gives you the same tuples with their elements reversed, you could reduce your code to:
import math
from itertools import product

def eps(ts_1, ts_2, tau):
    Q_tau = 0
    q_tau = 0
    event_index1 = [index for index, item in enumerate(ts_1) if item == 0]
    n1 = len(event_index1)
    event_index2 = [index for index, item in enumerate(ts_2) if item == 0]
    n2 = len(event_index2)
    if (n1 != 0 and n2 != 0):
        # events at the same index count 0.5 in both directions
        matching_idx = set(event_index1).intersection(event_index2)
        c_ij = c_ji = 0.5 * len(matching_idx)
        # a single pass over the product covers both directions
        for x, y in product(event_index1, event_index2):
            if x - y > 0 and (x - y) <= tau:
                c_ij += 1
            elif y - x > 0 and (y - x) <= tau:
                c_ji += 1
        Q_tau = (c_ij + c_ji) / math.sqrt(n1 * n2)
        q_tau = (c_ij - c_ji) / math.sqrt(n1 * n2)
    return Q_tau, q_tau
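
If the loop over the product is still the bottleneck, one possible further step is to avoid enumerating the pairs at all. This is only a sketch and I have not benchmarked it on your data. The event indices come out in sorted order, so for each x in one list the number of y in the other list with 0 < x - y <= tau can be found with two binary searches using np.searchsorted. The names eps_sorted and _count_lagged are made up for this example, and it assumes the inputs are plain sequences of 0/1 values as in the question.

```python
import math

import numpy as np


def _count_lagged(a, b, tau):
    """Count pairs (x in a, y in b) with 0 < x - y <= tau; a and b are sorted."""
    hi = np.searchsorted(b, a, side="left")        # number of y with y < x
    lo = np.searchsorted(b, a - tau, side="left")  # number of y with y < x - tau
    # hi - lo is the number of y with x - tau <= y < x; clip guards a negative tau
    return int(np.clip(hi - lo, 0, None).sum())


def eps_sorted(ts_1, ts_2, tau):
    """Hypothetical variant of eps that counts with binary search instead of product."""
    idx1 = np.flatnonzero(np.asarray(ts_1) == 0)  # sorted event indices
    idx2 = np.flatnonzero(np.asarray(ts_2) == 0)
    n1, n2 = idx1.size, idx2.size
    if n1 == 0 or n2 == 0:
        return 0, 0
    # events at the same index count 0.5 in both directions
    shared = 0.5 * np.intersect1d(idx1, idx2, assume_unique=True).size
    c_ij = shared + _count_lagged(idx1, idx2, tau)
    c_ji = shared + _count_lagged(idx2, idx1, tau)
    norm = math.sqrt(n1 * n2)
    return (c_ij + c_ji) / norm, (c_ij - c_ji) / norm


print(eps_sorted([0, 1, 1, 0, 0, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 0], 3))
# -> (1.0, -0.5)
```

The work per pair of arrays drops from roughly n1 * n2 comparisons to about (n1 + n2) * log(n) binary-search steps. Check it against the loop version on a few of your own arrays before relying on it.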