Is there any option to improve the time efficiency of this data normalization any further?

I have a matrix named tArray with shape (11, 512) and want to normalize the values in it. I see that np.max() takes a lot of time, but I couldn't find any way to improve it further. Can the time efficiency of the following line of code be improved?

tArray = np.array([[val/tArray[i][sqLen-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:sqLen-1]) for val in tArray[i]] for i in range(len(tArray))])

To reproduce:

import numpy as np
tArray = np.random.randint(1, 100, size=(11, 512))
tArray = np.array([[val/tArray[i][512-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:512-1]) for val in tArray[i]] for i in range(len(tArray))])

Here is an approach with a ~180x speedup:

Note that, for the shape of your input array, [512-1] is the same as [-1] (the last column) and [:512-1] is the same as [:-1].
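A quick way to check this equivalence yourself; a small sketch on a random array of the question's shape (11, 512):

```python
import numpy as np

# Random integer matrix with the shape from the question.
tArray = np.random.randint(1, 100, size=(11, 512))

# For rows of length 512, index 512-1 is the last element, i.e. index -1.
last_explicit = tArray[:, 512 - 1]
last_negative = tArray[:, -1]

# Likewise, [:512-1] drops only the last column, exactly like [:-1].
head_explicit = tArray[:, :512 - 1]
head_negative = tArray[:, :-1]

print(np.array_equal(last_explicit, last_negative))  # True
print(np.array_equal(head_explicit, head_negative))  # True
```

Using the negative forms also keeps the code correct if the row length ever changes.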

The condition in your loop, if i not in [1,2] else, means the calculations split into exactly 3 slices: [0] (the first row), [1:3] (rows 1 and 2) and the remaining rows [3:].
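To make the three slices concrete, here is a sketch listing each slice and the denominator it uses. Rows [0] and [3:] follow the same rule (divide by the last element); they are only separate slices because rows 1 and 2 sit between them:

```python
import numpy as np

tArray = np.random.randint(1, 100, size=(11, 512))

# Slice 1: row 0, divided by its own last element.
first = tArray[0]                              # shape (512,)
first_denom = tArray[0, -1]                    # scalar

# Slice 2: rows 1 and 2, each divided by the max of all but its last element.
middle = tArray[1:3]                           # shape (2, 512)
middle_denom = tArray[1:3, :-1].max(axis=1)    # shape (2,), one max per row

# Slice 3: rows 3..10, each divided by its own last element.
rest = tArray[3:]                              # shape (8, 512)
rest_denom = tArray[3:, -1]                    # shape (8,)

print(first.shape, middle.shape, rest.shape)   # (512,) (2, 512) (8, 512)
```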

So instead of iterating over each row and recalculating each column, we can apply the needed operations to the 3 sequential slices at once in a vectorized manner and then concatenate the results with the np.vstack routine:

np.vstack((tArray[0]/tArray[0,-1], tArray[1:3]/tArray[1:3,:-1].max(1)[:,None], tArray[3:]/tArray[3:,-1][:,None]))
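The same one-liner wrapped in a reusable function, as a sketch (the name normalize_vstack is hypothetical). np.vstack promotes the 1-D first row to shape (1, 512), so the three pieces stack back into an (11, 512) array:

```python
import numpy as np

def normalize_vstack(tArray):
    """Hypothetical wrapper around the vectorized one-liner above."""
    return np.vstack((
        tArray[0] / tArray[0, -1],                       # row 0: divide by last element
        tArray[1:3] / tArray[1:3, :-1].max(1)[:, None],  # rows 1-2: divide by row max (excluding last)
        tArray[3:] / tArray[3:, -1][:, None],            # rows 3+: divide by last element
    ))

tArray = np.random.randint(1, 100, size=(11, 512))
result = normalize_vstack(tArray)
print(result.shape)  # (11, 512)
```

The [:, None] turns each 1-D vector of per-row denominators into a column so it broadcasts across the 512 columns.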

Let's look at the measurements:

tArray = np.random.randint(1, 100, size=(11, 512)) # input array

In [165]: %timeit tArray1 = np.array([[val/tArray[i][512-1] for val in tArray[i]] if i not in [1,2] else [val/np.max(tArray[i][:512-1]) for val in tArray[i]] for i in range(len(tArray))])
4.54 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [171]: %timeit new_arr = np.vstack((tArray[0]/tArray[0,-1], tArray[1:3]/tArray[1:3,:-1].max(1)[:,None], tArray[3:]/tArray[3:,-1][:,None]))
25.5 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
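If you are not in IPython, the standard timeit module gives a comparable measurement. A minimal sketch (the function names are hypothetical; absolute timings and the resulting speedup will vary by machine and NumPy version):

```python
import timeit
import numpy as np

tArray = np.random.randint(1, 100, size=(11, 512))

def loop_version():
    # The original row-by-row list comprehension from the question.
    return np.array([[val / tArray[i][-1] for val in tArray[i]] if i not in [1, 2]
                     else [val / np.max(tArray[i][:-1]) for val in tArray[i]]
                     for i in range(len(tArray))])

def vectorized_version():
    # The np.vstack one-liner.
    return np.vstack((tArray[0] / tArray[0, -1],
                      tArray[1:3] / tArray[1:3, :-1].max(1)[:, None],
                      tArray[3:] / tArray[3:, -1][:, None]))

n = 100
t_loop = timeit.timeit(loop_version, number=n) / n  # seconds per call
t_vec = timeit.timeit(vectorized_version, number=n) / n
print(f"loop: {t_loop * 1e6:.1f} us, vectorized: {t_vec * 1e6:.1f} us, "
      f"speedup: {t_loop / t_vec:.0f}x")
```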

Of course, tArray1 and new_arr have the same content:

In [173]: tArray1
array([[ 8.11111111,  2.33333333,  9.33333333, ...,  0.44444444,
         5.22222222,  1.        ],
       [ 0.76767677,  0.77777778,  0.72727273, ...,  0.58585859,
         0.29292929,  0.09090909],
       [ 0.36363636,  0.85858586,  0.35353535, ...,  0.06060606,
         0.48484848,  0.55555556],
       ...,
       [ 1.875     ,  2.04166667,  0.29166667, ...,  0.20833333,
         0.58333333,  1.        ],
       [ 0.28735632,  0.11494253,  0.37931034, ...,  0.50574713,
         0.74712644,  1.        ],
       [ 5.625     , 10.5       ,  0.5       , ...,  2.125     ,
         0.75      ,  1.        ]])

In [174]: new_arr
array([[ 8.11111111,  2.33333333,  9.33333333, ...,  0.44444444,
         5.22222222,  1.        ],
       [ 0.76767677,  0.77777778,  0.72727273, ...,  0.58585859,
         0.29292929,  0.09090909],
       [ 0.36363636,  0.85858586,  0.35353535, ...,  0.06060606,
         0.48484848,  0.55555556],
       ...,
       [ 1.875     ,  2.04166667,  0.29166667, ...,  0.20833333,
         0.58333333,  1.        ],
       [ 0.28735632,  0.11494253,  0.37931034, ...,  0.50574713,
         0.74712644,  1.        ],
       [ 5.625     , 10.5       ,  0.5       , ...,  2.125     ,
         0.75      ,  1.        ]])
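The printouts above are abbreviated by NumPy, so comparing them by eye only covers a few corner values. A sketch of a full element-wise check with np.allclose, which also tolerates tiny floating-point differences:

```python
import numpy as np

tArray = np.random.randint(1, 100, size=(11, 512))

# Original loop-based normalization from the question.
tArray1 = np.array([[val / tArray[i][-1] for val in tArray[i]] if i not in [1, 2]
                    else [val / np.max(tArray[i][:-1]) for val in tArray[i]]
                    for i in range(len(tArray))])

# Vectorized np.vstack version.
new_arr = np.vstack((tArray[0] / tArray[0, -1],
                     tArray[1:3] / tArray[1:3, :-1].max(1)[:, None],
                     tArray[3:] / tArray[3:, -1][:, None]))

# Compare every element, not just the visible corners of the printout.
print(np.allclose(tArray1, new_arr))  # True
```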

Create an array of denominators, replacing the ones in the selected rows with the max. Then divide the whole matrix by this array of denominators (you need to transpose the matrix to do this, then transpose it back again).
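The transposition is needed because NumPy broadcasting aligns trailing axes: a (11, 512) array cannot be divided directly by an (11,) vector. A sketch showing the failure, the transpose trick, and the equivalent explicit-new-axis form:

```python
import numpy as np

t = np.random.randint(1, 100, size=(11, 512))
denoms = t[:, -1]  # one denominator per row, shape (11,)

# Direct division fails: broadcasting tries to match 512 (last axis of t) with 11.
try:
    t / denoms
except ValueError as err:
    print("shape mismatch:", err)

# Transposing puts the row axis last: (512, 11) / (11,) divides column j of t.T
# (i.e. row j of t) by denoms[j]; the second .T restores shape (11, 512).
via_transpose = (t.T / denoms).T

# Equivalent without transposing: give denoms a trailing axis, (11, 512) / (11, 1).
via_new_axis = t / denoms[:, None]

print(np.allclose(via_transpose, via_new_axis))  # True
```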

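import numpy as np

t = np.random.randint(1, 100, size=(11, 512))
ignore = [1, 2]  # rows divided by their max instead of their last element
denoms = t[..., -1].copy()  # default denominator: last element of each row
denoms[ignore] = t[ignore, :-1].max(axis=1)  # selected rows: max of all but the last column
result = (t.T / denoms).T  # transpose so per-row denominators broadcast, then transpose back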

This seems to be slightly faster than the vstack solution, and it also lets you choose which rows to select a bit more cleanly.
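To illustrate the row selection, here is a sketch that turns the snippet into a function with the selected rows as a parameter (the names normalize_rows and max_rows are hypothetical, not from the answer):

```python
import numpy as np

def normalize_rows(t, max_rows=(1, 2)):
    """Hypothetical helper: divide each row by its last element, except the rows
    in max_rows, which are divided by the max of all but their last element."""
    max_rows = list(max_rows)
    denoms = t[:, -1].copy()                          # default: last element of each row
    denoms[max_rows] = t[max_rows, :-1].max(axis=1)   # override the selected rows
    return (t.T / denoms).T

t = np.random.randint(1, 100, size=(11, 512))
result = normalize_rows(t)                   # same rows as in the question
other = normalize_rows(t, max_rows=[0, 5])   # a different selection, no code changes
```

Changing which rows use the max only means changing a list, whereas the vstack version would need its slices rewritten.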

How about this for a 350x speedup (560x on floats)?

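import numpy as np

def f(a):
    d = a[:, -1].copy()           # denominators: last element of each row
    d[1:3] = a[1:3, :-1].max(1)   # rows 1 and 2: max of all but the last column
    return a / d[:, None]         # broadcast one denominator per row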
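The .copy() in f matters: a[:, -1] is a view into a, so writing the maxima into it without copying would overwrite the input's last column. A sketch demonstrating this with np.shares_memory:

```python
import numpy as np

def f(a):
    d = a[:, -1].copy()              # copy, otherwise d is a view into a
    d[1:3] = a[1:3, :-1].max(1)      # rows 1-2 use the max of all but the last column
    return a / d[:, None]            # (11, 512) / (11, 1) broadcasts per row

a = np.random.uniform(1, 100, size=(11, 512))
original_last_col = a[:, -1].copy()
out = f(a)

# A basic slice like a[:, -1] shares memory with a; .copy() does not.
print(np.shares_memory(a[:, -1], a))                # True
print(np.shares_memory(a[:, -1].copy(), a))         # False
print(np.array_equal(a[:, -1], original_last_col))  # True: f left a untouched
```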

On float arrays, it's twice as fast as @Roman's answer. I would argue that it is also a bit easier to read.

a = np.random.uniform(1, 100, size=(11, 512))

%timeit np.vstack((a[0]/a[0,-1], a[1:3,:]/a[1:3,:-1].max(), a[3:,:]/a[3:,-1][:,None]))
24.4 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%timeit f(a)
11.8 µs ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

On int arrays, the difference is less drastic (about 60% faster).
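Whatever the input dtype, true division (/) of integer arrays produces float64, so f returns the same values for int and float input. A sketch checking this; the idea that the int-to-float conversion inside the division accounts for part of the smaller gap is an assumption, not something measured here:

```python
import numpy as np

def f(a):
    d = a[:, -1].copy()
    d[1:3] = a[1:3, :-1].max(1)
    return a / d[:, None]

a_int = np.random.randint(1, 100, size=(11, 512))
a_float = a_int.astype(np.float64)

# int / int with "/" yields float64. (Assumption: the implicit conversion this
# requires may explain part of the smaller speedup on int input.)
print(f(a_int).dtype, f(a_float).dtype)   # float64 float64
print(np.allclose(f(a_int), f(a_float)))  # True
```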

