

Performance drop in NumPy matrix-vector multiplication



import timeit
for i in range(90, 101):
    tm = timeit.repeat('np.matmul(a, b)', number = 10000,
        setup = 'import numpy as np; a, b = np.random.rand({0},{0}), np.random.rand({0})'.format(i))
    print(i, sum(tm) / 5)


90 0.08936462279998522
91 0.08872119059979014
92 0.09083068459967762
93 0.09311594780047017
94 0.09907015420012613
95 0.10136517100036144
96 0.10339414420013782
97 0.10627872140012187
98 0.1102267580001353
99 0.11277738099979615
100 0.11471197419996315


90 0.03618830284103751
91 0.03737151022069156
92 0.03295294055715203
93 0.02851409767754376
94 0.02677299762144685
95 0.028137388220056892
96 0.1916038074065
97 0.16719966367818415
98 0.18511182265356182
99 0.1806833743583411
100 0.17172936061397195


90 0.04183819475583732
91 0.029678784403949977
92 0.02486871089786291
93 0.02882006801664829
94 0.028613184532150625
95 0.02956576123833656
96 31.16711748293601
97 27.803299666382372
98 31.368976181373
99 27.71114011341706
100 26.219610543036833

The Python / NumPy versions are the same on every machine I tested (3.7.2 / 1.16.2). The operating system is the same as well (Arch Linux).

What could be the cause? Why does this happen at 96?

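At 96 your test runs into some software/hardware issue: 96 * 96 * 96 = 884,736, which is close to 1M, and multiplied by 8-byte floats that gives 7,077,888 bytes. Intel i5 processors have a 6 MB L3 cache. My iMac has this type of processor and shows the slowdown at size 96. The Intel® Core™ i5-7200U processor has a 3 MB L3 cache and does not show the problem. So it may be that the software algorithm does not make correct use of a 6 MB cache.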
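The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 6 MB figure is the cache size quoted in the answer, taken here as 6 * 2**20 bytes, and the element size assumes 8-byte float64 values (what `np.random.rand` returns). The second helper, `operand_footprint`, is not part of the original answer; it is added for comparison and counts only the arrays that a single `np.matmul(a, b)` call touches.

```python
# Sizes in bytes, assuming 8-byte float64 elements.
FLOAT_BYTES = 8
MIB = 2 ** 20


def cubed_footprint(n):
    """Figure used in the answer: n * n * n elements of 8 bytes each."""
    return n ** 3 * FLOAT_BYTES


def operand_footprint(n):
    """Assumption for comparison: n x n matrix, n-vector input, n-vector result."""
    return (n * n + 2 * n) * FLOAT_BYTES


n = 96
print("answer's figure  :", cubed_footprint(n), "bytes")
print("matmul operands  :", operand_footprint(n), "bytes")
print("exceeds 6 MiB L3 :", cubed_footprint(n) > 6 * MIB)
```

Only the first figure appears in the answer; whether it reflects the memory that matters for a matrix-vector product is left open here.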


  1. The problem is fixed in Python 3.8.0a2 (the current pre-release test version).
  2. The problem exists in Python 3.7.2 (the latest release) on both Windows and macOS.

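I wrote a longer program to test my Windows and macOS computers. It looks like under Python 3.7, NumPy starts running the matmul function on all four logical processors of my computer. I do not see this under 3.8.0a2: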

$ python3.8 numpy_matmul.py       $ python3.7 numpy_matmul.py     

Python version  : 3.8.0a2         Python version  : 3.7.2         
  build:('v3.8.0a2:23f4589b4b',    build:('v3.7.2:9a3ffc0492',
        Feb 25 2019 10:59:08')          'Dec 24 2018 02:44:43')
  compiler:                        compiler:
     Clang 6.0 (clang-600.0.57)   Clang 6.0 (clang-600.0.57) 

Tested by Python code only :      Tested by Python code only :  
 90 time = 0.1132 cpu = 0.1100     90 time = 0.1535 cpu = 0.1236
 91 time = 0.1133 cpu = 0.1130     91 time = 0.1264 cpu = 0.1263
 92 time = 0.1079 cpu = 0.1077     92 time = 0.1089 cpu = 0.1087
 93 time = 0.1146 cpu = 0.1145     93 time = 0.1226 cpu = 0.1224
 94 time = 0.1176 cpu = 0.1174     94 time = 0.1273 cpu = 0.1271
 95 time = 0.1216 cpu = 0.1215     95 time = 0.1372 cpu = 0.1371
 96 time = 0.1115 cpu = 0.1114     96 time = 0.2854 cpu = 0.8933
 97 time = 0.1231 cpu = 0.1229     97 time = 0.2887 cpu = 0.9033
 98 time = 0.1174 cpu = 0.1173     98 time = 0.2836 cpu = 0.8963
 99 time = 0.1330 cpu = 0.1301     99 time = 0.3100 cpu = 0.9108
100 time = 0.1130 cpu = 0.1128    100 time = 0.3149 cpu = 0.9087

Tested with timeit.repeat :       Tested with timeit.repeat :   
 90 time = 0.1060 cpu = 0.1066     90 time = 0.1238 cpu = 0.3264
 91 time = 0.1091 cpu = 0.1097     91 time = 0.1233 cpu = 0.1240
 92 time = 0.1021 cpu = 0.1027     92 time = 0.1138 cpu = 0.1128
 93 time = 0.1149 cpu = 0.1156     93 time = 0.1324 cpu = 0.1327
 94 time = 0.1135 cpu = 0.1139     94 time = 0.1319 cpu = 0.1326
 95 time = 0.1170 cpu = 0.1177     95 time = 0.1325 cpu = 0.1331
 96 time = 0.1069 cpu = 0.1076     96 time = 0.2879 cpu = 0.8886
 97 time = 0.1192 cpu = 0.1198     97 time = 0.2867 cpu = 0.8986
 98 time = 0.1151 cpu = 0.1155     98 time = 0.3034 cpu = 0.8854
 99 time = 0.1200 cpu = 0.1207     99 time = 0.2867 cpu = 0.8966
100 time = 0.1146 cpu = 0.1153    100 time = 0.2901 cpu = 0.9018
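One way to read the table: when the reported CPU time clearly exceeds the elapsed time, the work was probably spread over more than one core. The sketch below recomputes that ratio for a few rows copied from the "Python code only" runs above. The helper name `cpu_per_wall` is made up for illustration, and the CPU figures are used exactly as the script printed them.

```python
def cpu_per_wall(wall_seconds, cpu_seconds):
    """Ratio of CPU time to elapsed time; about 1 means a single busy core."""
    return cpu_seconds / wall_seconds


# (python version, size, time, cpu) copied from the output above
rows = [
    ("3.8.0a2", 95, 0.1216, 0.1215),
    ("3.8.0a2", 96, 0.1115, 0.1114),
    ("3.7.2", 95, 0.1372, 0.1371),
    ("3.7.2", 96, 0.2854, 0.8933),
]

for version, size, wall, cpu in rows:
    print("%-8s %3d cpu/time = %.2f" % (version, size, cpu_per_wall(wall, cpu)))
```

For the 3.7.2 run the ratio jumps from about 1 at size 95 to a little over 3 at size 96, while it stays at about 1 for 3.8.0a2.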


import time
import timeit
import numpy as np
import platform

def correct_cpu(cpu_time):
    pv1, pv2, _ = platform.python_version_tuple()
    pcv = platform.python_compiler()
    if pv1 == '3' and '5' <= pv2 <= '8' and pcv =='Clang 6.0 (clang-600.0.57)':
        cpu_time /= 2.0
    return cpu_time

def test(func, n, name):
    print('\nTested %s :' % name)
    for i in range(90, 101):
        t = time.perf_counter()
        c = time.process_time()
        tm = func(i, n)
        t = time.perf_counter() - t
        c = correct_cpu(time.process_time() - c)
        st = t if tm <= 0.0 else tm
        print('%3d time = %.4f cpu = %.4f' % (i, st, c))
        if abs(t-st)/st > 0.02:
            print('    time!= %.4f' % t)

def test1(i, n):
    a, b = np.random.rand(i, i), np.random.rand(i)
    for _ in range(n):
        np.matmul(a, b)
    return 0.0

def test2(i, n):
    s = 'import numpy as np;' + \
        'a, b = np.random.rand({0},{0}), np.random.rand({0})'
    s = s.format(i)
    r = 'np.matmul(a, b)'
    t = timeit.repeat(stmt=r, setup=s, number=n)
    return sum(t)

def test3(i, n):
    s = 'import numpy as np;' + \
        'a, b = np.random.rand({0},{0}), np.random.rand({0})'
    s = s.format(i)
    r = 'np.matmul(a, b)'
    return timeit.timeit(stmt=r, setup=s, number=n)

print('Python version  :', platform.python_version())
print('       build    :', platform.python_build())
print('       compiler :', platform.python_compiler())
num = 10000
test(test1, 5 * num, 'by Python code only')
test(test2, num, 'with timeit.repeat')
test(test3, 5 * num, 'with timeit.timeit')
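A possible follow-up experiment, which the original post did not run, is to repeat the program with the numerical library limited to one thread and see whether the jump at 96 remains. The sketch below builds such an environment and launches the script in a child process. The variable names are the ones commonly read by OpenMP, OpenBLAS, MKL and Apple's Accelerate; which of them (if any) applies depends on how NumPy was built, so treat the list as an assumption. `single_thread_env` and `run_single_threaded` are hypothetical helper names, and `numpy_matmul.py` is the file name shown in the output above.

```python
import os
import subprocess
import sys

# Thread-count variables read by common BLAS/OpenMP runtimes (assumed list;
# the one that matters depends on the BLAS library NumPy is linked against).
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def single_thread_env(base=None):
    """Return a copy of the environment with every thread variable set to '1'."""
    env = dict(os.environ if base is None else base)
    for name in THREAD_VARS:
        env[name] = "1"
    return env


def run_single_threaded(script="numpy_matmul.py"):
    """Run the benchmark script in a child process with threading limited."""
    return subprocess.run([sys.executable, script], env=single_thread_env())


# Example (not executed here): run_single_threaded("numpy_matmul.py")
```

The variables are set for a child process because these libraries typically read them once, when they are first loaded, so changing them after `import numpy` may have no effect.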


