Apple Metal Matrix Multiplication Benchmark Results Inconsistent

I'm trying the Apple Metal matrix multiplication sample here: https://developer.apple.com/library/ios/samplecode/MetalPartialSumsCompute/Introduction/Intro.html

I get strange results. For tests [1]-[7], Metal runs at around 0.05 to 0.1 GFlops. From test [8] onward, Metal suddenly becomes very fast, reporting anywhere from about 60 to 1000 GFlops. The log is attached below. I looked at the code and nothing differs between tests; they all use random matrices of similar sizes. Metal seems to speed up at some point for no reason. Any ideas what's going on?

Log:

2016-06-30 16:13:29.609 MetalMatrixMultiplication-iOS[3459:742844] >> [1] Matrix Dimensions: A = [841 x 2012], B = [2012 x 554], C = [841 x 554], lda = 848, ldb = 560, ldc = 560
>> [1] Accelerate 6.934929 gflops/sec, Metal 0.044756 gflops/sec, Accelerate 27.034708 millisecs, Metal 4189.027417 millisecs, Diff 1.369554e-01

2016-06-30 16:13:31.747 MetalMatrixMultiplication-iOS[3459:742844] >> [2] Matrix Dimensions: A = [721 x 432], B = [432 x 1436], C = [721 x 1436], lda = 728, ldb = 1440, ldc = 1440
>> [2] Accelerate 1.405928 gflops/sec, Metal 0.045415 gflops/sec, Accelerate 63.626833 millisecs, Metal 1969.722500 millisecs, Diff 4.248900e-02

2016-06-30 16:13:34.820 MetalMatrixMultiplication-iOS[3459:742844] >> [3] Matrix Dimensions: A = [1362 x 457], B = [457 x 1078], C = [1362 x 1078], lda = 1368, ldb = 1080, ldc = 1080
>> [3] Accelerate 1.754547 gflops/sec, Metal 0.046793 gflops/sec, Accelerate 76.485125 millisecs, Metal 2867.863083 millisecs, Diff 3.673622e-02

2016-06-30 16:13:45.549 MetalMatrixMultiplication-iOS[3459:742844] >> [4] Matrix Dimensions: A = [1783 x 1901], B = [1901 x 1347], C = [1783 x 1347], lda = 1784, ldb = 1352, ldc = 1352
>> [4] Accelerate 6.528442 gflops/sec, Metal 0.091166 gflops/sec, Accelerate 139.869000 millisecs, Metal 10016.091333 millisecs, Diff 5.854867e-02

2016-06-30 16:13:48.912 MetalMatrixMultiplication-iOS[3459:742844] >> [5] Matrix Dimensions: A = [709 x 600], B = [600 x 1683], C = [709 x 1683], lda = 712, ldb = 1688, ldc = 1688
>> [5] Accelerate 2.629253 gflops/sec, Metal 0.045250 gflops/sec, Accelerate 54.460208 millisecs, Metal 3164.426333 millisecs, Diff 4.654048e-02

2016-06-30 16:13:57.534 MetalMatrixMultiplication-iOS[3459:742844] >> [6] Matrix Dimensions: A = [636 x 1573], B = [1573 x 1942], C = [636 x 1942], lda = 640, ldb = 1944, ldc = 1944
>> [6] Accelerate 7.106906 gflops/sec, Metal 0.047387 gflops/sec, Accelerate 54.674458 millisecs, Metal 8199.887292 millisecs, Diff 7.446345e-02

2016-06-30 16:14:10.669 MetalMatrixMultiplication-iOS[3459:742844] >> [7] Matrix Dimensions: A = [1803 x 1689], B = [1689 x 1950], C = [1803 x 1950], lda = 1808, ldb = 1952, ldc = 1952
>> [7] Accelerate 6.759199 gflops/sec, Metal 0.096267 gflops/sec, Accelerate 175.709292 millisecs, Metal 12337.145375 millisecs, Diff 4.568898e-02

2016-06-30 16:14:10.878 MetalMatrixMultiplication-iOS[3459:742844] >> [8] Matrix Dimensions: A = [416 x 749], B = [749 x 2034], C = [416 x 2034], lda = 416, ldb = 2040, ldc = 2040
>> [8] Accelerate 3.589321 gflops/sec, Metal 220.343105 gflops/sec, Accelerate 35.313750 millisecs, Metal 0.575250 millisecs, Diff 0.000000e+00

2016-06-30 16:14:11.003 MetalMatrixMultiplication-iOS[3459:742844] >> [9] Matrix Dimensions: A = [657 x 716], B = [716 x 734], C = [657 x 734], lda = 664, ldb = 736, ldc = 736
>> [9] Accelerate 2.946337 gflops/sec, Metal 102.394388 gflops/sec, Accelerate 23.438083 millisecs, Metal 0.674417 millisecs, Diff 0.000000e+00

2016-06-30 16:14:11.124 MetalMatrixMultiplication-iOS[3459:742844] >> [10] Matrix Dimensions: A = [446 x 945], B = [945 x 707], C = [446 x 707], lda = 448, ldb = 712, ldc = 712
>> [10] Accelerate 3.426099 gflops/sec, Metal 94.259957 gflops/sec, Accelerate 17.394667 millisecs, Metal 0.632250 millisecs, Diff 0.000000e+00

2016-06-30 16:14:11.533 MetalMatrixMultiplication-iOS[3459:742844] >> [11] Matrix Dimensions: A = [935 x 1286], B = [1286 x 1899], C = [935 x 1899], lda = 936, ldb = 1904, ldc = 1904
>> [11] Accelerate 6.185983 gflops/sec, Metal 441.997324 gflops/sec, Accelerate 73.824208 millisecs, Metal 1.033208 millisecs, Diff 0.000000e+00

2016-06-30 16:14:11.685 MetalMatrixMultiplication-iOS[3459:742844] >> [12] Matrix Dimensions: A = [541 x 956], B = [956 x 960], C = [541 x 960], lda = 544, ldb = 960, ldc = 960
>> [12] Accelerate 3.805037 gflops/sec, Metal 153.253113 gflops/sec, Accelerate 26.097417 millisecs, Metal 0.647958 millisecs, Diff 0.000000e+00

2016-06-30 16:14:12.007 MetalMatrixMultiplication-iOS[3459:742844] >> [13] Matrix Dimensions: A = [1278 x 1809], B = [1809 x 500], C = [1278 x 500], lda = 1280, ldb = 504, ldc = 504
>> [13] Accelerate 7.661287 gflops/sec, Metal 343.033372 gflops/sec, Accelerate 30.176417 millisecs, Metal 0.673958 millisecs, Diff 0.000000e+00

2016-06-30 16:14:12.456 MetalMatrixMultiplication-iOS[3459:742844] >> [14] Matrix Dimensions: A = [1933 x 1534], B = [1534 x 805], C = [1933 x 805], lda = 1936, ldb = 808, ldc = 808
>> [14] Accelerate 7.221810 gflops/sec, Metal 696.681127 gflops/sec, Accelerate 66.105417 millisecs, Metal 0.685250 millisecs, Diff 0.000000e+00

2016-06-30 16:14:12.552 MetalMatrixMultiplication-iOS[3459:742844] >> [15] Matrix Dimensions: A = [291 x 645], B = [645 x 1034], C = [291 x 1034], lda = 296, ldb = 1040, ldc = 1040
>> [15] Accelerate 2.155479 gflops/sec, Metal 62.162540 gflops/sec, Accelerate 18.007750 millisecs, Metal 0.624417 millisecs, Diff 0.000000e+00

2016-06-30 16:14:12.940 MetalMatrixMultiplication-iOS[3459:742844] >> [16] Matrix Dimensions: A = [1656 x 1547], B = [1547 x 781], C = [1656 x 781], lda = 1656, ldb = 784, ldc = 784
>> [16] Accelerate 7.341706 gflops/sec, Metal 424.495925 gflops/sec, Accelerate 54.504792 millisecs, Metal 0.942667 millisecs, Diff 0.000000e+00

2016-06-30 16:14:13.425 MetalMatrixMultiplication-iOS[3459:742844] >> [17] Matrix Dimensions: A = [1651 x 1320], B = [1320 x 1429], C = [1651 x 1429], lda = 1656, ldb = 1432, ldc = 1432
>> [17] Accelerate 6.615108 gflops/sec, Metal 1001.902932 gflops/sec, Accelerate 94.155625 millisecs, Metal 0.621667 millisecs, Diff 0.000000e+00

2016-06-30 16:14:13.757 MetalMatrixMultiplication-iOS[3459:742844] >> [18] Matrix Dimensions: A = [2037 x 384], B = [384 x 1615], C = [2037 x 1615], lda = 2040, ldb = 1616, ldc = 1616
>> [18] Accelerate 1.737157 gflops/sec, Metal 331.366545 gflops/sec, Accelerate 145.440583 millisecs, Metal 0.762458 millisecs, Diff 0.000000e+00

2016-06-30 16:14:13.923 MetalMatrixMultiplication-iOS[3459:742844] >> [19] Matrix Dimensions: A = [795 x 677], B = [677 x 1145], C = [795 x 1145], lda = 800, ldb = 1152, ldc = 1152
>> [19] Accelerate 3.405232 gflops/sec, Metal 192.017503 gflops/sec, Accelerate 36.194667 millisecs, Metal 0.641875 millisecs, Diff 0.000000e+00

2016-06-30 16:14:14.033 MetalMatrixMultiplication-iOS[3459:742844] >> [20] Matrix Dimensions: A = [1062 x 438], B = [438 x 678], C = [1062 x 678], lda = 1064, ldb = 680, ldc = 680
>> [20] Accelerate 2.090133 gflops/sec, Metal 98.388385 gflops/sec, Accelerate 30.177583 millisecs, Metal 0.641083 millisecs, Diff 0.000000e+00

What's happening is that the operation is failing, but the demo code doesn't check the command buffer's status, so the failed runs appear to finish much faster.

If you add this block

// Once the command buffer has finished, check whether it actually succeeded.
if (m_CmdBuffer.status == MTLCommandBufferStatusError) {
    NSLog(@"Error occurred when executing command buffer");
    NSLog(@"Error code: %@", m_CmdBuffer.error);
}

at the end of the MetalMatrixMult finish method (MetalMatrixMult.mm, line 513), you will see when the error happens.

It first fails with:

Error code: Error Domain=MTLCommandBufferErrorDomain Code=2 "Caused GPU Timeout Error (IOAF code 2)" UserInfo={NSLocalizedDescription=Caused GPU Timeout Error (IOAF code 2)}

then, after a couple of those it reports:

Error code: Error Domain=MTLCommandBufferErrorDomain Code=4 "Ignored (for causing prior/excessive GPU errors) (IOAF code 4)" UserInfo={NSLocalizedDescription=Ignored (for causing prior/excessive GPU errors) (IOAF code 4)}
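The log in the question already hints at this without any Metal code. In tests [8]-[20] the Metal time stays between roughly 0.58 and 1.03 ms, while the amount of arithmetic varies by a factor of about 16, which is what you would expect if the command buffers return without doing the multiply. Below is a minimal C++ sketch of that sanity check, using a few rows copied from the log. The names `Run`, `timingIgnoresWork`, `earlyRuns` and `lateRuns`, and the threshold of 4, are my own illustrative assumptions and not part of the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One benchmark run, copied from the log in the question.
struct Run {
    int test;        // test index as printed in the log
    double m, k, n;  // C[m x n] = A[m x k] * B[k x n]
    double metalMs;  // Metal time reported for the run, in milliseconds
};

// Floating-point operations in a dense matrix multiply: one multiply and
// one add for each of the m * n * k inner-product terms.
double flopCount(const Run& r) { return 2.0 * r.m * r.k * r.n; }

// Ratio of the largest to the smallest operation count across the runs.
double workSpread(const std::vector<Run>& runs) {
    assert(!runs.empty());
    double lo = flopCount(runs.front()), hi = lo;
    for (const Run& r : runs) {
        lo = std::min(lo, flopCount(r));
        hi = std::max(hi, flopCount(r));
    }
    return hi / lo;
}

// Ratio of the slowest to the fastest reported Metal time across the runs.
double timeSpread(const std::vector<Run>& runs) {
    assert(!runs.empty());
    double lo = runs.front().metalMs, hi = lo;
    for (const Run& r : runs) {
        lo = std::min(lo, r.metalMs);
        hi = std::max(hi, r.metalMs);
    }
    return hi / lo;
}

// Hypothetical heuristic: flag a set of runs when the work varies far more
// than the time does. The default factor of 4 is an arbitrary assumption.
bool timingIgnoresWork(const std::vector<Run>& runs, double factor = 4.0) {
    return workSpread(runs) > factor * timeSpread(runs);
}

// Tests [2] and [7]: the smallest and largest multiplies among tests [1]-[7].
std::vector<Run> earlyRuns() {
    return {{2, 721, 432, 1436, 1969.722500},
            {7, 1803, 1689, 1950, 12337.145375}};
}

// A selection from tests [8]-[20], including the smallest and largest multiplies.
std::vector<Run> lateRuns() {
    return {{8, 416, 749, 2034, 0.575250},
            {11, 935, 1286, 1899, 1.033208},
            {15, 291, 645, 1034, 0.624417},
            {17, 1651, 1320, 1429, 0.621667}};
}
```

With these rows, `timingIgnoresWork(lateRuns())` is true and `timingIgnoresWork(earlyRuns())` is false: in the early tests the time still grows with the matrix sizes, in the later ones it does not. This only flags suspicious timings; checking `status` and `error` on the command buffer is still what confirms the failure.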

Another thing I've noticed with Metal on iOS 9 is that there seems to be a memory management bug when GPU Frame Capture and Metal API Validation are turned on (Edit Scheme -> Options tab). It's as if Metal buffers are not being deallocated when running in this mode.
