
Vector-Matrix-Multiplication is very slow in the OpenCV C++ interface

Using the "random stop" method, I have determined that the first two lines of the following snippet are very slow:

cv::Mat pixelSubMue = pixel - vecMatMue[kk_real];   // ca. 35.5 %
cv::Mat pixelTemp = pixelSubMue * covInvRef;        // ca. 58.1 %
cv::multiply(pixelSubMue, pixelTemp, pixelTemp);    // ca. 0 %
cv::Scalar sumScalar = cv::sum(pixelTemp);          // ca. 3.2 %

double cost = sumScalar.val[0] * 0.5 + vecLogTerm[kk_real]; // ca. 3.2 %
  • vecMatMue is a std::vector<cv::Mat>, so vecMatMue[kk_real] is a cv::Mat. I know there is a lot of copying involved, but using pointers makes little difference to performance here
  • pixelSubMue is a cv::Mat(1, 3, CV_64FC1) row vector
  • covInvRef is a reference to a cv::Mat(3, 3, CV_64FC1) matrix
  • vecLogTerm is a std::vector<double>, so vecLogTerm[kk_real] is a double
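
Written as a formula, the four statements above appear to evaluate a quadratic form plus a constant. The symbols are my own notation, not the asker's: x is the 1x3 row vector pixel, \mu_k is vecMatMue[kk_real], \Sigma_k^{-1} is covInvRef (assumed from its name to be an inverse covariance matrix) and c_k is vecLogTerm[kk_real].

```latex
\mathrm{cost}_k
  = \tfrac{1}{2}\,(x - \mu_k)\,\Sigma_k^{-1}\,(x - \mu_k)^{\mathsf{T}} + c_k
```

The element-wise cv::multiply followed by cv::sum is the dot product of (x - \mu_k)\Sigma_k^{-1} with (x - \mu_k), which is the same as the final multiplication by the transposed vector.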

The code snippet above sits in an inner loop that is executed millions of times.

Question : Is there a way to improve the speed of that operation?

Edit : Thanks for the comments! I have now measured the time within the program; the percentages indicate the share of the total time spent on each line. The measurements were done in Release mode. I took six measurements, and in each one the code was executed millions of times.

I should probably also mention that the std::vector objects have no effect on performance; I checked this by replacing them with constant objects.

Edit 2 : I have also implemented the algorithm using the C API. The relevant lines now look like this:

cvSub(pixel, vecPMatMue[kk], pixelSubMue);                   // ca. 24.4 %
cvMatMulAdd(pixelSubMue, vecPMatFCovInv[kk], 0, pixelTemp);  // ca. 39.0 %
cvMul(pixelSubMue, pixelTemp, pixelSubMue);                  // ca. 22.0 %
CvScalar sumScalar = cvSum(pixelSubMue);                     // ca. 14.6 %
cost = sumScalar.val[0] * 0.5 + vecFLogTerm[kk];             // ca. 0.0 %

For the same input data, the C++ implementation needs ca. 3100 ms, while the C implementation needs only ca. 2050 ms (both measurements are the total time for executing the snippet millions of times). I still prefer my C++ implementation, though, since I find it easier to read (other "ugly" changes were needed to make the code work with the C API).

Edit 3 : I have rewritten the code without using any function calls for the actual calculations:

capacity_t mue0 = meanRef.at<double>(0, 0);
capacity_t mue1 = meanRef.at<double>(0, 1);
capacity_t mue2 = meanRef.at<double>(0, 2);

capacity_t sigma00 = covInvRef.at<double>(0, 0);
capacity_t sigma01 = covInvRef.at<double>(0, 1);
capacity_t sigma02 = covInvRef.at<double>(0, 2);
capacity_t sigma11 = covInvRef.at<double>(1, 1);
capacity_t sigma12 = covInvRef.at<double>(1, 2);
capacity_t sigma22 = covInvRef.at<double>(2, 2);

mue0 = p0 - mue0; mue1 = p1 - mue1; mue2 = p2 - mue2;

capacity_t pt0 = mue0 * sigma00 + mue1 * sigma01 + mue2 * sigma02;
capacity_t pt1 = mue0 * sigma01 + mue1 * sigma11 + mue2 * sigma12;
capacity_t pt2 = mue0 * sigma02 + mue1 * sigma12 + mue2 * sigma22;

mue0 *= pt0; mue1 *= pt1; mue2 *= pt2;

capacity_t cost = (mue0 + mue1 + mue2) / 2.0 + vecLogTerm[kk_real];

The calculations for all pixels now take only about 150 ms!
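
For anyone who wants to check the unrolled arithmetic without OpenCV, here is a minimal, self-contained sketch of the same idea. The names SymInvCov3, gaussianCost and gaussianCostReference are hypothetical helpers that are not part of the question, plain double stands in for capacity_t, and the inverse covariance is assumed to be symmetric (as the code above assumes by reading only six of its nine entries).

```cpp
#include <array>

// Hypothetical helper type: the six unique entries of a symmetric 3x3
// inverse covariance matrix, read once outside the per-pixel loop.
struct SymInvCov3 {
    double s00, s01, s02, s11, s12, s22;
};

// Computes 0.5 * d * S * d^T + logTerm with d = p - mu, fully unrolled,
// so no temporary matrices are created per pixel.
inline double gaussianCost(const std::array<double, 3>& p,
                           const std::array<double, 3>& mu,
                           const SymInvCov3& s,
                           double logTerm) {
    const double d0 = p[0] - mu[0];
    const double d1 = p[1] - mu[1];
    const double d2 = p[2] - mu[2];

    // Row vector times symmetric matrix: t = d * S.
    const double t0 = d0 * s.s00 + d1 * s.s01 + d2 * s.s02;
    const double t1 = d0 * s.s01 + d1 * s.s11 + d2 * s.s12;
    const double t2 = d0 * s.s02 + d1 * s.s12 + d2 * s.s22;

    // Dot product of t with d, halved, plus the precomputed log term.
    return 0.5 * (d0 * t0 + d1 * t1 + d2 * t2) + logTerm;
}

// Reference version with a full 3x3 matrix and loops, for cross-checking.
inline double gaussianCostReference(const std::array<double, 3>& p,
                                    const std::array<double, 3>& mu,
                                    const double (&m)[3][3],
                                    double logTerm) {
    double sum = 0.0;
    for (int j = 0; j < 3; ++j) {
        double t = 0.0;
        for (int i = 0; i < 3; ++i) {
            t += (p[i] - mu[i]) * m[i][j];
        }
        sum += (p[j] - mu[j]) * t;
    }
    return 0.5 * sum + logTerm;
}
```

Comparing the two functions on a few inputs is a cheap way to catch an index mix-up in the unrolled version. The unrolled form only matches the general one when the matrix really is symmetric.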

It looks like you're compiling in Debug mode, which would probably explain the performance hit. You can profile your code using timing functions such as clock() .

E.g.:

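#include <ctime>     // clock(), clock_t, CLOCKS_PER_SEC
#include <iostream>  // std::cout

clock_t start, end;
...
start = clock();
cv::Mat pixelTemp = pixelSubMue * covInvRef;    // Very SLOW!
end = clock();

// CLOCKS_PER_SEC is the standard constant; CLK_TCK is an obsolete, non-portable alias.
std::cout << "Elapsed time in seconds: " << static_cast<double>(end - start) / CLOCKS_PER_SEC << std::endl;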
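
A single matrix product is usually too short for clock() to resolve, so it may be more useful to time the whole loop. Below is a minimal sketch of that using std::chrono::steady_clock (C++11), which measures wall-clock time rather than the processor time that clock() reports. The helper timeLoopMs is a hypothetical name, not something from the question or from OpenCV.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` `iterations` times and returns the
// total elapsed wall-clock time in milliseconds.
template <typename Work>
double timeLoopMs(Work&& work, long iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Something like timeLoopMs([&] { sink = sink + computeCost(); }, 1000000) would time a million evaluations. Here computeCost stands in for the code under test, and writing the result to a volatile variable such as sink keeps an optimizing Release build from discarding the work.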
