C++ function optimization

Question

I have the following function. It is called many times, which makes my program run slowly. Is there any way to optimize it, for example with SIMD instructions or other techniques? The getRay() function retrieves a 3-vector (a ray) for a given 2-vector (pixel) query from a pre-computed look-up table. The code is compiled with Visual Studio 2013, and the target configuration is x64.
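The question does not show how getRay() stores its table. The sketch below assumes one common layout and is only illustrative: one precomputed ray per pixel, stored row-major as (x, y, z) triples in a single flat buffer. `RayTable`, `setRay` and the `std::array` output type are hypothetical stand-ins for the Eigen-based originals. The (row, col) argument order is assumed to match the call `cam1.getRay(p1(1, 0), p1(0, 0), ray1)`. With this layout, a query costs one index computation and three loads. Consecutive `u` in a patch row also read adjacent memory.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical ray look-up table (layout assumed, not taken from the question):
// one precomputed ray per pixel, row-major, (x, y, z) triples in one buffer.
class RayTable {
public:
    RayTable(int rows, int cols)
        : rows_(rows), cols_(cols),
          rays_(static_cast<std::size_t>(rows) * cols * 3) {}

    // Filled once, e.g. at start-up from the camera model.
    void setRay(int row, int col, double x, double y, double z) {
        double* r = &rays_[index(row, col)];
        r[0] = x;
        r[1] = y;
        r[2] = z;
    }

    // Per-query work: one index computation and three loads, no math.
    void getRay(int row, int col, std::array<double, 3>& ray) const {
        const double* r = &rays_[index(row, col)];
        ray = {r[0], r[1], r[2]};
    }

    int rows() const { return rows_; }
    int cols() const { return cols_; }

private:
    std::size_t index(int row, int col) const {
        return (static_cast<std::size_t>(row) * cols_ + col) * 3;
    }

    int rows_;
    int cols_;
    std::vector<double> rays_;
};
```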

By the way, the for-loop that calls this function many times is already parallelized with OpenMP.

Thank you very much.

bool warpPlanarHomography(
    const Eigen::Matrix3d& H_camera2_camera1
    , const cv::Mat& image1
    , const cv::Mat& image2
    , FisheyeCameraUnified& cam1
    , FisheyeCameraUnified& cam2
    , const Eigen::Vector2i& patchCenter
    , const int patchSize
    , Eigen::Matrix<unsigned char, 7, 7>& patch1)
{
    const int patchSize_2 = patchSize / 2; // half patch width (3 for a 7x7 patch)
    for (int v = 0; v < patchSize; ++v) // row
    {
        for (int u = 0; u < patchSize; ++u) // column
        {
            Eigen::Vector2i p1 = Eigen::Vector2i(u - patchSize_2, v - patchSize_2) + patchCenter;

            if (p1(0, 0) < 0 || p1(1, 0) < 0 || p1(0, 0) >= image1.cols || p1(1, 0) >= image1.rows) return false;

            Eigen::Vector3d ray1;
            cam1.getRay(p1(1, 0), p1(0, 0), ray1);
            Eigen::Vector2d p2;
            if (!cam2.project(H_camera2_camera1 * ray1, p2))
            {
                return false;
            }
            if (p2.x() < 0.0 || p2.x() >= image2.cols - 1 ||
                p2.y() < 0.0 || p2.y() >= image2.rows - 1)
            {
                return false;
            }
            getInterpolatedPixel(image2, p2, &patch1(v, u));
        }
    }
    return true;
}
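In equation form, as read from the code above, each patch pixel is produced as follows. Here c is patchCenter, h is the half patch width (patchSize_2), H is H_camera2_camera1, π₂ stands for cam2.project, and I₂ is bilinear sampling of image2 by getInterpolatedPixel. The function returns false as soon as p₁ leaves image1 or p₂ leaves the interior of image2. So each pixel costs one table look-up, a 3x3 matrix-vector product, one projection, and one bilinear sample.

```latex
\mathbf{p}_1 = \mathbf{c} + \begin{pmatrix} u - h \\ v - h \end{pmatrix},\qquad
\mathbf{r}_1 = \mathrm{getRay}(\mathbf{p}_1),\qquad
\mathbf{p}_2 = \pi_2\!\left(H\,\mathbf{r}_1\right),\qquad
\mathrm{patch1}(v,u) = I_2(\mathbf{p}_2)
```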

The project() function looks like this:

bool FisheyeCameraUnified::project(const Eigen::Vector3d& ray, Eigen::Vector2d& pt)
{
    double fx, fy, cx, cy, xi;
    fx = m_K(0, 0);
    fy = m_K(1, 1);
    cx = m_K(0, 2);
    cy = m_K(1, 2);
    xi = m_xi;

    double d = ray.norm();
    double rz = 1.0 / (ray(2) + xi * d);

    // Project the scene point to the normalized plane.
    Eigen::Vector2d m_d(ray(0) * rz, ray(1) * rz);

    // Apply the projection matrix.
    pt(0) = fx * m_d(0) + cx;
    pt(1) = fy * m_d(1) + cy;
    return true;
}
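Written out, project() implements the following mapping for a ray r = (r_x, r_y, r_z). The values f_x, f_y, c_x, c_y come from m_K and ξ comes from m_xi, as in the unified model that the class name FisheyeCameraUnified suggests. Each call therefore costs one square root (the norm) and one division. Note that, as written, project() returns true unconditionally and does not guard against r_z + ξd ≤ 0. As a result, the `if (!cam2.project(...))` branch in warpPlanarHomography never returns false.

```latex
d = \lVert \mathbf{r} \rVert,\qquad
s = \frac{1}{r_z + \xi\, d},\qquad
\mathbf{p} = \begin{pmatrix} f_x\, r_x\, s + c_x \\ f_y\, r_y\, s + c_y \end{pmatrix}
```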

The getInterpolatedPixel() function is as follows:

void getInterpolatedPixel(const cv::Mat& image, const Eigen::Vector2d& coords, unsigned char* pixel)
{
    int ix = static_cast<int>(coords.x());
    int iy = static_cast<int>(coords.y());
    double dx = coords.x() - ix;
    double dy = coords.y() - iy;
    double dxdy = dx * dy;

    const double w00 = 1.0 - dx - dy + dxdy;
    const double w01 = dx - dxdy;
    const double w10 = dy - dxdy;
    const double w11 = dxdy;

    const unsigned char* p00 = image.data + iy * image.step.p[0] + ix * image.channels();
    const unsigned char* p01 = p00 + image.channels();
    const unsigned char* p10 = p00 + image.step.p[0];
    const unsigned char* p11 = p10 + image.channels();

    for (int i = 0; i < image.channels(); ++i)
    {
        double value = w11 * p11[i] + w10 * p10[i] + w01 * p01[i] + w00 * p00[i];
        pixel[i] = cv::saturate_cast<unsigned char>(value);
    }
}
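The four weights in getInterpolatedPixel() are the expanded form of the standard bilinear weights. Here d_x and d_y are the fractional parts of the coordinates, p_01 is the right neighbour of p_00, and p_10 is the pixel below it. The weights sum to 1:

```latex
\begin{aligned}
w_{00} &= (1-d_x)(1-d_y) = 1 - d_x - d_y + d_x d_y, &\quad
w_{01} &= d_x (1-d_y) = d_x - d_x d_y,\\
w_{10} &= (1-d_x)\, d_y = d_y - d_x d_y, &\quad
w_{11} &= d_x d_y,\\
\text{value} &= w_{00}\,p_{00} + w_{01}\,p_{01} + w_{10}\,p_{10} + w_{11}\,p_{11}.
\end{aligned}
```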

Answer 1

Measure where the bottleneck is and optimize that place first.
Can you use float instead of double?
What are m_K(0, 0), m_K(1, 1), ...? Can you replace them with constants?
Unroll the for (int i = 0; i < image.channels(); ++i) loop if the image can only have a specific number of channels (1, 3 and 4 are typical).
Call image.channels() only once and reuse the stored value.
Try adding the inline specifier to small functions.
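A minimal sketch combining several of the suggestions above, under stated assumptions. It uses float weights, a channel count fixed at compile time so the compiler can unroll the per-channel loop, a channel count read once and dispatched, and inline helpers. To stay self-contained it works on a raw buffer instead of cv::Mat: `data` and `step` play the roles of image.data and image.step.p[0]. `interpolatePixelN` and `interpolatePixel` are hypothetical names. Rounding at exact .5 ties may differ slightly from cv::saturate_cast. Whether any of this is actually faster has to be measured, as the first point says.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Bilinear sample of an interleaved 8-bit image with a compile-time channel
// count. `data` points to pixel (0,0); `step` is the row stride in bytes.
// Caller guarantees 0 <= x < cols - 1 and 0 <= y < rows - 1.
template <int Channels>
inline void interpolatePixelN(const std::uint8_t* data, std::size_t step,
                              float x, float y, std::uint8_t* out)
{
    const int ix = static_cast<int>(x);
    const int iy = static_cast<int>(y);
    const float dx = x - ix;
    const float dy = y - iy;

    // Same weights as the original, in float instead of double.
    const float w11 = dx * dy;
    const float w01 = dx - w11;
    const float w10 = dy - w11;
    const float w00 = 1.0f - dx - dy + w11;

    const std::uint8_t* p00 = data + iy * step + ix * Channels;
    const std::uint8_t* p01 = p00 + Channels;
    const std::uint8_t* p10 = p00 + step;
    const std::uint8_t* p11 = p10 + Channels;

    // The trip count is a compile-time constant, so the loop can be fully unrolled.
    for (int c = 0; c < Channels; ++c) {
        const float v = w00 * p00[c] + w01 * p01[c] + w10 * p10[c] + w11 * p11[c];
        // Round to nearest and clamp to [0, 255].
        out[c] = static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
    }
}

// Reads the channel count once and dispatches to a fixed-size specialization.
// Returns false for channel counts without a specialization.
inline bool interpolatePixel(const std::uint8_t* data, std::size_t step, int channels,
                             float x, float y, std::uint8_t* out)
{
    switch (channels) {
    case 1: interpolatePixelN<1>(data, step, x, y, out); return true;
    case 3: interpolatePixelN<3>(data, step, x, y, out); return true;
    case 4: interpolatePixelN<4>(data, step, x, y, out); return true;
    default: return false;
    }
}
```

In the patch loop, the switch could be hoisted out so that the whole 7x7 patch is sampled by a single specialization.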

Answer 2

This should be considered in addition to the other, more broadly focused answers.

Since getInterpolatedPixel is used in a tight loop, I focused on reducing function calls there:

void getInterpolatedPixel(const cv::Mat& image, const Eigen::Vector2d& coords, unsigned char* pixel)
{
    //save two function calls here
    double dx = coords.x();
    double dy = coords.y();
    int ix = static_cast<int>(dx);
    int iy = static_cast<int>(dy);
    dx -= ix;
    dy -= iy;
    //make this const
    const double dxdy = dx * dy;

    const double w00 = 1.0 - dx - dy + dxdy;
    const double w01 = dx - dxdy;
    const double w10 = dy - dxdy;
    const double w11 = dxdy;

    //cache image.channels()
    const int channels = image.channels();

    const unsigned char* p00 = image.data + iy * image.step.p[0] + ix * channels;
    const unsigned char* p01 = p00 + channels;
    const unsigned char* p10 = p00 + image.step.p[0];
    const unsigned char* p11 = p10 + channels;

    for (int i = 0; i < channels; ++i)
    {
        double value = w11 * p11[i] + w10 * p10[i] + w01 * p01[i] + w00 * p00[i];
        pixel[i] = cv::saturate_cast<unsigned char>(value);
    }
}

2 answers

Answer 1: score 3, 2016-08-19 10:11:17
Answer 2: score 3, 2016-08-19 10:46:18
