Improving a Matlab function in a large simulation

Question

I have a very big Matlab simulation project on my hands that I want to optimize, since I run it many times to tune parameters and the like.

Using Matlab's profile I identified one function that is eating up most of my time, specifically the line output(i,1)= max(mean(dens(i+1:a,1)),dens(i+1,1));

This function is called a LOT, where input is a 10x1 double passed as an argument, and output is also a 10x1 vector.

function output = my_function(input)

a = size(input,1);
output = input*0;
dens = density(input);

% for each i, output(i) is the maximum of dens(i+1) and mean(dens(i+1:end))
for i = 1:a-1
    output(i,1)= max(mean(dens(i+1:a,1)),dens(i+1,1));
end
output(a,1) = dens(a,1);

end

My ideas:

I think vectorization might help get rid of the loop (?), but I'm not at all familiar with the technique.
Is there a faster/alternative way to calculate the mean (maybe without Matlab's built-in function call)?

EDIT: I tried to vectorize the function and got the following alternative, which performs the same operations:

function output = my_function_vectorized(input)

a = size(input,1);
rho_ref = zeros(size(input));
dens = density(input);

temp_cumsum = flip(cumsum(flip(dens))./(1:1:a)');
output = [max(temp_cumsum(2:end),dens(2:a));dens(a)];

end

I tested both functions in the following way:

Ts = random('unif',40,80,10,1000);
Results_original = zeros(size(Ts));
Results_vectorized = zeros(size(Ts));
TIMES_original = zeros(size(Ts,2),1);
TIMES_vectorized = zeros(size(Ts,2),1);

for ii = 1:size(Ts,2)
    tic;
    Results_original(:,ii) = my_function(Ts(:,ii));
    TIMES_original(ii) = toc;
end

for ii = 1:size(Ts,2)
    tic;
    Results_vectorized(:,ii) = my_function_vectorized(Ts(:,ii));
    TIMES_vectorized(ii) = toc;
end

res = norm(Results_original - Results_vectorized);
mTIMES_original = mean(TIMES_original);
mTIMES_vectorized = mean(TIMES_vectorized);

For which I get:

res =

   3.1815e-12

mTIMES_original/mTIMES_vectorized =

   3.0279

Should this residual concern me?
Is it correct to say that I have sped up this computation by a factor of 3?

Answer 1

Vectorize it.

The re-read of dens is what is killing you, not the mean. Mean is as optimized as Donald Knuth can make it.

I don't know your density function, so I can't be sure about my indexing.

Pseudocode snips:

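% (1) faster preallocation that shows intent
output = zeros(size(input));

% (2) vectorize your "mean between here and the end" (dens assumed to be a column):
%     b(i) = mean(dens(i+1:a)) for i < a, padded with dens(a) in the last row
b = [flipud(cumsum(flipud(dens(2:a))) ./ (1:a-1)'); dens(a)];

% (3) assemble your interior n-by-2 matrix
c = [b, [dens(2:a); dens(a)]];

% (4) vectorized max along each row, I think
output = max(c, [], 2);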

(1) It is hard to beat the built-ins for speed and efficiency. It is also nice to be able to figure out a year from now what your code does. Over time I find myself trying to be more and more of a literate programmer, because in the long run it costs less time than coming back in a year or ten and trying to reverse engineer my own work.

(2) The idea here is to flip the density vector around, then make a cumulative sum, then divide each element of the reversed cumulative sum by how many points fed into it, then flip it around again. When you divide that sum by the count, it becomes a mean. I just read the description of cumsum, and there is an internal switch, so you can restate this without the flips and make it even faster.

b = [cumsum(dens(2:a),'reverse') ./ (a-1:-1:1)'; dens(a)]; % this might work; last row is padding
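
A quick way to check the indexing in (2) is to compare both variants against the plain loop on a small vector. The sketch below is only an illustration: the values in dens are made up (the real ones come from density(input), which is not shown), and the 'reverse' option is assumed to be available in your release.

```matlab
% Hypothetical stand-in data; the real dens comes from density(input).
dens = [3; 1; 4; 1; 5; 9; 2; 6; 5; 3];
a = numel(dens);

% Reference: tail means computed with the loop from the question.
ref = zeros(a, 1);
for i = 1:a-1
    ref(i) = mean(dens(i+1:a));
end
ref(a) = dens(a);                 % last row is padding, as in the question

% Tail means via flip + cumsum (no 'reverse' option needed).
tail = dens(2:a);
b_flip = [flipud(cumsum(flipud(tail)) ./ (1:a-1)'); dens(a)];

% Tail means via the 'reverse' direction of cumsum.
b_rev = [cumsum(tail, 'reverse') ./ (a-1:-1:1)'; dens(a)];

% Both should agree with the loop up to floating-point rounding.
err_flip = max(abs(b_flip - ref));
err_rev  = max(abs(b_rev - ref));
fprintf('max abs diff: flip %.3g, reverse %.3g\n', err_flip, err_rev);
```

If either difference is not close to zero, the slice (dens(2:a) here) or the divisor vector is probably off by one.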

(3) In theory, when this is done you should have a matrix that is two columns wide and has as many rows as dens does. Resizing and preallocating can be expensive, so if you are changing sizes often you might want to preallocate it as in (1).

(4) The max function is going to be screaming fast too. Neither you nor Mr. Knuth is going to make it faster. I think that one compare (silicon op) for each element of the array and a few shuffles (less than one per element) are all that is required.

This is an element-wise max over each row. (The two columns need padding to the same length, which I forgot at first.) It is already fast and its output is an array. It may need a 1 instead of a 2 as the dimension argument, but you know what you are doing there and can figure that out.
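
To make the "1 instead of a 2" question concrete, here is a small sketch with made-up numbers. The names b and d are hypothetical stand-ins for the tail means and the shifted density values.

```matlab
% Hypothetical example values: tail means (b) and shifted densities (d).
b = [2.5; 4.0; 1.5];
d = [3.0; 1.0; 1.5];

m_rows = max([b, d], [], 2);   % max along each row: one value per row (n-by-1)
m_cols = max([b, d], [], 1);   % max along each column: a 1-by-2 row, not wanted here
m_pair = max(b, d);            % element-wise max of two same-sized vectors

disp(isequal(m_rows, m_pair)); % the two-argument form avoids building the matrix
```

With column vectors, dimension 2 is the one that compares b(i) against d(i). The two-argument form max(b, d) gives the same result without assembling the n-by-2 matrix first.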

Let me know if that works for you. I'm guessing it might give no more than a 5x improvement.

I was stunned to find that LabVIEW can do some fundamentals 100x faster than MatLab because it is (always) compiled. When compiling in MatLab one must impose many new constraints on types and values, but in LV the compiling is mostly pain-free because all of that constraining was part of the initial program creation. If you find the heart of your MatLab program isn't fast enough, you can make a wrapper for LV and run it much (much much) faster there with very little heartache. LV doesn't do elaborate, though: there is a reason why we use text for books instead of pictures (or, as a more accurate metaphor, individualized renderings of the topic by da Vinci).

EDIT: (about speed)

It looks like you are ~3x faster.
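
One caveat on the measurement: a single call on a 10x1 vector takes so little time that per-call tic/toc readings can be dominated by timer overhead. A minimal sketch of a batch-timed comparison follows. It inlines both kernels and uses random stand-in values instead of the real density() output, so it leaves out function-call overhead and the absolute numbers will differ from yours. It also reports the residual relative to the size of the results; a value near machine precision is what you would typically expect when the same sums are accumulated in a different order.

```matlab
% Hypothetical stand-in for the density values (the real density() is not shown).
rng(0);                                  % reproducible random data
dens_all = 40 + 40*rand(10, 1000);       % 1000 columns of 10 values in [40, 80]
[a, n] = size(dens_all);
out_loop = zeros(a, n);
out_vec  = zeros(a, n);
reps = 20;                               % repeat to get well above timer resolution

% Loop version, timed as one batch rather than per call.
tic;
for r = 1:reps
    for k = 1:n
        dens = dens_all(:, k);
        for i = 1:a-1
            out_loop(i, k) = max(mean(dens(i+1:a)), dens(i+1));
        end
        out_loop(a, k) = dens(a);
    end
end
t_loop = toc;

% Vectorized version on the same data.
tic;
for r = 1:reps
    for k = 1:n
        dens = dens_all(:, k);
        tail_mean = flipud(cumsum(flipud(dens)) ./ (1:a)');
        out_vec(:, k) = [max(tail_mean(2:end), dens(2:a)); dens(a)];
    end
end
t_vec = toc;

% Compare relative to the magnitude of the results, not as an absolute norm.
rel_err = norm(out_loop - out_vec, 'fro') / norm(out_loop, 'fro');
fprintf('speedup %.2fx, relative error %.3g\n', t_loop / t_vec, rel_err);
```

If your release has timeit, wrapping each function call in a handle and passing it to timeit is another option that handles warm-up and repetition for you.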

EDIT: (about code, note I'm using 2014a)

clc; format short g;
a = 1:15
mu = fliplr(cumsum(fliplr(a))./(1:length(a)))

which gives:

a =

     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15


mu =

  Columns 1 through 9

            8          8.5            9          9.5           10         10.5           11         11.5           12

  Columns 10 through 15

         12.5           13         13.5           14         14.5           15

So I make "a", a vector starting at 1 and going to 15. The last value is 15. The average of the second-to-last value and the last is 14.5. The average of the last 3 values is 14. The math seems to be working here.

Edit:

One great speedup was to switch away from the current Java-based system. I have seen code get a large (better than 3x) speed boost by running in version 2010a. Some code runs substantially slower when run through Java than when run through Fortran or C-based compiled libraries.

Answer 2

As has already been suggested, you can consider vectorizing your code; however, realistically, I'm not sure how much improvement that would really offer in this case. Firstly, keep in mind that although for loops in older versions of MATLAB were generally considered very inefficient compared to vectorized approaches, due to the JIT accelerator in modern-day MATLAB, for loops aren't as big of an issue (performance-wise) as they were several years ago.

Secondly, consider that if you have to jump through hoops to get your data into a form that can execute vectorized commands (which looks like it might be the case here), then it might be a wash, meaning that the performance benefit of executing a vectorized command is outweighed by the time it takes to manipulate the data into the necessary vectorized form (and it could potentially make your code thoroughly unreadable, open to potential bugs, and difficult to maintain).

That, of course, is not to say that vectorization won't be helpful at all in your case (the only real way to know is to give it a shot and profile it), but just realize the potential limitations.

In addition to the suggestions made by EngrStudent, I would also suggest taking a look at the article Accelerating MATLAB Algorithms and Applications from the MathWorks.

In particular, two of the options described in this article jumped out at me as being potentially helpful in your case.

The first is to convert your function to a MATLAB executable (MEX-function). This is a fairly straightforward process that involves using MATLAB Coder to automatically generate C code from your function, which can then be compiled as an executable MEX-function. I suspect that this offers the greatest potential for performance improvement. (And if you don't have the MATLAB Coder toolbox, you could also consider manually writing a C version of your function, or at least of the time-intensive portion of it, and using this to produce a MEX function that you can use in MATLAB.)
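
As a rough sketch of what the MATLAB Coder route might look like, assuming the toolbox is installed, my_function.m and density.m are on the path, and both use only functions supported for code generation (none of which can be confirmed from the question):

```matlab
% Assumes MATLAB Coder is installed and my_function.m (plus density.m)
% is on the path; the example argument fixes the input as a 10x1 double.
codegen my_function -args {zeros(10,1)}      % builds my_function_mex

% Sanity check: the MEX version should match the original up to rounding.
x = 40 + 40*rand(10, 1);                     % hypothetical test input
y_ref = my_function(x);
y_mex = my_function_mex(x);
max_diff = max(abs(y_ref - y_mex));
fprintf('max abs diff between MATLAB and MEX: %.3g\n', max_diff);
```

Whether this pays off depends on how much of the time is spent inside density, so it is worth profiling the MEX version the same way as the original.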

The second would be to make use of parallel computing. For example, because the iterations of your for loop are independent of one another, you could potentially replace it with a parallel for loop (parfor). Additionally, perhaps other parts of your overarching system or workflow could be parallelized. This approach would obviously require access to the Parallel Computing Toolbox as well as a multi-core processor (or a cluster), so this might be of limited use to you... but if you have access to those resources, then this could be very beneficial for performance.
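
A minimal parfor sketch is shown below. Note that it parallelizes over the independent input vectors rather than over the 10-element inner loop, on the assumption that the inner loop is too small for the parallel overhead to pay off. The data are random stand-ins for the real density() output.

```matlab
% Hypothetical stand-in data: each column is one independent input vector.
dens_all = 40 + 40*rand(10, 1000);
[a, n] = size(dens_all);
out = zeros(a, n);

% parfor over the independent columns; each iteration writes one sliced column.
% Without Parallel Computing Toolbox, MATLAB runs this loop serially.
parfor k = 1:n
    dens = dens_all(:, k);
    tail_mean = flipud(cumsum(flipud(dens)) ./ (1:a)');   % tail_mean(i) = mean(dens(i:a))
    out(:, k) = [max(tail_mean(2:end), dens(2:a)); dens(a)];
end
```

In a real project the natural place for parfor is probably the outer parameter sweep that calls the simulation many times, since each iteration there does far more work.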

Answer 1: score 2, accepted, 2014-12-05 18:51:24
Answer 2: score 2, 2014-12-06 00:39:00
